4. hlm

Upload: rohit-sethi

Post on 10-Oct-2015


  • 5/20/2018 4. HLM

    Doing HLM using SAS PROC MIXED

    Kaz
    www.estat.us

    Feb 2005

    http://www.estat.us/
  • My points

    (1) It is easy to compare HLM with models that are not HLM, because PROC MIXED also lets you run models that are not HLM. This is helpful.

    (2) It is easy to understand what makes HLM HLM. In SAS, what is not essential to HLM (e.g., centering) is done outside PROC MIXED.

  • OLS vs. HLM in PROC MIXED

    The difference is a RANDOM statement.

    OLS regression syntax:

        PROC MIXED;
        MODEL Y=X;
        Run;

    HLM syntax:

        PROC MIXED;
        Model Y=X;
        random intercept X / subject=school;
        Run;

    In equations (shown for the intercept-only case):

    (1) OLS

        Y_jk = b0 + error_jk

    (2) HLM

        Level 1: Y_jk = b0_k + error_jk
        Level 2: b0_k = g0 + error_k

        or, combined:

        Y_jk = g0 + error_jk + error_k

  • Again, turning a simple linear model into HLM

    (1) Simple linear model:

        PROC MIXED;
        Model Y=X W;
        Run;

    (2) HLM:

        PROC MIXED;
        Model Y=X W;
        random intercept X W / subject=GroupID;
        Run;

    (3) The RANDOM statement above reads: "I request that the intercept, as well as the effects of X and W, be estimated for each subject, which can be identified by GroupID."

  • How to write SAS PROC MIXED syntax: intuitive way

    (1) Write all the variable names in the MODEL statement.

        model Y=X W;

    (2) Decide which variables' effects you want to estimate by school.

        random intercept X W / subject=school;
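    Putting the two statements together, a complete call might look like the sketch below. The dataset name mydata is hypothetical, and the CLASS statement and the COVTEST and SOLUTION options are additions that the later slides of this deck introduce; they are not part of the two-step recipe itself.

    ```sas
    /* A minimal sketch. Hypothetical names: mydata, Y, X, W, school. */
    proc mixed data=mydata covtest;
      class school;                            /* group ID treated as categorical        */
      model Y = X W / solution;                /* step (1): all variable names           */
      random intercept X W / subject=school;   /* step (2): effects that vary by school  */
    run;
    ```

    By default the RANDOM statement estimates a separate variance for each random effect and no covariances (TYPE=VC). Adding type=un after the slash would also estimate the covariances among the intercept and slopes, at the cost of more parameters.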

  • More careful way

    1. Start from the level-specific specification, e.g.,

       Level 1: Y = b0 + b1*X + error_ij
       Level 2: b0 = g00 + g01*W + error_0j
       Level 2: b1 = g10 + g11*W + error_1j

    2. Insert the level-2 equations into the level-1 equation.

    3. Write the variable names involved in the MODEL statement.

    4. Find the random components (written in Roman letters).

       RULE 1: Put intercept in the RANDOM statement to accommodate higher-level errors.

       RULE 2: If the name of any variable sits right next to a level-2 error with an asterisk (e.g., X*level-2 error), put that variable name in the RANDOM statement.

       (RULE 3: Don't worry about the residual. It is set by default.)

  • Example 1: ANOVA model

    Level 1: Y_ij = b0j + Residual_ij

    Level 2: b0j = g00 + U_0j

    Combined: Y_ij = g00 + U_0j + Residual_ij

        proc mixed;
        class school;
        model Y= ;
        random intercept / subject=school;
        run;

    I said:

    RULE 1: Put intercept in the RANDOM statement to accommodate higher-level errors.

    RULE 2: If the name of any variable sits right next to a level-2 error with an asterisk (e.g., X*level-2 error), put that variable name in the RANDOM statement.

    Only RULE 1 is relevant in this model.

  • Example 2: Slopes-as-outcomes model

    Level 1: Y_ij = b0j + b1j*X + Residual_ij

    Level 2: b0j = g00 + g01*W + U_0j
    Level 2: b1j = g10 + g11*W + U_1j

    Combined: Y_ij = g00 + g01*W + g10*X + g11*W*X + U_1j*X + U_0j + Residual_ij

        proc mixed;
        class school;
        model Y= W X W*X;
        random intercept X / subject=school;
        run;

    What were RULE 1 and RULE 2?

  • How to do substitution: cheating using HLM software!

    Push the MIXED button to get a little window showing the combined model.

    [Screenshot of the HLM software's mixed-model window]

  • How to do substitution by hand

    Level 1: Y_ij = b0j + b1j*X + Residual_ij
    Level 2: b0j = g00 + g01*W + U_0j
    Level 2: b1j = g10 + g11*W + U_1j

    1. Insert the higher-level equations into the level-1 equation.

       Y = [g00 + g01*W + U_0j] + [g10 + g11*W + U_1j]*X + Residual_ij

    2. Take out the brackets.

       Y = g00 + g01*W + U_0j + g10*X + g11*W*X + U_1j*X + Residual_ij

    3. Notice which parts are structural and which parts are random components.

       Y = g00 + g01*W + g10*X + g11*W*X + U_1j*X + U_0j + Residual_ij

        proc mixed;
        model Y = W X W*X;
        random intercept X / subject=school;
        run;

    What were RULE 1 and RULE 2?

  • Fixed effects or random effects

    OLS regression is a fixed-effects model:

        PROC MIXED;
        Model Y=X;
        Run;

    OLS regression is a model with fixed effects only, so in a way OLS is a special case of HLM. It is an awfully inflexible model that does not consider the existence of various sources of error.

    HLM:

        PROC MIXED;
        Model Y=X;
        random intercept X / subject=groupID;
        Run;

    If a researcher thinks the effect of X (and the intercept) differs by group, we should treat these coefficients as random effects.

  • Benefit 1 of using random effects

    A conceptual one: useful for thinking about micro-macro problems.

    (1) Student: Math score = b0 + b1*parents' education level + ... + error
        Country: b1 = g10 + g11*SELECTION + error

    (2) Classroom: teacher perception of math ability of class = b0 + b1*average parents' education level + b2*average math score + b3*noise + error
        Country: b1 = g10 + g11*National Exam + error
                 b2 = g20 + g21*National Exam + error
                 b3 = g30 + g31*National Exam + error

  • Benefit 2: Statistical benefit

    In deriving a grand mean (of the effect of X or of an intercept), HLM does shrinkage. This pulls inaccurate group means toward the grand mean, so we can reduce the influence of outliers whose estimates are inaccurate (i.e., having large error variance and/or coming from a small number of observations within the group unit).

    Shrunk school mean = reliability * school mean

    where the school mean is expressed as a deviation from the grand mean, and reliability is a function of the number of observations in a group unit and of the variances. (R&B HLM book, p. 48)

    Quiz:
    1) What happens to a school whose reliability is 1?
    2) What happens if all schools are 1 on reliability?
    3) What happens if all schools are .5 on reliability?
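    For reference, the shrinkage rule can be written out for the intercept-only (ANOVA) model. This is a sketch in the usual textbook notation, which the slides do not use: \tau_{00} is the between-school variance, \sigma^2 the within-school variance, n_j the number of observations in school j, \gamma_{00} the grand mean, and \bar{Y}_j the observed school mean. The numbers in the example are made up.

    ```latex
    % Reliability of the observed mean of school j
    \lambda_j = \frac{\tau_{00}}{\tau_{00} + \sigma^2 / n_j}

    % Shrunken (empirical Bayes) school mean
    \hat{\beta}_{0j}^{*} = \lambda_j \bar{Y}_j + (1 - \lambda_j)\,\gamma_{00}
    \quad\Longleftrightarrow\quad
    \hat{\beta}_{0j}^{*} - \gamma_{00} = \lambda_j \left( \bar{Y}_j - \gamma_{00} \right)

    % Made-up example: \tau_{00} = 20, \sigma^2 = 40
    % n_j = 2:  \lambda_j = 20 / (20 + 40/2)  = 0.50
    % n_j = 20: \lambda_j = 20 / (20 + 40/20) \approx 0.91
    ```

    With these numbers, a school whose observed mean is 10 points above the grand mean is pulled back to 5 points above it when it has 2 observations, but only to about 9.1 points above it when it has 20.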

  • Quick decision rule

    Random or fixed?

    Do I open the door at 11PM?

    Literature, theory

    Exploratory analysis (let's see what happens)

  • More complicated: two-step decisions regarding random effects

    (I need your help in phrasing this.)

    Step 1: Does the effect differ by school?

    Step 2: Random or fixed?

        Fixed: use a series of dummy variables (in reality too tedious).

        Random: shrinkage applies, and we get a precision-weighted grand mean.
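    The two choices in Step 2 can be written side by side. This is only a sketch with hypothetical names (mydata, Y, X, school). It lets the CLASS statement generate the school dummies instead of coding them by hand, which removes the tedium of typing but not the large number of parameters.

    ```sas
    /* Fixed: a separate intercept and X slope for every school,
       using the dummies that CLASS creates for school */
    proc mixed data=mydata;
      class school;
      model Y = X school X*school / solution;
    run;

    /* Random: school intercepts and X slopes treated as random effects,
       so only their variances are estimated and shrinkage applies */
    proc mixed data=mydata covtest;
      class school;
      model Y = X / solution;
      random intercept X / subject=school;
    run;
    ```

    The fixed version estimates one set of coefficients per school, while the random version summarizes the school-to-school variation with a few variance parameters.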

  • Example: Student Engagement study using ESM

    by Uekawa, Borman, and Lee

  • Engagement level (Rasch model composite)

    When you were signaled the first time today,

                                                        SD   D    A    SA
        I was paying attention                          O    O    O    O
        I did not feel like listening                   O    O    O    O
        My motivation level was high                    O    O    O    O
        I was bored                                     O    O    O    O
        I was enjoying class                            O    O    O    O
        I was focused more on class than anything else  O    O    O    O
        I wished the class would end soon               O    O    O    O
        I was completely into class                     O    O    O    O

    The MEANS Procedure

    Analysis Variable : engagement engagement

        N      Mean        Std Dev      Minimum       Maximum
        2316   0.1167889   10.0106694   -31.5283511   26.4164991

  • 3-level HLM

    Level 1: Repeated measures (10 beeps)

    Level 2: Students (10 kids from a class)

    Level 3: Courses (34 courses, Monday to Friday)

  • 3-level HLM

        Libname here "C:\";
        /* This is a three-level model */
        proc mixed data=here.esm covtest noclprint;
        class IDclass IDstudent;
        model engagement= /solution ddfm=kr;
        random intercept /sub=IDstudent(IDclass);
        random intercept /sub=IDclass;
        run;

    Quiz: how can we make this a 2-level HLM?

  • PROC MIXED statement

        proc mixed data=here.esm covtest noclprint;

    COVTEST requests tests of the covariance components (whether the variances are significantly larger than zero). The reason you have to request such a simple thing is that COVTEST is not based on the chi-square test that one would use for a test of a variance. It uses a Wald Z-test instead, which is not really appropriate. Shockingly, SAS has not corrected this problem for a while. Presumably for this reason, SAS does not make it a default option, which is why you have to request it. Not many people know this, and I myself could not believe it. So we cannot fully trust the results of COVTEST and must use them with caution.

    When there are lots of group units, use NOCLPRINT to suppress the printing of group names.

  • CLASS statement

        class IDclass IDstudent Hisp;

    We list here the variables that we want SAS to treat as categorical. Character variables (e.g., city names) must be on this line; the procedure won't run otherwise. Group IDs, such as IDclass in my example data, must also be on this line; otherwise, it won't run. Variables that are numeric but dummy-coded (e.g., black=1 if black; else 0) don't have to be on this line, but the output is easier to read if they are.

    One pain in the neck with the CLASS statement is that it chooses the reference category by alphabetical order: whichever level of a classification variable comes last alphabetically is used as the reference group. We can control this by data manipulation. For example, if gender is BOY or GIRL, I tend to create a new variable to make it explicit that girls are the reference group:

        if gender="BOY" then gender2="(1) Boy";
        if gender="GIRL" then gender2="(2) Girl";
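    The recoding above has to happen in a DATA step before PROC MIXED is called. A minimal sketch follows; the dataset names mydata and mydata2 are hypothetical, and gender is assumed to be a character variable holding the values BOY and GIRL.

    ```sas
    /* Hypothetical datasets: mydata (input), mydata2 (output) */
    data mydata2;
      set mydata;
      length gender2 $8;    /* long enough to hold "(2) Girl" without truncation */
      if gender = "BOY" then gender2 = "(1) Boy";
      else if gender = "GIRL" then gender2 = "(2) Girl";
    run;
    ```

    Listing gender2 on the CLASS statement then makes "(2) Girl" the last level in sort order, so it serves as the reference group. The LENGTH statement is there because a character variable otherwise takes its length from the first value assigned to it, which here would cut "(2) Girl" short by one character.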

  • MODEL statement

        model engagement= /solution ddfm=kr;

    ddfm=kr specifies the way in which the degrees of freedom are calculated. It seems closest to the degrees-of-freedom option used by Bryk, Raudenbush, and Congdon's HLM program.

    It can be computationally very heavy if a model is complicated. ddfm=bw would run faster, though the DF would be wrong.

  • RANDOM statement

        random intercept X /sub=IDstudent(IDclass);
        random intercept X /sub=IDclass;

    We can estimate the variance of slopes for categorical variables using the group= option, without necessarily making them into dummy variables:

        random intercept race /sub=IDclass group=race;

    (instead of random intercept black white hispanic /sub=IDclass;)

  • MODEL 1

        Libname here "G:\SAS";
        proc mixed data=here.esm covtest noclprint;
        weight precision_weight;
        class IDclass IDstudent;
        model engagement= /solution ddfm=kr;
        random intercept /sub=IDstudent(IDclass);
        random intercept /sub=IDclass;
        run;

    The Mixed Procedure

    Covariance Parameter Estimates

        Cov Parm    Subject              Estimate   Standard Error   Z Value   Pr Z
        Intercept   IDstudent(IDclass)   23.3556    2.5061           9.32

  • MODEL 2

        Libname here "G:\SAS";
        proc mixed data=here.esm covtest noclprint;
        weight precision_weight;
        class IDclass IDstudent subject;
        model engagement= hisp /solution ddfm=kr;
        random intercept /sub=IDstudent(IDclass);
        random intercept hisp /sub=IDclass;
        run;

    Solution for Fixed Effects

        Effect      Estimate   Standard Error   DF     t Value   Pr > |t|
        Intercept   -0.6287    0.5101           33.1   -1.23     0.2265
        hisp        -2.2113    1.0031           17.7   -2.20     0.0410

    Covariance Parameter Estimates

        Cov Parm    Subject              Estimate   Standard Error   Z Value   Pr Z
        Intercept   IDstudent(IDclass)   22.4743    2.4712           9.09

  • MODEL 3

        proc mixed data=here.esm covtest noclprint;
        weight precision_weight;
        class IDclass IDstudent subject;
        model engagement= hisp math hisp*math /solution ddfm=kr;
        random intercept /sub=IDstudent(IDclass);
        random intercept hisp /sub=IDclass;
        run;

    Solution for Fixed Effects

        Effect      Estimate   Standard Error   DF     t Value   Pr > |t|
        Intercept   -0.3249    0.7061           34.2   -0.46     0.6484
        hisp        -4.1236    1.4562           14.5   -2.83     0.0129
        math        -0.6081    1.0108           33.7   -0.60     0.5515
        hisp*math    3.3305    1.9233           15.2    1.73     0.1035

    The Mixed Procedure

    Covariance Parameter Estimates

        Cov Parm    Subject              Estimate   Standard Error   Z Value   Pr Z
        Intercept   IDstudent(IDclass)   22.6987    2.5020           9.07

  • MODEL 3

    Solution for Fixed Effects

        Effect      Estimate   Standard Error   DF     t Value   Pr > |t|
        Intercept   -0.3249    0.7061           34.2   -0.46     0.6484
        hisp        -4.1236    1.4562           14.5   -2.83     0.0129
        math        -0.6081    1.0108           33.7   -0.60     0.5515
        hisp*math    3.3305    1.9233           15.2    1.73     0.1035

    The Mixed Procedure

    Covariance Parameter Estimates

        Cov Parm    Subject              Estimate   Standard Error   Z Value   Pr Z
        Intercept   IDstudent(IDclass)   22.6987    2.5020           9.07
        Residual                         31.9761    1.0269           31.14

  • Which is easy to understand?

        In HLM software      In SAS PROC MIXED
        Level-1 intercept    Disappears
        Level-2 intercept    Disappears
        Level-3 intercept    Intercept
        Level-1 error        Residual
        Level-2 error        Random effects
        Level-3 error        Random effects

    HLM way

        Level 1: engagement = b0 + b1*Hispanic + residual
        Level 2: b0 = g00 + A
        Level 2: b1 = g10
        Level 3: g00 = t_000 + t_001*Math + B
        Level 3: g10 = t_100 + t_101*Math + C

    PROC MIXED way (combined equation)

        engagement = t_000 + t_001*Math
                     + t_100*Hispanic + t_101*Math*Hispanic
                     + C*Hispanic + B + A + residual

  • Why do we center variables?

        engagement = t_000 + t_001*Math
                     + t_100*Hispanic + t_101*Math*Hispanic
                     + C*Hispanic + B + A + residual

    Imagine we have to report to teachers their students' average engagement score.

    We want to use B + t_000. To be clear about the meaning of the t_000 part, we could center variables, if it makes sense.

  • What about centering?

    In SAS, we use PROC STANDARD to do centering, and this is done outside of PROC MIXED. When I learned this, I thought, "I have done it before!" because centering is similar to the concept of Z-scores.

    This is GROUP MEAN centering:

        proc standard data=X mean=0;
        by GroupID;
        var X;
        run;

    This is GRAND MEAN centering:

        proc standard data=X mean=0;
        var X;
        run;

    By the way, just for your information, this creates Z-scores:

        proc standard data=X mean=0 std=1;
        var X;
        run;
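    The PROC STANDARD calls above leave out two practical details, filled in by the sketch below. The dataset names mydata, mydata_groupc, and mydata_grandc are hypothetical. The sketch assumes the data are not yet sorted by GroupID, and it names the output datasets explicitly with OUT=; as far as I know, without OUT= the result goes to an automatically named dataset (DATA1, DATA2, ...) rather than back into the input.

    ```sas
    /* Hypothetical names: mydata (input), X (variable to center), GroupID */

    /* BY-group processing expects the data to be sorted by the group ID */
    proc sort data=mydata;
      by GroupID;
    run;

    /* Group-mean centering: X minus the mean of X within each GroupID */
    proc standard data=mydata mean=0 out=mydata_groupc;
      by GroupID;
      var X;
    run;

    /* Grand-mean centering: X minus the overall mean of X */
    proc standard data=mydata mean=0 out=mydata_grandc;
      var X;
    run;
    ```

    PROC STANDARD overwrites X in the output dataset with the centered values, so if the raw values are still needed, copy X to a second variable in a DATA step first and center the copy.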

    When you use SAS PROC MIXED, you notice that centering is not really a topic specific to HLM, because it is done outside PROC MIXED.

  • What does it mean to center a dummy variable, like gender?

    1. To adjust for gender composition.

    2. - Without it, the intercept refers to either males or females.

       - With it, the intercept is adjusted for gender composition.

    3. See my Excel presentation if we have time: www.estat.us/sas/centering.xls
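    A small worked example may make point 2 concrete. It is a sketch for the simplest case, a model with one dummy and no other predictors; the 0/1 coding (Female = 1) and the 40% share of females are made-up assumptions, not figures from the study.

    ```latex
    % Uncentered dummy: the intercept is the predicted outcome for males (Female = 0)
    Y_i = b_0 + b_1\,\mathrm{Female}_i + e_i

    % Grand-mean centered dummy: the intercept is the predicted outcome
    % at the average gender composition
    Y_i = b_0^{c} + b_1\left(\mathrm{Female}_i - \overline{\mathrm{Female}}\right) + e_i,
    \qquad b_0^{c} = b_0 + b_1\,\overline{\mathrm{Female}}

    % Made-up example with \overline{Female} = 0.4:
    % centered values are 0.6 for females and -0.4 for males, and
    % b_0^{c} = 0.6 \times (\text{male mean}) + 0.4 \times (\text{female mean})
    ```

    In this simple case the centered intercept is just the overall mean of Y, which is what "adjusted for gender composition" amounts to.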

  • END

    To go back to my HLM page: http://www.estat.us/id38.html