basic anova and glm

Upload: itmmec

Post on 03-Jun-2018

236 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 Basic Anova and Glm

    1/40

    Basic Analysis of Variance and

    the General Linear Model

    Psy 420

    Andrew Ainsworth

  • 8/12/2019 Basic Anova and Glm

    2/40

    Why is it called analysis of variance

    anyway?

    If we are interested in group mean differences, why

    are we looking at variance?

    t-test only one place to look for variability

    More groups, more places to look

    Variance of group means around a central tendency

    (grand meanignoring group membership) really

    tells us, on average, how much each group isdifferent from the central tendency (and each other)

  • 8/12/2019 Basic Anova and Glm

    3/40

    Why is it called analysis of

    variance anyway?

    Average mean variability around GM needs to

    be compared to average variability of scores

    around each group mean Variability in any distribution can be broken

    down into conceptual parts:

    total variability = (variability of each group

    mean around the grand mean) + (variability of

    each persons score around their group mean)

  • 8/12/2019 Basic Anova and Glm

    4/40

    General Linear Model (GLM)

    The basis for most inferential statistics

    (e.g. 420, 520, 524, etc.)

    Simple form of the GLM

    score=grand mean + independent variable + error

    Y

  • 8/12/2019 Basic Anova and Glm

    5/40

    General Linear Model (GLM)

    The basic idea is that everyone in the

    population has the same score (the grand

    mean) that is changed by the effects of anindependent variable (A) plus just random

    noise (error)

    Some levels of A raise scores from theGM, other levels lower scores from the

    GM and yet others have no effect.

  • 8/12/2019 Basic Anova and Glm

    6/40

    General Linear Model (GLM)

    Error is the noise caused by other variables you arentmeasuring, havent controlled for or are unaware of.

    Error like A will have different effects on scores but thishappens independently of A.

    If error gets too large it will mask the effects of A and make itimpossible to analyze the effects of A

    Most of the effort in research designs is done to try andminimize error to make sure the effect of A is not buried in

    the noise. The error term is important because it gives us a yard stick

    with which to measure the variability cause by the A effect.We want to make sure that the variability attributable to A isgreater than the naturally occurring variability (error)

  • 8/12/2019 Basic Anova and Glm

    7/40

    GLM

    Example of GLMANOVA backwards

    We can generate a data set using the GLM formula

    We start off with every subject at the GM (e.g. =5)

    a1 a2

    Case Score Case Score

    s1

    s2s3

    s4

    s5

    5

    55

    5

    5

    s6

    s7s8

    s9

    s10

    5

    55

    5

    5

  • 8/12/2019 Basic Anova and Glm

    8/40

    GLM Then we add in the effect of A (a1 adds 2 points and

    a2 subtracts 2 points)

    a1 a2

    Case Score Case Scores1

    s2

    s3

    s4

    s5

    5 + 2 = 7

    5 + 2 = 7

    5 + 2 = 7

    5 + 2 = 7

    5 + 2 = 7

    s6

    s7

    s8

    s9

    s10

    52 = 3

    52 = 3

    52 = 3

    52 = 3

    5

    2 = 3

    135aY 2 15aY

    1

    2 245aY 22 45aY

    1 7aY 3 3aY

  • 8/12/2019 Basic Anova and Glm

    9/40

    GLM

    Changes produced by the treatment represent

    deviations around the GM

    2 2 2

    2 2 2 2

    ( ) [(7 5) (3 5) ]

    5(2) 5( 2) 5[(2) ( 2) ] 40

    jn Y GM n

    or

  • 8/12/2019 Basic Anova and Glm

    10/40

    GLM

    Now if we add in some random variation (error)

    a1 a2

    Case Score Case Score SUM

    s1

    s2

    s3

    s4

    s5

    5 + 2 + 2 = 9

    5 + 2 + 0 = 7

    5 + 21 = 6

    5 + 2 + 0 = 7

    5 + 2

    1 = 6

    s6

    s7

    s8

    s9

    s10

    5

    2 + 0 = 3

    522 = 1

    52 + 0 = 3

    52 + 1 = 4

    5

    2 + 1 = 4

    135aY 2 15aY 50Y

    1

    2 251aY 22 51aY 2 302Y

    1

    7a

    Y 3

    3a

    Y 5Y

  • 8/12/2019 Basic Anova and Glm

    11/40

    GLM

    Now if we calculate the variance for each group:

    The average variance in this case is also going tobe 1.5 (1.5 + 1.5 / 2)

    2

    2 22

    21

    ( ) 1551

    5 1.51 4

    a

    N

    YY

    NsN

    1

    2 22

    2 1

    ( ) 35251

    5 1.51 4

    a

    N

    YY

    NsN

  • 8/12/2019 Basic Anova and Glm

    12/40

    GLM

    We can also calculate the total variability inthe data regardless of treatment group

    The average variability of the two groups issmaller than the total variability.

    2 22

    2

    1

    ( ) 50302

    10 5.781 9

    N

    YY

    NsN

  • 8/12/2019 Basic Anova and Glm

    13/40

    Analysisdeviation approach

    The total variability can be partitioned

    into between group variability and

    error.

    ij ij j jY GM Y Y Y GM

  • 8/12/2019 Basic Anova and Glm

    14/40

    Analysisdeviation approach

    If you ignore group membership and

    calculate the mean of all subjects this

    is the grand mean and total variabilityis the deviation of all subjects around

    this grand mean

    Remember that if you just looked atdeviations it would most likely sum to

    zero so

  • 8/12/2019 Basic Anova and Glm

    15/40

    Analysisdeviation approach

    2 2 2

    /

    ij j ij j

    i j j i j

    total bg wg

    total A S A

    Y GM n Y GM Y Y

    SS SS SS

    SS SS SS

  • 8/12/2019 Basic Anova and Glm

    16/40

    Analysisdeviation approachA Score 2ijY GM 2jY GM 2ij jY Y

    a1

    9

    7

    6

    7

    6

    16

    4

    1

    4

    1

    (75)2= 4

    4

    0

    1

    0

    1

    a2

    3

    1

    3

    4

    4

    4

    16

    4

    1

    1

    (35)2= 4

    0

    4

    0

    1

    1

    50Y 2 302Y 52 8 12

    5Y 5(8) 40n

    52 = 40 + 12

  • 8/12/2019 Basic Anova and Glm

    17/40

    Analysisdeviation approach

    degrees of freedom

    DFtotal = N1 = 10 -1 = 9

    DFA= a1 = 21 = 1

    DFS/A= a(S1) = a(n1) = ana =

    Na = 2(5)2 = 8

  • 8/12/2019 Basic Anova and Glm

    18/40

    Analysisdeviation approach

    Variance or Mean square

    MStotal= 52/9 = 5.78

    MSA= 40/1 = 40

    MSS/A= 12/8 = 1.5

    Test statistic

    F = MSA/MSS/A= 40/1.5 = 26.67

    Critical value is looked up with dfA, dfS/A

    and alpha. The test is always non-

    directional.

  • 8/12/2019 Basic Anova and Glm

    19/40

    Analysisdeviation approach

    ANOVA summary table

    Source SS df MS F

    A 40 1 40 26.67

    S/A 12 8 1.5Total 52 9

  • 8/12/2019 Basic Anova and Glm

    20/40

    Analysiscomputational

    approach Equations

    Under each part of the equations, you divide by the

    number of scores it took to get the number in the

    numerator

    2

    22 2

    Y T

    Y TSS SS Y Y

    N an

    2 2jA

    a TSS

    n an

    2

    2

    /

    j

    S AaSS Y

    n

  • 8/12/2019 Basic Anova and Glm

    21/40

    Analysiscomputational

    approach

    Analysis of sample problem

    250

    302 5210TSS

    2 2 235 15 5040

    5 10

    ASS

    2 2

    /

    35 15302 12

    5S ASS

  • 8/12/2019 Basic Anova and Glm

    22/40

    Analysisregression

    approach

    Levels of A Cases Y X YX

    a1

    S1S2S3

    S4S5

    9

    7

    6

    76

    1

    1

    1

    11

    9

    7

    6

    76

    a2

    S6S7S8

    S9S10

    3

    1

    3

    44

    -1

    -1

    -1

    -1-1

    -3

    -1

    -3

    -4-4

    Sum 50 0 20

    Squares Summed 302 10

    N 10

    Mean 5

  • 8/12/2019 Basic Anova and Glm

    23/40

    Analysisregression

    approach

    Y = a + bX + e

    e = YY

  • 8/12/2019 Basic Anova and Glm

    24/40

    Analysisregression

    approach

    Sums of squares

    22

    2 50( ) 302 5210

    YSS Y Y

    N

    22

    2 0( ) 10 1010

    XSS X X

    N

    ( )( ) (50)(0)

    ( ) 20 2010

    Y XSP YX YX

    N

  • 8/12/2019 Basic Anova and Glm

    25/40

    Analysisregression

    approach

    ( ) ( ) 52TotalSS SS Y

    2 2( ) 20

    40( ) 10regression

    SP YXSS

    SS X

    ( ) ( ) ( ) 52 40 12residual total regressionSS SS SS

  • 8/12/2019 Basic Anova and Glm

    26/40

    Slope

    Intercept

    2

    2

    ( )( )

    ( ) 202

    ( ) 10

    Y X

    YX SP YXNbSS XX

    XN

    5 2(0) 5a Y bX

  • 8/12/2019 Basic Anova and Glm

    27/40

  • 8/12/2019 Basic Anova and Glm

    28/40

    Analysisregression

    approach

    Degrees of freedom

    df(reg.)= # of predictors

    df(total)= number of cases1

    df(resid.)= df(total)df(reg.) =

    91 = 8

  • 8/12/2019 Basic Anova and Glm

    29/40

    Statistical Inference

    and the F-test

    Any type of measurement will include acertain amount of random variability.

    In the F-test this random variability is seenin two places, random variation of eachperson around their group mean and eachgroup mean around the grand mean.

    The effect of the IV is seen as adding furthervariation of the group means around theirgrand mean so that the F-test is really:

  • 8/12/2019 Basic Anova and Glm

    30/40

    Statistical Inference

    and the F-test

    If there is no effect of the IV than theequation breaks down to just:

    which means that any differences between the

    groups is due to chance alone.

    BG

    WG

    effect errorF

    error

    1

    BG

    WG

    error

    F error

  • 8/12/2019 Basic Anova and Glm

    31/40

    Statistical Inference

    and the F-test

    The F-distribution is based on havinga between groups variation due to theeffect that causes the F-ratio to be

    larger than 1. Like the t-distribution, there is not a

    single F-distribution, but a family of

    distributions. The F distribution isdetermined by both the degrees offreedom due to the effect and thedegrees of freedom due to the error.

  • 8/12/2019 Basic Anova and Glm

    32/40

    Statistical Inference

    and the F-test

  • 8/12/2019 Basic Anova and Glm

    33/40

    Assumptions of the analysis

    Robusta robust test is one that is

    said to be fairly accurate even if the

    assumptions of the analysis are notmet. ANOVA is said to be a fairly

    robust analysis. With that said

  • 8/12/2019 Basic Anova and Glm

    34/40

    Assumptions of the analysis

    Normality of the sampling distribution of

    means

    This assumes that the sampling distribution ofeach level of the IV is relatively normal.

    The assumption is of the sampling distribution

    not the scores themselves

    This assumption is said to be met when there

    is relatively equal samples in each cell and the

    degrees of freedom for error is 20 or more.

  • 8/12/2019 Basic Anova and Glm

    35/40

    Assumptions of the analysis

    Normality of the sampling distributionof means If the degrees of freedom for error are small

    than:

    The individual distributions should bechecked for skewness and kurtosis (see

    chapter 2) and the presence of outliers. If the data does not meet the distributional

    assumption than transformations will need tobe done.

  • 8/12/2019 Basic Anova and Glm

    36/40

    Assumptions of the analysis

    Independence of errorsthe size of the error forone case is not related to the size of the error inanother case.

    This is violated if a subject is used more than once(repeated measures case) and is still analyzed withbetween subjects ANOVA

    This is also violated if subjects are ran in groups. This

    is especially the case if the groups are pre-existing This can also be the case if similar people exist within

    a randomized experiment (e.g. age groups) and canbe controlled by using this variable as a blockingvariable.

  • 8/12/2019 Basic Anova and Glm

    37/40

    Assumptions of the analysis

    Homogeneity of Variancesince we are

    assuming that each sample comes from

    the same population and is only affected(or not) by the IV, we assume that each

    groups has roughly the same variance

    Each sample variance should reflect the

    population variance, they should be equal toeach other

    Since we use each sample variance to estimate

    an average within cell variance, they need to be

    roughly equal

  • 8/12/2019 Basic Anova and Glm

    38/40

    Assumptions of the analysis

    Homogeneity of Variance

    An easy test to assess this assumption is:

    2

    largest

    max 2

    smallest

    max 10, than the variances are roughly homogenous

    SF

    S

    F

  • 8/12/2019 Basic Anova and Glm

    39/40

    Assumptions of the analysis

    Absence of outliers

    Outliersa data point that doesnt really

    belong with the others Either conceptually, you wanted to study

    only women and you have data from a man

    Or statistically, a data point does not cluster

    with other data points and has undueinfluence on the distribution

    This relates back to normality

  • 8/12/2019 Basic Anova and Glm

    40/40

    Assumptions of the analysis

    Absence of outliers