Two Sample and ANOVA Handout



    Comparing Two Means

Often we have two unknown means and are interested in comparing them to each other. Usually the null hypothesis is

H0: no difference between the population means

There are a number of related testing procedures we will present. Which testing procedure you choose depends on your data.

We will present three basic procedures here:

- Paired t-test for paired or matched data.
- Two-sample t-tests for comparing two independent groups. Two basic independent-sample tests will be presented:
  - Equal variance t-test: the two groups can be assumed to have equal variances.
  - Unequal variance t-test: the two groups are not assumed to have equal variances.

    Paired t-test

When the means being compared come from observations that are naturally paired or matched, a paired t-test is used.

Examples: Before vs. after studies (also called longitudinal studies) produce paired data. Each patient contributes two paired observations: the before value and the after value.

Other types of studies can produce paired data as well. One possibility would be a dental study where the two opposing treatments are used in each patient, in randomly assigned half-mouths.

    Computing a Paired t-test

To compute a paired t-test, focus on the within-pair differences (for example, after − before). Perform a t-test on the mean of the differences. To test whether the means are different, the null hypothesis is

H0: μdifference = 0.

Note: even though we are comparing two means, this is still considered a one-sample test (on the differences).

    Example: fluoride varnish study

In ten at-risk children, fluoride varnish is applied in randomly assigned half-mouths. The remaining half-mouths are left untreated. The children are followed for two years and the new dmfs (decayed, missing, and filled surfaces) and their locations are recorded:

patient   varnish   untreated   difference
1         2         3           -1
2         1         2           -1
3         0         1           -1
4         2         0            2
5         0         0            0
6         0         2           -2
7         2         5           -3
8         1         1            0
9         3         7           -4
10        5         4            1
mean      1.6       2.5         -0.90
sd (of differences)              1.79

To perform the paired t-test, compute a one-sample t-test on the last column, where H0: μ = 0:

$$T = \frac{-0.90}{1.79/\sqrt{10}} = -1.59$$

For a two-tailed test, compare |−1.59| = 1.59 to t9, 0.975 = 2.262. We do not reject H0 since 1.59 < 2.262. The p-value is

P(|t9| > 1.59) = 2 P(t9 > 1.59) = 0.15.
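The paired t-test above can also be reproduced in software. Below is a minimal Python sketch (ours, not part of the original handout) using scipy on the varnish and untreated columns from the table:

```python
# Minimal sketch of the paired t-test from the fluoride varnish example.
import numpy as np
from scipy import stats

varnish   = np.array([2, 1, 0, 2, 0, 0, 2, 1, 3, 5])
untreated = np.array([3, 2, 1, 0, 0, 2, 5, 1, 7, 4])

# A paired t-test is a one-sample t-test on the within-pair differences.
diff = varnish - untreated
t, p = stats.ttest_1samp(diff, popmean=0.0)
print(t, p)          # roughly t = -1.59, p = 0.15, matching the hand calculation

# Equivalent shortcut on the raw paired columns:
t2, p2 = stats.ttest_rel(varnish, untreated)
```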

    Comparing means of two independent samples

    These are called two-sample tests.

Our goal is usually to estimate μ1 − μ2 and the corresponding confidence intervals, and to perform hypothesis tests on

H0: μ1 − μ2 = 0.

    For each sample we compute the relevant statistics:

            Sample 1    Sample 2
size        n1          n2
mean        X̄1          X̄2
std. dev.   s1          s2

The obvious statistic to compare the two population means is X̄1 − X̄2.

Probability theory tells us that:

1. X̄1 − X̄2 is the best estimate of μ1 − μ2.
2. The standard error of X̄1 − X̄2 is $$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
3. For large n1 and n2: $$\bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$


In order to compute hypothesis tests and confidence intervals for μ1 − μ2 we will need to estimate the standard error of X̄1 − X̄2.

Two different estimation procedures are commonly used, depending on whether one feels it is reasonable to assume the two groups have similar variances.

RULES OF THUMB for deciding whether to use the equal variance or unequal variance formulas:

1. For small samples, you can use the equal variance formulas unless s1 is twice as big as s2, or the other way around.
2. If n1 and n2 > 80, you can use the unequal variance formula for the SE (it's easier to compute) and use the Normal distribution.
3. If you are unsure, the unequal variance formula will be the conservative choice (less power, but less likely to be incorrect).
4. The calculations are a snap with a computer program. If unsure about the variance assumptions, compute the test both ways and see if there is a conflict.

Equal Variance case: σ1 = σ2

If it is reasonable to assume that σ1 = σ2, we can estimate the standard error more efficiently by combining the samples.

The standard error of X̄1 − X̄2 is estimated by

$$SE(\bar{X}_1 - \bar{X}_2) = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where the pooled standard deviation, s_pooled, is

$$s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$

This pooled standard deviation is roughly the combined distance of the observations from their respective group means.

The T statistic

$$T_{equal} = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}$$

has a t distribution with n1 + n2 − 2 degrees of freedom.

Example: Confidence interval for the difference between means. Gum data from day 1.

            Gum A         Gum C
size        n1 = 25       n2 = 40
mean        X̄1 = -0.72    X̄2 = 2.63
std. dev.   s1 = 5.37     s2 = 3.80

Assume equal variances (s1/s2 = 1.41 < 2):

$$s_{pooled} = \sqrt{\frac{24 \cdot 5.37^2 + 39 \cdot 3.80^2}{25 + 40 - 2}} = 4.46,$$

$$SE(\bar{X}_1 - \bar{X}_2) = 4.46\sqrt{\frac{1}{25} + \frac{1}{40}} = 1.14,$$

so the 95% confidence interval is

$$-0.72 - 2.63 \pm 2.00 \times 1.14 = (-5.63, -1.07).$$

Note: since the confidence interval does not cover 0, a two-sided hypothesis test of H0: μ1 − μ2 = 0 would reject at level α = 0.05.

Check: T = |(-0.72 − 2.63)/1.14| = 2.94 > 2.00 = t63, 0.975.


SPSS output for Gum example:

T-Test

Group Statistics (change in DMFS)

gum type   N    Mean      Std. Deviation   Std. Error Mean
A          25   -0.7200   5.36594          1.07319
C          40    2.6250   3.80073          0.60095

Independent Samples Test (change in DMFS)

Levene's Test for Equality of Variances: F = 0.924, Sig. = 0.340

t-test for Equality of Means:

                              t        df      Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -2.940   63      0.005             -3.345            1.138                   -5.61840       -1.07160
Equal variances not assumed   -2.720   39.05   0.010             -3.345            1.230                   -5.83279       -0.85721
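As an illustrative sketch (ours, not from the handout), the equal-variance and unequal-variance tests in the SPSS output can be reproduced from the summary statistics alone with scipy:

```python
# Sketch: reproduce the gum-example t-tests from the summary statistics only.
from scipy import stats

# Gum A: n = 25, mean = -0.72, sd = 5.37;  Gum C: n = 40, mean = 2.63, sd = 3.80
pooled = stats.ttest_ind_from_stats(-0.72, 5.37, 25, 2.63, 3.80, 40,
                                    equal_var=True)    # equal-variance (pooled) test
welch  = stats.ttest_ind_from_stats(-0.72, 5.37, 25, 2.63, 3.80, 40,
                                    equal_var=False)   # unequal-variance (Welch) test
print(pooled)   # t close to -2.94, p close to 0.005 ("equal variances assumed" row)
print(welch)    # t close to -2.72, p close to 0.010 ("equal variances not assumed" row)
```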


Unequal Variance case: σ1 ≠ σ2

If one is not sure that the variances are equal, it is usually safest to assume that they are not.

The standard error of X̄1 − X̄2 is estimated by

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

The T statistic

$$T_{unequal} = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}$$

has a t distribution with degrees of freedom that can be estimated by

$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$

Note: If n1 and n2 > 80, then the standard Normal distribution can be used in place of the t, which removes the need to estimate the degrees of freedom.
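As a small sketch (ours, not part of the handout; the helper name is made up), the unequal-variance SE and the approximate degrees of freedom above can be computed directly:

```python
# Sketch: Welch standard error and approximate (Satterthwaite) degrees of freedom
# for the unequal-variance t-test. The function name is ours, not from the handout.
import math

def welch_se_and_df(s1, n1, s2, n2):
    v1, v2 = s1**2 / n1, s2**2 / n2      # per-group variance of the sample mean
    se = math.sqrt(v1 + v2)              # SE of X1bar - X2bar
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Gum example: gives SE of about 1.23 and df of about 39, matching the
# "equal variances not assumed" row of the SPSS output above.
print(welch_se_and_df(5.37, 25, 3.80, 40))
```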

Example: NHANES III data

807 participants who got both a dental exam and answered the chewing-tobacco question. The SPSS t-test output is below.

Group Statistics (mean attachment loss)

currently chew tobacco   N     Mean   Std. Deviation
yes                      341   1.71   1.724
no                       466   1.50   1.381

Independent Samples Test (mean attachment loss)

Levene's Test for Equality of Variances: F = 5.682, Sig. = 0.02

t-test for Equality of Means:

                              t       df      Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
Equal variances assumed       1.980   805     0.048             0.22               0.002          0.431
Equal variances not assumed   1.914   632.3   0.056             0.22              -0.006          0.439

What to do?

In this case choose the unequal-variances results. They rely less on assumptions, and the sample sizes are large enough that the SE estimates are probably close to optimal even if the variances are equal.


ANOVA - Analysis of Variance

- Extends the independent-samples t test.
- Compares the means of groups of independent observations.
- Don't be fooled by the name: ANOVA does not compare variances.
- Can compare more than two groups.

ANOVA Null and Alternative Hypotheses

Say the sample contains K independent groups. ANOVA tests the null hypothesis

H0: μ1 = μ2 = ... = μK

that is, the group means are all equal. The alternative hypothesis is

H1: μi ≠ μj for some i, j

or, the group means are not all equal.

Example: Accuracy of Implant Placement

Implants were placed in a manikin using placement guides of various widths. 15 implants were placed using each guide. Error (discrepancy with a reference implant) was measured for each implant.

[Figure: Mean Implant Height Error (mm) by Guide Width, for 4 mm, 6 mm, and 8 mm guides.]

Example: Accuracy of Implant Placement (continued)

The overall mean of the entire sample was 0.248 mm. This is called the grand mean, and is often denoted by X̄. If H0 were true, then we'd expect the group means to be close to the grand mean.

[Figure: Mean Implant Height Error (mm) by Guide Width, with the grand mean X̄ marked.]

Example: Accuracy of Implant Placement (continued)

The ANOVA test is based on the combined distances of the group means from X̄. If the combined distances are large, that indicates we should reject H0.

[Figure: Mean Implant Height Error (mm) by Guide Width, with the grand mean X̄ marked.]


The ANOVA Statistic

To combine the differences from the grand mean we:

- square the differences,
- multiply by the numbers of observations in the groups,
- sum over the groups.

For the implant data,

$$SSB = 15(\bar{X}_{4mm} - \bar{X})^2 + 15(\bar{X}_{6mm} - \bar{X})^2 + 15(\bar{X}_{8mm} - \bar{X})^2,$$

where X̄_4mm, X̄_6mm, X̄_8mm are the group means and X̄ is the grand mean. SSB = Sum of Squares Between groups.

Note: this looks a bit like a variance.

How big is big?

For the Implant Accuracy data, SSB = 0.0047. Is that big enough to reject H0?

As with the t test, we compare the statistic to the variability of the individual observations. In ANOVA the variability is estimated by the Mean Square Error, or MSE.

[Figure: Implant Height Error (mm) by Guide Width — the individual observations for the 4 mm, 6 mm, and 8 mm guides.]

MSE: Mean Square Error

The Mean Square Error is a measure of the variability after the group effects have been taken into account:

$$MSE = \frac{1}{N - K}\sum_j \sum_i \left(x_{ij} - \bar{X}_j\right)^2,$$

where x_ij is the ith observation in the jth group.

Note that the variation of the group means seems quite small compared to the variance of the observations within groups.

[Figure: Implant Height Error (mm) by Guide Width — individual observations, showing the within-group spread.]

Notes on MSE

- If there are only two groups, the MSE is equal to the pooled estimate of variance used in the equal-variance t test.
- ANOVA assumes that all the group variances are equal.

ANOVA F-statistic

The ANOVA test is based on the F statistic

$$F = \frac{SSB/(K-1)}{MSE},$$

where K is the number of groups. Under H0 the F statistic has an F distribution with K − 1 and N − K degrees of freedom (N is the total number of observations).
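As an illustrative sketch (ours, not from the handout), the SSB, MSE, and F statistic can be computed directly from grouped data, and scipy's f_oneway gives the same F and p-value. The data below are made-up placeholders, since the individual implant measurements are not listed in the handout:

```python
# Sketch: one-way ANOVA "by hand" (SSB, MSE, F) plus the scipy shortcut.
import numpy as np
from scipy import stats

def one_way_anova(groups):
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    K, N = len(groups), len(all_obs)
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between groups
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)             # within groups
    mse = sse / (N - K)
    F = (ssb / (K - 1)) / mse
    p = stats.f.sf(F, K - 1, N - K)      # upper-tail probability of F(K-1, N-K)
    return F, p

groups = [np.array([0.21, 0.26, 0.30]),  # hypothetical measurements, one array per guide width
          np.array([0.19, 0.25, 0.28]),
          np.array([0.22, 0.27, 0.31])]
print(one_way_anova(groups))
print(stats.f_oneway(*groups))           # should agree with the manual calculation
```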


Implant Data: p-value

To get a p-value we compare our F statistic to an F(2, 42) distribution. In our example

$$F = \frac{0.0047/2}{0.0111} = 0.211.$$

The p-value is

$$P\left(F(2,42) > 0.211\right) = 0.81.$$

[Figure: F(2, 42) distribution density, with F = 0.211 marked and the upper-tail area P = 0.81 shaded.]
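A quick check of this tail probability (our own one-liner, not from the handout) using scipy's F distribution:

```python
# Upper-tail probability of an F(2, 42) distribution at the observed statistic 0.211.
from scipy import stats
print(stats.f.sf(0.211, 2, 42))   # approximately 0.81
```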

ANOVA Table

Results are often displayed using an ANOVA table:

                 Sum of Squares   df   Mean Square   F      Sig.
Between Groups   .005             2    .002          .211   .811
Within Groups    .466             42   .011
Total            .470             44

The Between Groups row contains the Sum of Squares Between (SSB), the Within Groups row contains the Mean Square Error (MSE), and the last two columns give the F statistic and its p-value.

Post Hoc Tests

NHANES I data, women 40-60 yrs old. Compare cholesterol between periodontal groups.

                 Sum of Squares   df     Mean Square   F     Sig.
Between Groups   33383            3      11128         5.1   .002
Within Groups    4417119          2007   2201
Total            4450502          2010

The ANOVA shows good evidence (p = 0.002) that the means are not all the same. Which means are different? We can directly compare the subgroups using post hoc tests.

Least Significant Difference Test

The simplest post hoc test is called the Least Significant Difference test. The computation is very similar to the equal-variance t test: compute an equal-variance t test, but replace the pooled variance (s²) with the MSE.

                 Sum of Squares   df     Mean Square   F     Sig.
Between Groups   33383            3      11128         5.1   .002
Within Groups    4417119          2007   2201
Total            4450502          2010

Group           N     Mean    Std. Deviation
Healthy         802   221.5   46.2
Gingivitis      490   223.5   45.3
Periodontitis   347   227.3   48.9
Edentulous      372   232.4   48.8


Least Significant Difference Test: Examples

Using the ANOVA table (MSE = 2201) and the group statistics above:

Compare Healthy group to Periodontitis group:

$$T = \frac{221.5 - 227.3}{\sqrt{2201\left(\frac{1}{802} + \frac{1}{347}\right)}} = -1.92, \qquad p = 2\,P(t_{1147} > 1.92) = 0.055$$

Compare Gingivitis group to Periodontitis group:

$$T = \frac{223.5 - 227.3}{\sqrt{2201\left(\frac{1}{490} + \frac{1}{347}\right)}} = -1.15, \qquad p = 2\,P(t_{835} > 1.15) = 0.25$$
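A small sketch (ours, not from the handout) of the Healthy vs. Periodontitis LSD comparison, using only the summary statistics and the MSE from the ANOVA table:

```python
# Sketch: Least Significant Difference comparison of two groups -- an
# equal-variance t-test with the pooled variance replaced by the ANOVA MSE.
import math
from scipy import stats

mse = 2201.0                    # Mean Square Error from the ANOVA table
m1, n1 = 221.5, 802             # Healthy
m2, n2 = 227.3, 347             # Periodontitis

se = math.sqrt(mse * (1 / n1 + 1 / n2))
t = (m1 - m2) / se                            # about -1.92
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)    # about 0.055, as in the handout
print(t, p)
```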

Post Hoc Tests: Multiple Comparisons

Post-hoc testing usually involves multiple comparisons. For example, if the data contain 4 groups (Healthy, Gingivitis, Periodontitis, Edentulous), then 6 different pairwise comparisons can be made.

Each time a hypothesis test is performed at significance level α, there is probability α of rejecting in error. Performing multiple tests increases the chances of rejecting in error at least once.

For example, suppose you did 6 independent hypothesis tests at the α = 0.05 level and, in truth, H0 were true for all six. The probability that at least one test rejects H0 is 26%:

P(at least one rejection) = 1 − P(no rejections) = 1 − 0.95⁶ = 0.26.

Bonferroni Correction for Multiple Comparisons

The Bonferroni correction is a simple way to adjust for the multiple comparisons:

- Perform each test at significance level α.
- Multiply each p-value by the number of tests performed.

The overall significance level (the chance of any of the tests rejecting in error) will be less than α.
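A minimal sketch (ours) of the Bonferroni adjustment applied to the six LSD p-values from the cholesterol table below; adjusted p-values are capped at 1:

```python
# Sketch: Bonferroni adjustment -- multiply each p-value by the number of tests
# performed, capping the result at 1.
lsd_p = [0.46, 0.055, 0.00021, 0.25, 0.0056, 0.147]   # six LSD p-values from the table
m = len(lsd_p)
bonferroni_p = [min(p * m, 1.0) for p in lsd_p]
print(bonferroni_p)   # matches the Bonferroni p-value column below (up to rounding)
```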

Example: Cholesterol Data post-hoc comparisons

Group 1         Group 2         Mean Difference       Least Significant      Bonferroni
                                (Group 1 - Group 2)   Difference p-value     p-value
Healthy         Gingivitis      -2.0                  .46                    1.0
Healthy         Periodontitis   -5.8                  .055                   .330
Healthy         Edentulous      -10.9                 .00021                 .00126
Gingivitis      Periodontitis   -3.9                  .25                    1.0
Gingivitis      Edentulous      -8.9                  .0056                  .0336
Periodontitis   Edentulous      -5.1                  .147                   .88

Conclusion: The Edentulous group is significantly different from the Healthy group and from the Gingivitis group (p < 0.05), after adjustment for multiple comparisons.