py1pr1 stats lecture 4 handout

Upload: christian-miranda

Post on 05-Jul-2018


  • 8/15/2019 PY1PR1 Stats Lecture 4 Handout


    PY1PR1 lecture 4: Comparing two sample means

    Dr David Field


    Comparing two samples

    • Researchers often begin with a hypothesis that two sample means will be different from each other

    • In practice, two sample means will almost always be slightly different from each other

    • Therefore, statistics are used to decide whether the observed difference between two samples is meaningful or not

    • To do this, we test the null hypothesis that the two samples were both drawn randomly from the same population


    Test statistics

    • To test the null hypothesis we need to quantify the strength of the evidence against it

    • This is done using test statistics
      - when the test statistic is larger, there is more evidence against the null hypothesis

    • What makes test statistics different from other statistics is that they have known probability distributions when the null hypothesis is true
      - we know the p of a test statistic of 1 or > 1 occurring purely due to sampling variation from a null distribution
      - the p of a test statistic of 2 or > 2 will be lower than the p of a test statistic of > 1
      - if the p of the test statistic occurring purely due to sampling variation is < 0.05 (5%) the null hypothesis is rejected

    • Test statistics with known probability distributions under the null hypothesis include z, t, r, and chi-square
      - Mean, Median, SD are not test statistics
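The link between the size of a test statistic and its p can be sketched in a few lines of Python. This is only an illustration, assuming the test statistic follows the standard normal distribution under the null hypothesis and that we look at the upper tail only. The function name `upper_tail_p` and the constant `ALPHA` are made up for this sketch.

```python
import math

def upper_tail_p(z):
    """Probability of a standard normal value of z or greater."""
    return 0.5 * math.erfc(z / math.sqrt(2))

ALPHA = 0.05  # the 5% cut-off from the slide

for z in (1.0, 2.0):
    p = upper_tail_p(z)
    decision = "reject" if p < ALPHA else "do not reject"
    print(f"test statistic {z:.1f}: p = {p:.4f} -> {decision} the null hypothesis")
```

A test statistic of 2 has a smaller p than a test statistic of 1, which is the sense in which a larger test statistic is stronger evidence against the null hypothesis.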


    Confidence intervals as a test

    • Lecture 2 explained how to calculate a 95% confidence interval around a single sample mean
      - this was achieved using the SE of an inferred sampling distribution of the mean
      - collecting two samples and calculating two separate confidence intervals establishes that the two samples are from different populations if the confidence intervals do not overlap
      - but it does not allow a conclusion to be reached when the confidence intervals do overlap

    • To calculate a test statistic to directly test the null hypothesis we need to consider a slightly different sampling distribution
      - the sampling distribution of the difference between two means


    Sampling distribution of the difference between two means

    • Normally, you are only able to measure 2 samples and calculate 2 means and the difference between them

    • But test statistics are based on properties of an assumed underlying sampling distribution of the difference between two means

    • The best way to understand test statistics is to consider unusual or artificial examples where full population data and sampling distributions are available

    • Therefore


                                  Weights of British cats (Kg)    Weights of Greek cats (Kg)
                                  Mean   SD    SE                 Mean   SD    SE
    Small sample size (N = 5)     4.1    0.5   0.23               3.9    0.1   0.06
                                  4.9    0.6   0.29               4.4    0.5   0.21
                                  5.2    1.1   0.49               3.7    0.3   0.13
    Large sample size (N = 12)    4.1    0.8   0.23               4.1    0.2   0.07
                                  4.6    0.6   0.18               3.9    0.4   0.10
                                  4.7    0.5   0.15               3.8    0.3   0.10


    Sampling distribution of the difference between two means

    • Take a large number of samples of 5 cats from the UK population
      - Arrange the samples in pairs and for each pair calculate the difference between the two means
      - Half the differences will be negative and half of them will be positive
      - Therefore the mean of this sampling distribution will be zero. This differs from the sampling distribution of a single sample mean, which has a mean equal to the underlying population mean
      - The sampling distribution of the difference between two means will be normally distributed
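The procedure above can be tried out as a small simulation. This is a rough sketch, not part of the lecture: it assumes cat weights are normally distributed, uses a hypothetical population mean of 4.5 Kg (the lecture does not state one), and borrows the population SD of 0.8 Kg from the lecture's worked example. All names are illustrative.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives repeatable numbers

POP_MEAN = 4.5   # hypothetical population mean in Kg (an assumption)
POP_SD = 0.8     # population SD in Kg, as in the lecture's worked example
N = 5            # cats per sample
N_PAIRS = 20000  # number of sample pairs to draw

def sample_mean(n):
    """Mean weight of one random sample of n cats."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# One difference between two sample means per pair of samples
differences = [sample_mean(N) - sample_mean(N) for _ in range(N_PAIRS)]

mean_diff = statistics.fmean(differences)
sd_diff = statistics.pstdev(differences)
share_negative = sum(d < 0 for d in differences) / N_PAIRS

print(f"mean of differences:  {mean_diff:.3f}")
print(f"SD of differences:    {sd_diff:.3f}")
print(f"share below zero:     {share_negative:.3f}")
```

Under these assumptions the mean of the differences should come out close to zero and roughly half of the differences should be negative. The SD of the simulated differences is the standard error of the difference between two sample means.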


    • ORIGINAL DISTRIBUTION is the population frequency distribution of weight differences between pairs of individual cats

    • Black solid curves are sampling distributions of weight differences between 2 sample means, for samples of 4, 16, and 64 cats


    Standard error of the difference between two sample means

    $$SE = \sigma \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$$

    • σ (sigma) means the SD of the population of difference scores

    • N1 and N2 are the two sample sizes
      - the formula allows the SE of the sampling distribution to be calculated when the two samples differ in size

    • Like the SE of a single sample mean, this SE gets smaller as N increases and gets smaller as the SD gets smaller

    • Smaller SE makes it easier to reject the null hypothesis


    SE of the difference between mean Kg for two samples of 5 UK cats

    $$0.506\ \text{Kg} = 0.8 \sqrt{\frac{1}{5} + \frac{1}{5}}$$

    • 1/5 (or 1/2, or 1/3, or 1/20) is a number less than 1

    • The square root makes the number larger, but never makes it greater than 1

    • So, the population SD gets multiplied by a number smaller than 1, which is why the SE is always smaller than the SD of the population
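The arithmetic on this slide can be checked with a short Python sketch. The function name `se_difference` is made up for illustration. The first call reproduces the slide's example, and the other two calls only show how the SE responds to larger or unequal sample sizes.

```python
import math

def se_difference(sigma, n1, n2):
    """SE of the difference between two sample means, population SD known."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)

# The slide's example: population SD 0.8 Kg, two samples of 5 cats
print(round(se_difference(0.8, 5, 5), 3))  # 0.506

# Larger samples give a smaller SE; unequal sizes are allowed
print(round(se_difference(0.8, 12, 12), 3))
print(round(se_difference(0.8, 5, 20), 3))
```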


                                  Weights of British cats (Kg)
                                  Mean   SD    SE
    Small sample size (N = 5)     4.1    0.5   0.23
                                  4.9    0.6   0.29
                                  5.2    1.1   0.49
                                  3.6    0.7   0.30

    • For the highlighted pair of samples (means 4.1 and 3.6) the difference between the means is 0.5 Kg

    • What percentage of sample pairs have a difference of 0.5 Kg or larger?

    • If we expressed the difference of 0.5 Kg in units of SE we could answer that question

    • This is because the converted score is a Z score

    Remember that in this theoretical example we know that both samples are from the same population, and the purpose is to calculate the p of a difference this big or bigger occurring when that is the case


    Converting the difference between 2 sample means to a Z score

    $$Z = \frac{0.5}{0.8 \sqrt{\frac{1}{5} + \frac{1}{5}}} = 0.99$$

    Numerator: the difference between the means. Denominator: the SE formula.


    16.1% of the total area under the normal curve corresponds to values of 0.99 or greater

    16.1% of differences between means of sample size 5 will have Z scores greater than 0.99


    From Z back to Kg

    • So, 16.1% of differences between pairs of samples of 5 drawn from the population of UK cats will be 0.5 Kg or larger

    • This is the same as saying the probability of a single comparison producing a difference of 0.5 Kg or greater is 16.1%
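The chain from a difference in Kg to a Z score to a probability can be followed step by step in Python. This is a sketch under the lecture's own numbers (population SD 0.8 Kg, two samples of 5, observed difference 0.5 Kg). Rounding Z to two decimals before finding the tail area mimics a printed normal table, which is an assumption about how the 16.1% figure was obtained. Variable names are illustrative.

```python
import math

SIGMA = 0.8          # population SD in Kg
N1 = N2 = 5          # sample sizes
OBSERVED_DIFF = 0.5  # difference between the two sample means in Kg

# Step 1: SE of the difference between two sample means
se = SIGMA * math.sqrt(1 / N1 + 1 / N2)

# Step 2: express the observed difference in units of SE (a Z score)
z = round(OBSERVED_DIFF / se, 2)

# Step 3: area under the standard normal curve at or above that Z score
p = 0.5 * math.erfc(z / math.sqrt(2))

print(f"SE = {se:.3f} Kg, Z = {z:.2f}, p = {p:.3f}")
```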


    What if the population SD (σ) is unknown?

    • Usually, researchers only have two samples to compare, and the population parameters are unknown.

    • In this situation the sample SD is used instead of the population SD, and the SE formula is modified

    $$SE = \sqrt{\frac{SD_1^2}{N_1} + \frac{SD_2^2}{N_2}}$$


                                  Weights of British cats (Kg)
                                  Mean   SD    SE
    Small sample size (N = 5)     4.1    0.5   0.23
                                  4.9    0.6   0.29
                                  5.2    1.1   0.49
                                  3.6    0.7   0.30

    • For the highlighted pair of samples the mean difference is 0.5 Kg

    • The sample SDs will be used in the modified formula instead of the unknown population SD


    Converting the difference between 2 means to a Z score when σ is unknown

    $$Z = \frac{0.5}{\sqrt{\frac{0.5^2}{5} + \frac{0.7^2}{5}}} = \frac{0.5}{0.38} = 1.29$$


    How much evidence is there against the null hypothesis?

    • 9.8% of Z statistics are > 1.29, so we would not conclude that the two samples of cats are from different countries if we used the 5% cut off

    • In this example, we know that the two samples were from the same population, so we can verify that this was the correct conclusion

    • On the other hand, if two samples had a mean difference of 0.8 Kg, then assuming the sample SDs remain the same, the resulting Z statistic would be 2.07

    • Only 1.9% of Z statistics are greater than 2.07, and if we didn't know that the two samples came from the same population we would reject the null hypothesis, and by doing so commit a Type I error
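Both cases discussed above (a mean difference of 0.5 Kg and of 0.8 Kg, with sample SDs of 0.5 and 0.7 and 5 cats per sample) can be run through the modified SE formula in Python. This is a sketch with made-up function names. Because it carries full precision through every step, its Z and p values can differ from the slide's figures in the last digit, since the slide rounds intermediate results.

```python
import math

def z_from_samples(mean_diff, sd1, n1, sd2, n2):
    """Z statistic using sample SDs in place of the unknown population SD."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return mean_diff / se

def upper_tail_p(z):
    """Probability of a standard normal value of z or greater."""
    return 0.5 * math.erfc(z / math.sqrt(2))

ALPHA = 0.05  # the 5% cut-off

results = {}
for diff in (0.5, 0.8):
    z = z_from_samples(diff, sd1=0.5, n1=5, sd2=0.7, n2=5)
    p = upper_tail_p(z)
    results[diff] = (z, p)
    decision = "reject" if p < ALPHA else "do not reject"
    print(f"difference {diff} Kg: Z = {z:.2f}, p = {p:.3f} -> {decision}")
```

The 0.8 Kg case leads to rejecting the null hypothesis even though both samples came from the same population, which is the Type I error described above.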


                                  Weights of British cats (Kg)    Weights of Greek cats (Kg)
                                  Mean   SD    SE                 Mean   SD    SE
    Small sample size (N = 5)     4.1    0.5   0.23 *             3.9    0.1   0.06
                                  4.9    0.6   0.29               4.4    0.5   0.21
                                  5.2    1.1   0.49               3.7    0.3   0.13 *
    Large sample size (N = 12)    4.1    0.8   0.23               4.1    0.2   0.07 *
                                  4.6    0.6   0.18 *             3.9    0.4   0.10
                                  4.7    0.5   0.15               3.8    0.3   0.10

    * highlighted on the slide


    The Z score of the difference between samples of 5 UK and 5 Greek cats

    $$Z = \frac{4.1 - 3.7}{\sqrt{\frac{0.5^2}{5} + \frac{0.3^2}{5}}} = \frac{0.4}{0.26} = 1.53$$


    How much evidence is there against the null hypothesis?

    • 6.3% of Z statistics are > 1.53, so we would be unable to conclude that the two samples of cats are from different countries if we used the 5% cut off

    • In this example we know that the two samples were from different populations, so we have committed a Type II error by failing to reject the null hypothesis

    • Type II errors like this are common when the sample size is small


    The Z score of the difference between samples of 12 UK and 12 Greek cats

    $$Z = \frac{4.6 - 4.1}{\sqrt{\frac{0.6^2}{12} + \frac{0.2^2}{12}}} = \frac{0.5}{0.18} = 2.73$$


    How much evidence is there against the null hypothesis?

    • 0.32% of Z statistics are > 2.73, so we would conclude that the two samples of cats are from different countries if we used the 5% cut off

    • In this example we know that the two samples were from different populations, so we have correctly rejected the null hypothesis
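The contrast between the small-sample and large-sample UK versus Greek comparisons can be reproduced with a short Python sketch. The means, SDs and sample sizes are the ones used in the lecture's two Z calculations. The tuple layout and names are illustrative. Full-precision arithmetic gives Z close to, but not always identical with, the slide's two-decimal values.

```python
import math

def z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (Z, upper-tail p) for the difference between two sample means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

ALPHA = 0.05  # the 5% cut-off

# (label, UK mean, UK SD, UK N, Greek mean, Greek SD, Greek N)
comparisons = [
    ("samples of 5", 4.1, 0.5, 5, 3.7, 0.3, 5),
    ("samples of 12", 4.6, 0.6, 12, 4.1, 0.2, 12),
]

outcomes = {}
for label, m1, s1, n1, m2, s2, n2 in comparisons:
    z, p = z_test(m1, s1, n1, m2, s2, n2)
    outcomes[label] = (z, p)
    decision = "reject" if p < ALPHA else "fail to reject"
    print(f"{label}: Z = {z:.2f}, p = {100 * p:.2f}% -> {decision} the null hypothesis")
```

With 5 cats per sample the test fails to reject a null hypothesis that is known to be false (a Type II error), while with 12 cats per sample it rejects it.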


    Important caveat

    • What I have described today is called a "Z test"

    • But, the formula for estimating the SE of the difference between 2 means used in the Z test is only accurate when the individual sample sizes are 30 or more
      - This is because the estimate of the population SD is not accurate for smaller samples

    • There is a different test that uses an accurate estimate of the SE when sample size is less than 30
      - the "t test", which is covered in the next lecture

    • Because the t test produces the same results as the Z test when the sample size is > 30, computer programs like SPSS generally only give the option of a t test

    • Both tests work on the same principle, but the Z test is less complicated and easier to understand


    General principle of test statistics

    $$\text{test statistic} = \frac{\text{variation in the DV due to the IV}}{\text{other variation in the data (error)}}$$

    • All test statistics have known probability distributions when variation in the DV due to the IV is zero (i.e. the null hypothesis is true)

    • Z has the distribution of the standard normal distribution

    • Other test statistics have different shaped distributions, and different calculation formulas, but the general principle for converting the test statistic to a p value is the same.


    List of statistical terms for revision

    • This lecture made use of terms introduced in previous lectures, and only introduced one new term
      - sampling distribution of the difference between two means