-
8/15/2019 PY1PR1 Stats Lecture 4 Handout
PY1PR1 lecture 4: Comparing two sample means
Dr David Field
-
Comparing two samples
• Researchers often begin with a hypothesis that two sample means will be different from each other
• In practice, two sample means will almost always be slightly different from each other
• Therefore, statistics are used to decide whether the observed difference between two samples is meaningful or not
• To do this, we test the null hypothesis that the two samples were both drawn randomly from the same population
-
Test statistics
• To test the null hypothesis we need to quantify the strength of the evidence against it
• This is done using test statistics
  - when the test statistic is larger, there is more evidence against the null hypothesis
• What makes test statistics different from other statistics is that they have known probability distributions when the null hypothesis is true
  - we know the p of a test statistic of 1 or >1 occurring purely due to sampling variation from a null distribution
  - the p of a test statistic of 2 or >2 will be lower than the p of a test statistic of >1
  - if the p of the test statistic occurring purely due to sampling variation is < 0.05 (5%) the null hypothesis is rejected
• Test statistics with known probability distributions under the null hypothesis include Z, t, r, and chi-square
  - Mean, Median, SD are not test statistics
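The idea that larger test statistics have smaller p values can be sketched in a few lines. This is only an illustration, assuming the test statistic follows the standard normal distribution when the null hypothesis is true (as Z does) and that we look at the upper tail only; the names `upper_tail_p` and `ALPHA` are made up for this example.

```python
from statistics import NormalDist

# Assumption: under the null hypothesis the test statistic follows the
# standard normal distribution (mean 0, SD 1).
null_distribution = NormalDist(mu=0.0, sigma=1.0)
ALPHA = 0.05  # the 5% cut off


def upper_tail_p(statistic):
    """Probability of a value this large or larger under the null distribution."""
    return 1.0 - null_distribution.cdf(statistic)


p_one = upper_tail_p(1.0)  # p of a test statistic of 1 or more
p_two = upper_tail_p(2.0)  # p of a test statistic of 2 or more

reject_at_one = p_one < ALPHA
reject_at_two = p_two < ALPHA

print(f"p(statistic >= 1) = {p_one:.4f}, reject null: {reject_at_one}")
print(f"p(statistic >= 2) = {p_two:.4f}, reject null: {reject_at_two}")
```

The second p value is smaller than the first, and only the second falls below the 5% cut off.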
-
Confidence intervals as a test
• Lecture 2 explained how to calculate a 95% confidence interval around a single sample mean
  - this was achieved using the SE of an inferred sampling distribution of the mean
  - collecting two samples and calculating two separate confidence intervals establishes that the two samples are from different populations if the confidence intervals do not overlap
  - but it does not allow a conclusion to be reached when the confidence intervals do overlap
• To calculate a test statistic to directly test the null hypothesis we need to consider a slightly different sampling distribution
  - the sampling distribution of the difference between two means
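The overlap check can be sketched as follows. The means and SEs are taken from one British and one Greek small sample in this handout's tables (4.1 with SE 0.23, and 3.7 with SE 0.13). The multiplier 1.96 is an assumption for a 95% interval; Lecture 2 may have used a slightly different value. The function names are made up for this example.

```python
def confidence_interval(mean, se, multiplier=1.96):
    """Interval of mean +/- multiplier * SE (1.96 is an assumed 95% multiplier)."""
    half_width = multiplier * se
    return (mean - half_width, mean + half_width)


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


british = confidence_interval(4.1, 0.23)
greek = confidence_interval(3.7, 0.13)
overlap = intervals_overlap(british, greek)

print(f"British 95% CI: {british[0]:.2f} to {british[1]:.2f}")
print(f"Greek 95% CI:   {greek[0]:.2f} to {greek[1]:.2f}")
print("Overlap, so no conclusion can be reached" if overlap
      else "No overlap, so the samples are from different populations")
```

Here the two intervals overlap, which is exactly the situation where this method does not allow a conclusion.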
-
Sampling distribution of the difference between two means
• Normally, you are only able to measure 2 samples and calculate 2 means and the difference between them
• But test statistics are based on properties of an assumed underlying sampling distribution of the difference between two means
• The best way to understand test statistics is to consider unusual or artificial examples where full population data and sampling distributions are available
• Therefore
-
                            Weights of British cats (Kg)    Weights of Greek cats (Kg)
                            Mean    SD     SE               Mean    SD     SE
Small sample size (N = 5)   4.1     0.5    0.23             3.9     0.1    0.06
                            4.9     0.6    0.29             4.4     0.5    0.21
                            5.2     1.1    0.49             3.7     0.3    0.13
Large sample size (N = 12)  4.1     0.8    0.23             4.1     0.2    0.07
                            4.6     0.6    0.18             3.9     0.4    0.10
                            4.7     0.5    0.15             3.8     0.3    0.10
-
Sampling distribution of the difference between two means
• Take a large number of samples of 5 cats from the UK population
  - Arrange the samples in pairs and for each pair calculate the difference between the two means
  - Half the differences will be negative and half of them will be positive
  - Therefore the mean of this sampling distribution will be zero. This differs from the sampling distribution of a single sample mean, which has a mean equal to the underlying population mean
  - The sampling distribution of the difference between two means will be normally distributed
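The procedure described above can be tried as a small simulation. This is a sketch under stated assumptions: the population of cat weights is taken to be normally distributed, the population mean of 4.5 Kg is a made-up value, and the population SD of 0.8 Kg is the value used in this handout's later worked example.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)   # fixed seed so the run is repeatable

POP_MEAN = 4.5           # hypothetical population mean (Kg)
POP_SD = 0.8             # population SD used in the handout's cat example (Kg)
N = 5                    # cats per sample
PAIRS = 20000            # number of sample pairs


def sample_mean(n):
    """Mean weight of one random sample of n cats from the population."""
    return sum(rng.gauss(POP_MEAN, POP_SD) for _ in range(n)) / n


# For each pair of samples, record the difference between the two means.
differences = [sample_mean(N) - sample_mean(N) for _ in range(PAIRS)]

mean_diff = mean(differences)
sd_diff = stdev(differences)
share_negative = sum(d < 0 for d in differences) / PAIRS

print(f"mean of differences: {mean_diff:.3f}")
print(f"SD of differences:   {sd_diff:.3f}")
print(f"share negative:      {share_negative:.3f}")
```

The mean of the differences should come out close to zero, with roughly half of the differences negative, whatever population mean is chosen.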
-
• ORIGINAL DISTRIBUTION is the population frequency distribution of weight differences between pairs of individual cats
• Black solid curves are sampling distributions of weight differences between 2 sample means, for samples of 4, 16, and 64 cats
-
Standard error of the difference between two sample means
• σ (sigma) means the SD of the population of difference scores
• N1 and N2 are the two sample sizes
  - the formula allows the SE of the sampling distribution to be calculated when the two samples differ in size
• Like the SE of a single sample mean, this SE gets smaller as N increases and gets smaller as the SD gets smaller
• Smaller SE makes it easier to reject the null hypothesis
$$ SE = \sigma \sqrt{\frac{1}{N_1} + \frac{1}{N_2}} $$
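The formula can be written as a short function to see how the SE reacts to sample size and SD. This is a minimal sketch; the function name `se_difference` is made up, and σ = 0.8 Kg is the value used in this handout's cat example.

```python
import math


def se_difference(sigma, n1, n2):
    """SE of the difference between two sample means, given the population SD."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)


se_5_5 = se_difference(0.8, 5, 5)          # two samples of 5
se_12_12 = se_difference(0.8, 12, 12)      # larger samples give a smaller SE
se_5_12 = se_difference(0.8, 5, 12)        # the two sample sizes may differ
se_smaller_sd = se_difference(0.4, 5, 5)   # smaller SD gives a smaller SE

print(f"N1 = 5,  N2 = 5:  SE = {se_5_5:.3f}")
print(f"N1 = 12, N2 = 12: SE = {se_12_12:.3f}")
print(f"N1 = 5,  N2 = 12: SE = {se_5_12:.3f}")
print(f"sigma = 0.4, N1 = N2 = 5: SE = {se_smaller_sd:.3f}")
```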
-
SE of the difference between mean Kg for two samples of 5 UK cats
• 1/5 (or 1/2, or 1/3, or 1/20) is a number less than 1
• The square root makes the number larger, but never makes it greater than 1
• So, the population SD gets multiplied by a number smaller than 1, which is why the SE is always smaller than the SD of the population
$$ 0.506\ \text{Kg} = 0.8 \sqrt{\frac{1}{5} + \frac{1}{5}} $$
-
                            Weights of British cats (Kg)
                            Mean    SD     SE
Small sample size (N = 5)  *4.1     0.5    0.23
                            4.9     0.6    0.29
                            5.2     1.1    0.49
                           *3.6     0.7    0.30
(* = highlighted pair of samples)
• For the highlighted pair of samples the difference between the means is 0.5Kg
• What percentage of sample pairs have a difference of 0.5Kg or larger?
• If we expressed the difference of 0.5Kg in units of SE we could answer that question
• This is because the converted score is a Z score
Remember that in this theoretical example we know that both samples are from the same population, and the purpose is to calculate the p of a difference this big or bigger occurring when that is the case
-
Converting the difference between 2 sample means to a Z score
$$ Z = \frac{0.5}{0.8 \sqrt{\frac{1}{5} + \frac{1}{5}}} = 0.99 $$
• Numerator: the difference between the means
• Denominator: the SE formula
-
16.1% of the total area under the normal curve corresponds to values of 0.99 or greater
16.1% of differences between means of sample size 5 will have Z scores greater than 0.99
-
From Z back to Kg
• So, 16.1% of differences between pairs of samples of 5 drawn from the population of UK cats will be 0.5Kg or larger
• This is the same as saying the probability of a single comparison producing a difference of 0.5Kg or greater is 16.1%
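The whole chain from Kg to Z to a percentage, and back to Kg, can be checked numerically. This is a sketch using the handout's numbers (difference 0.5 Kg, σ = 0.8 Kg, two samples of 5) and assuming the standard normal distribution for Z. Small differences from the quoted 16.1% come from rounding Z to 0.99 before looking it up.

```python
import math
from statistics import NormalDist

sigma = 0.8        # population SD (Kg)
n1 = n2 = 5        # sample sizes
difference = 0.5   # observed difference between the two sample means (Kg)

se = sigma * math.sqrt(1 / n1 + 1 / n2)   # SE of the difference
z = difference / se                       # the difference in units of SE
p = 1 - NormalDist().cdf(z)               # area under the normal curve above z

back_to_kg = z * se                       # converting the Z score back to Kg

print(f"SE = {se:.3f} Kg, Z = {z:.2f}")
print(f"{100 * p:.1f}% of sample pairs differ by {back_to_kg:.1f} Kg or more")
```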
-
What if the population SD (σ) is unknown?
• Usually, researchers only have two samples to compare, and the population parameters are unknown.
• In this situation the sample SD is used instead of the population SD, and the SE formula is modified
$$ SE = \sqrt{\frac{SD_1^2}{N_1} + \frac{SD_2^2}{N_2}} $$
-
                            Weights of British cats (Kg)
                            Mean    SD     SE
Small sample size (N = 5)  *4.1     0.5    0.23
                            4.9     0.6    0.29
                            5.2     1.1    0.49
                           *3.6     0.7    0.30
(* = highlighted pair of samples)
• For the highlighted pair of samples the mean difference is 0.5Kg
• The sample SDs will be used in the modified formula instead of the unknown population SD
-
Converting the difference between 2 means to a Z score when σ is unknown
$$ Z = \frac{0.5}{\sqrt{\frac{0.5^2}{5} + \frac{0.7^2}{5}}} = \frac{0.5}{0.38} = 1.29 $$
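The modified formula can be sketched as a function that takes the two sample SDs. The function name `se_from_samples` is made up for this example; the inputs are the highlighted pair of British samples (SDs 0.5 and 0.7, N = 5 each, mean difference 0.5 Kg). Computed without intermediate rounding, the SE is about 0.385 and Z is about 1.30, which agrees with the values on the slide to within rounding.

```python
import math


def se_from_samples(sd1, n1, sd2, n2):
    """SE of the difference between two means, using the sample SDs."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


mean_difference = 0.5                   # Kg, from the highlighted pair
se = se_from_samples(0.5, 5, 0.7, 5)    # SDs and sizes of the two samples
z = mean_difference / se

print(f"SE = {se:.3f} Kg")
print(f"Z  = {z:.2f}")
```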
-
How much evidence is there against the null hypothesis?
• 9.8% of Z statistics are > 1.29, so we would not conclude that the two samples of cats are from different countries if we used the 5% cut off
• In this example, we know that the two samples were from the same population, so we can verify that this was the correct conclusion
• On the other hand, if two samples had a mean difference of 0.8Kg, then assuming the sample SDs remain the same, the resulting Z statistic would be 2.07
• Only 1.9% of Z statistics are greater than 2.07, and if we didn't know that the two samples came from the same population we would reject the null hypothesis, and by doing so commit a Type I error
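The two decisions above can be reproduced with a small helper. This is a sketch, not a full Z test routine: `z_test` is a made-up name, it looks only at the upper tail (as the percentages in this handout do), and it assumes Z follows the standard normal distribution. Percentages may differ slightly from the slide in the last digit because the slide rounds Z first.

```python
import math
from statistics import NormalDist


def z_test(mean_difference, sd1, n1, sd2, n2, alpha=0.05):
    """Return (Z, upper-tail p, reject null?) for a difference between two means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = mean_difference / se
    p = 1 - NormalDist().cdf(z)
    return z, p, p < alpha


# Observed difference of 0.5 Kg between the two British samples.
z_small, p_small, reject_small = z_test(0.5, 0.5, 5, 0.7, 5)
# Imagined difference of 0.8 Kg with the same sample SDs.
z_big, p_big, reject_big = z_test(0.8, 0.5, 5, 0.7, 5)

print(f"0.5 Kg: Z = {z_small:.2f}, p = {100 * p_small:.1f}%, reject: {reject_small}")
print(f"0.8 Kg: Z = {z_big:.2f}, p = {100 * p_big:.1f}%, reject: {reject_big}")
# Both samples came from the same population, so rejecting the null
# hypothesis in the second case would be a Type I error.
```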
-
                            Weights of British cats (Kg)    Weights of Greek cats (Kg)
                            Mean    SD     SE               Mean    SD     SE
Small sample size (N = 5)  *4.1     0.5    0.23             3.9     0.1    0.06
                            4.9     0.6    0.29             4.4     0.5    0.21
                            5.2     1.1    0.49            *3.7     0.3    0.13
Large sample size (N = 12)  4.1     0.8    0.23            *4.1     0.2    0.07
                           *4.6     0.6    0.18             3.9     0.4    0.10
                            4.7     0.5    0.15             3.8     0.3    0.10
(* = highlighted samples)
-
The Z score of the difference between samples of 5 UK and 5 Greek cats
$$ Z = \frac{4.1 - 3.7}{\sqrt{\frac{0.5^2}{5} + \frac{0.3^2}{5}}} = \frac{0.4}{0.26} = 1.53 $$
-
How much evidence is there against the null hypothesis?
• 6.3% of Z statistics are > 1.53, so we would be unable to conclude that the two samples of cats are from different countries if we used the 5% cut off
• In this example we know that the two samples were from different populations, so we have committed a Type II error by failing to reject the null hypothesis
• Type II errors like this are common when the sample size is small
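The link between small samples and Type II errors can be illustrated with a simulation. Everything about the two populations here is made up for illustration (normal distributions with means 4.5 and 4.0 Kg and a shared SD of 0.6 Kg); the handout does not give the true population parameters. The cut off Z > 1.645 is the upper-tail 5% point of the standard normal distribution.

```python
import math
import random

rng = random.Random(7)   # fixed seed so the run is repeatable

MEAN_A = 4.5    # hypothetical population mean, country A (Kg)
MEAN_B = 4.0    # hypothetical population mean, country B (Kg)
SD = 0.6        # hypothetical SD shared by both populations (Kg)
Z_CRIT = 1.645  # upper-tail 5% cut off for Z


def mean_and_sd(values):
    """Sample mean and sample SD (dividing by n - 1)."""
    m = sum(values) / len(values)
    variance = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(variance)


def type_two_rate(n, trials=5000):
    """Share of experiments that fail to reject a null hypothesis that is false."""
    misses = 0
    for _ in range(trials):
        m1, s1 = mean_and_sd([rng.gauss(MEAN_A, SD) for _ in range(n)])
        m2, s2 = mean_and_sd([rng.gauss(MEAN_B, SD) for _ in range(n)])
        z = (m1 - m2) / math.sqrt(s1 ** 2 / n + s2 ** 2 / n)
        if z <= Z_CRIT:
            misses += 1
    return misses / trials


rate_n5 = type_two_rate(5)
rate_n12 = type_two_rate(12)
print(f"Type II error rate with N = 5:  {rate_n5:.2f}")
print(f"Type II error rate with N = 12: {rate_n12:.2f}")
```

With these made-up populations the Type II error rate is clearly higher for samples of 5 than for samples of 12; the exact rates depend on the assumed means and SD.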
-
The Z score of the difference between samples of 12 UK and 12 Greek cats
$$ Z = \frac{4.6 - 4.1}{\sqrt{\frac{0.6^2}{12} + \frac{0.2^2}{12}}} = \frac{0.5}{0.18} = 2.73 $$
-
How much evidence is there against the null hypothesis?
• 0.32% of Z statistics are > 2.73, so we would conclude that the two samples of cats are from different countries if we used the 5% cut off
• In this example we know that the two samples were from different populations, so we have correctly rejected the null hypothesis
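The small-sample and large-sample comparisons of UK and Greek cats can be put side by side. This is a sketch using the highlighted means and SDs from the table, with the upper-tail p taken from the standard normal distribution; the dictionary layout and labels are just for this example. Results can differ from the slides in the last digit because the slides round intermediate values.

```python
import math
from statistics import NormalDist

# (UK mean, UK SD, Greek mean, Greek SD, N per sample) from the highlighted samples
comparisons = {
    "N = 5": (4.1, 0.5, 3.7, 0.3, 5),
    "N = 12": (4.6, 0.6, 4.1, 0.2, 12),
}

results = {}
for label, (m1, sd1, m2, sd2, n) in comparisons.items():
    se = math.sqrt(sd1 ** 2 / n + sd2 ** 2 / n)   # SE from the sample SDs
    z = (m1 - m2) / se
    p = 1 - NormalDist().cdf(z)                   # upper-tail probability
    results[label] = (z, p)
    decision = "reject" if p < 0.05 else "do not reject"
    print(f"{label}: SE = {se:.2f}, Z = {z:.2f}, p = {100 * p:.2f}%, {decision} null")
```

The larger samples have a smaller SE, so a difference of similar size gives a much larger Z.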
-
Important caveat
• What I have described today is called a "Z test"
• But, the formula for estimating the SE of the difference between 2 means used in the Z test is only accurate when the individual sample sizes are 30 or more
  - This is because the estimate of the population SD is not accurate when it is based on a small sample
• There is a different test that uses an accurate estimate of the SE when sample size is less than 30
  - the "t test", which is covered in the next lecture
• Because the t test produces the same results as the Z test when the sample size is >30, computer programs like SPSS generally only give the option of a t test
• Both tests work on the same principle, but the Z test is less complicated and easier to understand
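One way to see why the caveat matters is to simulate many experiments in which the null hypothesis is true and count how often the Z test rejects it. This is a sketch under assumptions: both samples are drawn from the same standard normal population, the upper-tail 5% cut off of Z > 1.645 is used, and the function name is made up. If the test were accurate, about 5% of experiments would reject.

```python
import math
import random

rng = random.Random(11)  # fixed seed so the run is repeatable
Z_CRIT = 1.645           # upper-tail 5% cut off for Z


def sample_mean_and_variance(n):
    """Mean and sample variance of n draws from a standard normal population."""
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(values) / n
    variance = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, variance


def false_positive_rate(n, trials=20000):
    """Share of same-population experiments where Z exceeds the 5% cut off."""
    rejections = 0
    for _ in range(trials):
        m1, v1 = sample_mean_and_variance(n)
        m2, v2 = sample_mean_and_variance(n)
        z = (m1 - m2) / math.sqrt(v1 / n + v2 / n)
        if z > Z_CRIT:
            rejections += 1
    return rejections / trials


rate_n5 = false_positive_rate(5)
rate_n30 = false_positive_rate(30)
print(f"Type I error rate with N = 5:  {100 * rate_n5:.1f}%")
print(f"Type I error rate with N = 30: {100 * rate_n30:.1f}%")
```

With samples of 5 the rejection rate comes out noticeably above 5%, while with samples of 30 it is close to 5%.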
-
General principle of test statistics
$$ \text{test statistic} = \frac{\text{variation in the DV due to the IV}}{\text{other variation in the data (error)}} $$
• All test statistics have known probability distributions when variation in the DV due to the IV is zero (i.e. the null hyp is true)
• Z has the distribution of the standard normal distribution
• Other test statistics have different shaped distributions, and different calculation formulas, but the general principle for converting the test statistic to a p value is the same.
-
List of statistical terms for revision
• This lecture made use of terms introduced in previous lectures, and only introduced one new term
  - sampling distribution of the difference between two means