Statistical Inference

University of Toronto (butler/c32/notes/inference.pdf)
R packages used in this chapter
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
Statistical Inference and Science
- Previously: descriptive statistics. “Here are data; what do they say?”
- May need to take some action based on information in data.
- Or want to generalize beyond data (sample) to larger world (population).
- Science: first guess about how world works.
- Then collect data, by sampling.
- Is guess correct (based on data) for whole world, or not?
Sample data are imperfect

- Sample data never entirely represent what you’re observing.
- There is always random error present.
- Thus you can never be entirely certain about your conclusions.
- The Toronto Blue Jays’ average home attendance in part of the 2015 season was 25,070 (up to May 27 2015, from baseball-reference.com).
- Does that mean the attendance at every game was exactly 25,070? Certainly not. Actual attendance depends on many things, e.g.:
  - how well the Jays are playing
  - the opposition
  - day of week
  - weather
  - random chance
Summarizing the attendances
jays=read.csv("jays15-home.csv",header=T)
summary(jays$attendance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14180 16400 21200 25070 33090 48410
Attendances varied a lot, from a low of 14180 to a high of 48410 (opening day).
Boxplot over. I used “base graphics” (not ggplot) for this one.
Attendance boxplot

boxplot(jays$attendance)

[Boxplot of attendance; the vertical axis runs from about 15,000 to 45,000.]
Comments
- Distribution somewhat skewed to right (but no outliers).
- Attendances have substantial variability.
- These are a sample of “all possible games” (or maybe “all possible games played in April and May”). What can we say about mean attendance in all possible games, based on this evidence?
- Think about:
  - Confidence interval
  - Hypothesis test.
Confidence interval for mean
- Should account for:
  - we only observed 25 out of all possible games (sample size)
  - attendances we did observe were highly variable (sample SD)
  - we observed mean attendance of 25,070 (sample mean).
- Confidence level: how likely we want the interval to be to contain the mean attendance at “all possible games”. Typically 95%.
- Formula, with sample mean x̄, sample SD s and sample size n:

  x̄ ± t∗ s/√n.

  t∗ is a number from the t-table that accounts for confidence level (e.g. 95%) and sample size, through degrees of freedom n − 1.
- For 25 − 1 = 24 df and 95% confidence, t∗ = 2.064.
How do I get t∗ or z∗ for CI?

- z∗ first, for 95% interval:
- Want middle 95%, z-values that enclose that.
- 2.5% less than −z∗, 97.5% less than z∗.
- In R, qnorm gives the value of z with input probability less than it:
qnorm(0.025)
## [1] -1.959964
qnorm(0.975)
## [1] 1.959964
- So, for 95% z-confidence interval, z∗ = 1.96.
t∗ for 95% confidence interval
- “Half the leftovers” again.
- Use qt rather than qnorm.
- Two things in qt: prob. of less, plus degrees of freedom.
- 95% t∗ with 24 df:
qt(0.975,24)
## [1] 2.063899
- or same with 10 df:
qt(0.975,10)
## [1] 2.228139
bigger, because fewer df.
- or t∗ for 90% with 10 df:
qt(0.95,10)
## [1] 1.812461
smaller, as you’d expect.
CI for mean attendance
- First, use R as calculator.
- Sample mean and SD:
att=jays$attendance
mean(att)
## [1] 25070.16
sd(att)
## [1] 11006.69
- Work out “margin of error” (plus/minus part) first:
m=2.064*11007/sqrt(25)
m
## [1] 4543.69
- then get limits of CI:
xbar=25070
xbar-m
## [1] 20526.31
xbar+m
## [1] 29613.69
Comments
- Interval from about 20,000 to about 30,000.
- Not estimating mean attendance well at all!
- Generally want confidence interval to be shorter, which happens if:
  - SD smaller
  - sample size bigger
  - confidence level smaller
- Last one is a cheat, really, since reducing the confidence level increases the chance that the interval won’t contain the pop. mean at all!
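The last three bullets can be checked directly. A small sketch (margin is a helper defined here for illustration, not a base R function), using s = 11007 from our sample:

```r
# Margin of error m = t* s / sqrt(n): see how it responds to n and
# to the confidence level. s = 11007 as in our sample.
s <- 11007
margin <- function(n, conf) qt(1 - (1 - conf) / 2, n - 1) * s / sqrt(n)
margin(25, 0.95)   # what we had: about 4543
margin(100, 0.95)  # bigger sample: margin shrinks
margin(25, 0.90)   # lower confidence: margin also shrinks
```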
Using t.test
- Foregoing illustrated how you would do it by hand.
- In practice, the t.test function does it all:
t.test(att)
##
## One Sample t-test
##
## data: att
## t = 11.389, df = 24, p-value = 3.661e-11
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 20526.82 29613.50
## sample estimates:
## mean of x
## 25070.16
Or, 90% CI
- by including a value for conf.level:
t.test(att,conf.level=0.90)
##
## One Sample t-test
##
## data: att
## t = 11.389, df = 24, p-value = 3.661e-11
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## 21303.93 28836.39
## sample estimates:
## mean of x
## 25070.16
Hypothesis test

- CI answers question “what is the mean?”
- Might have a value µ in mind for the mean, and question “Is the mean equal to µ, or not?”
- For example, 2014 average attendance was 29,327.
- Second question answered by hypothesis test.
- Value being assessed goes in null hypothesis: here, H0: µ = 29,327.
- Alternative hypothesis says how null might be wrong, e.g. Ha: µ ≠ 29,327.
- Assess evidence against null. If that evidence strong enough, reject null hypothesis; if not, fail to reject null hypothesis (sometimes “retain null”).
- Note asymmetry between null and alternative, and utter absence of word “accept”.
α and errors

- Hypothesis test ends with decision:
  - reject null hypothesis
  - do not reject null hypothesis.
- but decision may be wrong:

                 Decision
  Truth          Do not reject   Reject null
  Null true      Correct         Type I error
  Null false     Type II error   Correct

- Either type of error is bad, but for now focus on controlling Type I error: write α = P(type I error), and devise test so that α is small, typically 0.05.
- That is, if null hypothesis true, have only small chance to reject it (which would be a mistake).
- Worry about type II errors later (when we consider power of test).
Why 0.05? This man.

Responsible for:

- analysis of variance
- Fisher information
- linear discriminant analysis
- Fisher’s z-transformation
- Fisher-Yates shuffle
- Behrens-Fisher problem
Sir Ronald A. Fisher, 1890–1962.
Why 0.05? (2)

From The Arrangement of Field Experiments (1926): [two quoted excerpts, shown as images in the original slides]
Test statistic: going from data to decision
- “How far away from null hypothesis is data?”
- In testing for a mean, the statistic is

  t = (x̄ − µ) / (s/√n)

- and for our baseball attendance data:
t.stat=(25070-29327)/(11000/sqrt(25))
t.stat
## [1] -1.935
P-value
- The probability of observing a test statistic value as extreme or more extreme than that observed, if the null hypothesis is true.
- “More extreme” depends on Ha: count both sides if two-sided, count only the proper side if one-sided.
- Our Ha was Ha: µ ≠ 29,327, two-sided.
t-table

- Look up 1.935, not −1.935.
- 1.71 < 1.935 < 2.06
- P-value between 0.05 and 0.10.
- Or, exactly (×2: 2-sided):
2*pt(-1.93,24)
## [1] 0.06550769
- P-value not less than 0.05: do not reject null.
- No evidence of change in mean attendance.
Or, again, using t.test:
t.test(att, mu=29327)
##
## One Sample t-test
##
## data: att
## t = -1.9338, df = 24, p-value = 0.06502
## alternative hypothesis: true mean is not equal to 29327
## 95 percent confidence interval:
## 20526.82 29613.50
## sample estimates:
## mean of x
## 25070.16
- See test statistic −1.93, P-value 0.065.
- Again, do not reject null.
And in SAS
I created a spreadsheet with only attendances (since there were otherwise a lot of variables). The first line is a header, so start reading data on the second.
data jays;
infile '/folders/myfolders/jays15-home-att.csv'
firstobs=2;
input attendance;
proc ttest h0=29327;
var attendance;
Output
N Mean Std Dev Std Err Minimum Maximum
25 25070.2 11006.7 2201.3 14184.0 48414.0
Mean 95% CL Mean Std Dev 95% CL Std Dev
25070.2 20526.8 29613.5 11006.7 8594.3 15312.0
DF t Value Pr > |t|
24 -1.93 0.0650
Same CI (20527 to 29614) as R, also same P-value 0.0650.
SAS, the long way

- To work from the original spreadsheet, we need to read in all the variables:
data jays;
infile '/folders/myfolders/jays15-home.csv'
dlm=',' dsd firstobs=2;
input row game date $ box $ team $ venue $ opp $
result $ runs oppruns innings wl $ position gb $
winner $ loser $ save $ gametime $ daynight $
attendance streak $;
proc print;
- Yes, that is one long input line, split.
- dlm=',' means data values separated by commas.
- dsd means treat two consecutive delimiters as a missing value (usually what you want for delimited data).
Results (some, tiny)

Obs row game date box team venue opp result runs oppruns innings wl
1 82 7 Monday, boxscore TOR TBR L 1 2 . 4-3
2 83 8 Tuesday, boxscore TOR TBR L 2 3 . 4-4
3 84 9 Wednesda boxscore TOR TBR W 12 7 . 5-4
4 85 10 Thursday boxscore TOR TBR L 2 4 . 5-5
5 86 11 Friday, boxscore TOR ATL L 7 8 . 5-6
6 87 12 Saturday boxscore TOR ATL W-wo 6 5 10 6-6
7 88 13 Sunday, boxscore TOR ATL L 2 5 . 6-7
8 89 14 Tuesday, boxscore TOR BAL W 13 6 . 7-7
9 90 15 Wednesda boxscore TOR BAL W 4 2 . 8-7
10 91 16 Thursday boxscore TOR BAL W 7 6 . 9-7
11 92 27 Monday, boxscore TOR NYY W 3 1 . 13-14
12 93 28 Tuesday, boxscore TOR NYY L 3 6 . 13-15
13 94 29 Wednesda boxscore TOR NYY W 5 1 . 14-15
14 95 30 Friday, boxscore TOR BOS W 7 0 . 15-15
15 96 31 Saturday boxscore TOR BOS W 7 1 . 16-15
16 97 32 Sunday, boxscore TOR BOS L 3 6 . 16-16
17 98 40 Monday, boxscore TOR LAA W 10 6 . 18-22
18 99 41 Tuesday, boxscore TOR LAA L 2 3 . 18-23
19 100 42 Wednesda boxscore TOR LAA L 3 4 . 18-24
20 101 43 Thursday boxscore TOR LAA W 8 4 . 19-24
21 102 44 Friday, boxscore TOR SEA L 3 4 . 19-25
22 103 45 Saturday boxscore TOR SEA L 2 3 . 19-26
23 104 46 Sunday, boxscore TOR SEA W 8 2 . 20-26
24 105 47 Monday, boxscore TOR CHW W 6 0 . 21-26
25 106 48 Tuesday, boxscore TOR CHW W-wo 10 9 . 22-26
Obs position gb winner loser save gametime daynight attendance streak
1 2 1 Odorizzi Dickey Boxberge 2:30 N 48414 -
2 3 2 Geltz Castro Jepsen 3:06 N 17264 --
3 2 1 Buehrle Ramirez 3:02 N 15086 +
4 4 1.5 Archer Sanchez Boxberge 3:00 N 14433 -
5 4 2.5 Martin Cecil Grilli 3:09 N 21397 --
6 3 1.5 Cecil Marimon 2:41 D 34743 +
7 4 1.5 Miller Norris Grilli 2:41 D 44794 -
8 2 2 Buehrle Norris 2:53 N 14184 +
9 2 1 Sanchez Jimenez Castro 2:36 N 15606 ++
10 1 Tied Hutchiso Tillman Castro 2:36 N 18581 +++
11 4 3.5 Dickey Martin Cecil 2:18 N 19217 +
12 5 4.5 Pineda Estrada Miller 2:54 N 21519 -
13 3 3.5 Buehrle Sabathia 2:30 N 21312 +
14 3 4 Sanchez Miley 2:41 N 30430 ++
15 3 3 Hutchiso Kelly 3:12 D 42917 +++
16 3 4 Buchholz Dickey Uehara 2:38 D 42419 -
17 5 4.5 Osuna Morin 3:28 D 29306 +
18 5 4.5 Santiago Sanchez Street 2:32 N 15062 -
19 5 4.5 Weaver Hutchiso Street 2:36 N 16402 --
20 5 4.5 Dickey Shoemake 2:22 N 19014 +
21 5 5.5 Hernande Estrada Rodney 2:30 N 21195 -
22 5 5.5 Paxton Buehrle Rodney 2:25 D 33086 --
23 5 4.5 Sanchez Walker 2:50 D 37929 +
24 5 3.5 Hutchiso Noesi 2:10 N 15168 ++
25 4 3 Delabar Robertso 3:12 N 17276 +++
Results (a few, but less tiny)

proc print;
var date attendance daynight;
Obs date attendance daynight
1 Monday, 48414 N
2 Tuesday, 17264 N
3 Wednesda 15086 N
4 Thursday 14433 N
5 Friday, 21397 N
6 Saturday 34743 D
7 Sunday, 44794 D
8 Tuesday, 14184 N
9 Wednesda 15606 N
10 Thursday 18581 N
11 Monday, 19217 N
12 Tuesday, 21519 N
13 Wednesda 21312 N
14 Friday, 30430 N
15 Saturday 42917 D
16 Sunday, 42419 D
17 Monday, 29306 D
18 Tuesday, 15062 N
19 Wednesda 16402 N
20 Thursday 19014 N
21 Friday, 21195 N
22 Saturday 33086 D
23 Sunday, 37929 D
24 Monday, 15168 N
25 Tuesday, 17276 N
Comments
- Need to know names and types for all variables to read in data (vs. R, where you just read in a data frame).
- For text variables, just the first 8 characters are read in. (To change, see informat later.)
- daynight is D for a day game and N for a night game. How do attendances compare for these?
proc sgplot;
vbox attendance / category=daynight;
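For comparison, the same question could be asked in R; a sketch, assuming the jays data frame read in earlier (the CSV has a daynight column, as the SAS listing shows):

```r
# Compare day (D) and night (N) attendances with a two-sample t-test;
# jays as read in earlier in the chapter.
t.test(attendance ~ daynight, data = jays)
```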
The boxplot

[Side-by-side boxplots of attendance by daynight, shown as an image in the original slides.]
Comments
- Attendances on average much higher for day games than night ones.
- Why?
- What is that upper outlier in the night games?
Another example: learning to read
- You devised a new method for teaching children to read.
- Guess it will be more effective than current methods.
- To support this guess, collect data.
- Want to generalize to “all children in Canada”.
- So take random sample of all children in Canada.
- Or, argue that the sample you actually have is “typical” of all children in Canada.
- Randomization: whether or not a child is in the sample has nothing to do with anything else about that child.
- Aside: if your new method is good for teaching struggling children to read, then “all kids” is “all kids having trouble learning to read”, and you take a sample of those.
The data (some), in SAS

- Data in file drp.txt with a header line, then group and reading test score.
data kids;
infile '/folders/myfolders/drp.txt' firstobs=2;
input group $ score;
proc print;
Obs group score
1 t 24
2 t 61
3 t 59
4 t 46
5 t 43
6 t 44
7 t 52
8 t 43
9 t 58
10 t 67
11 t 62
12 t 57
13 t 71
14 t 49
15 t 54
16 t 43
17 t 53
18 t 57
19 t 49
20 t 56
21 t 33
22 c 42
23 c 33
24 c 46
25 c 37
26 c 43
27 c 41
28 c 10
29 c 42
30 c 55
31 c 19
32 c 17
33 c 55
34 c 26
35 c 54
36 c 60
37 c 28
38 c 62
39 c 20
40 c 53
41 c 48
42 c 37
43 c 85
44 c 42
Analysis
- Groups labelled c for “control” and t for “treatment”.
- Start with summaries (group means) and plot (boxplot).
- No pairing or matching: might compare means with two-sample t-test.
- For test, need approx. normality, but don’t need equal variability.
- Use summaries to decide if test is reasonable.
Comparing means
proc means;
class group;
var score;
The MEANS Procedure
Analysis Variable : score
N
group Obs N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------------------
c 23 23 41.5217391 17.1487332 10.0000000 85.0000000
t 21 21 51.4761905 11.0073568 24.0000000 71.0000000
-------------------------------------------------------------------------------
Boxplots

proc sgplot;
vbox score / category=group;
Comments
- Groups not actually same size (maybe 2 kids had to drop out).
- Means a fair bit different, treatment mean higher.
- But a lot of variability, so groups do overlap.
- Standard deviations somewhat different too.
- Biggest threat to normality is outliers; none here.
- Both distributions not far off symmetric.
- t-test should be good enough.
The t-test
proc ttest side=l;
var score;
class group;
The TTEST Procedure
Variable: score
Method Variances DF t Value Pr < t
Pooled Equal 42 -2.27 0.0143
Satterthwaite Unequal 37.855 -2.31 0.0132
plus a lot more output.
Comments and Conclusions
- One-sided test (looking for improvement). side can be l (lower), u (upper) or 2 (two-sided; can be omitted). This is l because the control group comes first in alphabetical order.
- Right t-test is Satterthwaite (does not assume equal variability).
- P-value 0.0132 < 0.05: there is an increase in reading scores.
- Should not use pooled test, because SDs not close; even so, result very similar (P-value 0.0143).
- One-sided test doesn’t give (regular) CI for difference in means. To get that, repeat analysis without side=l.
Doing it with R

- I skip the boxplots (you do it).
- c (control) before t (treatment) alphabetically, so proper alternative is “less”.
- R does Satterthwaite test by default (as before: new reading program really helps):
kids=read.table("drp.txt",header=T)
t.test(score~group,data=kids,alternative="less")
##
## Welch Two Sample t-test
##
## data: score by group
## t = -2.3109, df = 37.855, p-value = 0.01319
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -2.691293
## sample estimates:
## mean in group c mean in group t
## 41.52174 51.47619
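The skipped boxplots, for completeness: a minimal sketch in base graphics (as used for the attendance boxplot earlier), assuming drp.txt is in the working directory.

```r
# Side-by-side boxplots of reading score by group, base graphics.
kids <- read.table("drp.txt", header = TRUE)
boxplot(score ~ group, data = kids)
```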
The pooled t-test

- Pooled (equal variances) test this way:
t.test(score~group,data=kids,alternative="less",
var.equal=T)
##
## Two Sample t-test
##
## data: score by group
## t = -2.2666, df = 42, p-value = 0.01431
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -2.567497
## sample estimates:
## mean in group c mean in group t
## 41.52174 51.47619
- Similar P-value to Satterthwaite test (and same results as SAS).
Two-sided test; CI
- To do 2-sided test, leave out alternative:
t.test(score~group,data=kids)
##
## Welch Two Sample t-test
##
## data: score by group
## t = -2.3109, df = 37.855, p-value = 0.02638
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.67588 -1.23302
## sample estimates:
## mean in group c mean in group t
## 41.52174 51.47619
- Also gives CI: new reading program increases average scores by somewhere between about 1 and 19 points.
Logic of testing

- Work out what would happen if null hypothesis were true.
- Compare to what actually did happen.
- If these are too far apart, conclude that null hypothesis is not true after all. (Be guided by P-value.)

As applied to our reading programs:

- If reading programs equally good, expect to see a difference in means close to 0.
- Closeness quantified e.g. by t.
- Mean reading score was 10 higher for new program.
- Difference of 10 was unusually big (P-value small from t-test and randomization test). So conclude that new reading program is effective.

Nothing here about what happens if the null hypothesis is false. This is power and type II error probability.
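The randomization test mentioned above, as a sketch: shuffle the group labels many times, recompute the difference in means each time, and see how often a shuffled difference is at least as big as the one observed (about 10). Assumes kids read in as before; 1000 shuffles is an arbitrary choice.

```r
# Randomization test for the reading data: shuffle the group labels
# (which makes the null hypothesis true by construction), recompute the
# difference in means t minus c, estimate a one-sided P-value.
kids <- read.table("drp.txt", header = TRUE)
obs <- with(kids, mean(score[group == "t"]) - mean(score[group == "c"]))
set.seed(1)  # for reproducibility
shuffled <- replicate(1000, {
  g <- sample(kids$group)  # random relabelling of the kids
  mean(kids$score[g == "t"]) - mean(kids$score[g == "c"])
})
mean(shuffled >= obs)  # proportion of shuffles at least as extreme
```

The estimated P-value should come out in the same ballpark as the one-sided t-test’s 0.013.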
Errors in testing
What can happen:

                 Decision
  Truth          Do not reject   Reject null
  Null true      Correct         Type I error
  Null false     Type II error   Correct

Tension between truth and decision about truth (imperfect).

- Prob. of type I error denoted α. Usually fix α, e.g. α = 0.05.
- Prob. of type II error denoted β. Determined by the planned experiment. Low β good.
- Prob. of not making type II error called power (= 1 − β). High power good.
Power
- Suppose H0: θ = 10, Ha: θ ≠ 10 for some parameter θ.
- Suppose H0 wrong. What does that say about θ?
- Not much. Could have θ = 11 or θ = 8 or θ = 496. In each case, H0 wrong.
- How likely a type II error is depends on what θ is:
  - If θ = 496, should be able to reject H0: θ = 10 even for a small sample, so β should be small (power large).
  - If θ = 11, might have a hard time rejecting H0 even with a large sample, so β would be larger (power smaller).
- Power depends on true parameter value, and on sample size.
- So we play “what if”: “if θ were 11 (or 8 or 496), what would power be?”
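The “what if” game can be sketched numerically with power.t.test (used again over the next slides); the particular departures 1, 5, 10 and the n = 22, sd = 15 settings here are illustrative choices:

```r
# Power for several sizes of departure delta, with n = 22 per group
# and SD 15: the bigger the departure, the higher the power.
sapply(c(1, 5, 10), function(d)
  power.t.test(n = 22, delta = d, sd = 15, type = "two.sample",
               alternative = "one.sided")$power)
```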
Figuring out power
I Time to figure out power is before you collect any data, as part of the planning process.
I Need to have idea of what kind of departure from null hypothesis is of interest to you, e.g. average improvement of 5 points on reading test scores. (Subject-matter decision, not statistical one.)
I Then, either:
I “I have this big a sample and this big a departure I want to detect. What is my power for detecting it?”
I “I want to detect this big a departure with this much power. How big a sample size do I need?”
I R or SAS.
Calculating power for a t-test
Think about reading programs again. Need to specify:
I What kind of t-test (here 2-sample, not 1-sample or paired)
I What kind of Ha (here one-sided)
I What the population SD is (usually have to guess this). Here the sample SDs were 11 and 17, so 15 seems a fair guess (same for each group).
I What size departure from null (here 5 points).
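Before handing these numbers to power.t.test, the setup can also be checked by brute-force simulation. A sketch in Python (standard library only; it assumes a pooled one-sided t-test, with the df = 42 critical value 1.682 hard-coded rather than computed):

```python
import random
from statistics import mean, stdev

random.seed(1)
n, delta, sd = 22, 5, 15      # per-group size, true difference, guessed SD
tcrit = 1.682                 # one-sided 5% critical value, t with df = 42

rejections = 0
sims = 20000
for _ in range(sims):
    g1 = [random.gauss(0, sd) for _ in range(n)]        # control group
    g2 = [random.gauss(delta, sd) for _ in range(n)]    # treatment, truly 5 higher
    sp2 = (stdev(g1) ** 2 + stdev(g2) ** 2) / 2         # pooled variance (equal n)
    t = (mean(g2) - mean(g1)) / (sp2 * 2 / n) ** 0.5
    if t > tcrit:
        rejections += 1

print(rejections / sims)      # hovers around 0.29
```

The simulated rejection rate agrees with the 29% that power.t.test reports on the next slide.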
22 kids/group, difference 5
power.t.test(n=22,delta=5,sd=15,type="two.sample",
alternative="one.sided")
##
## Two-sample t test power calculation
##
## n = 22
## delta = 5
## sd = 15
## sig.level = 0.05
## power = 0.2887273
## alternative = one.sided
##
## NOTE: n is number in *each* group
This power (29%) way too small.
Detecting a larger difference
Larger difference, say 10 points under same conditions, should be easier to detect. Change delta=5 to delta=10:
power.t.test(n=22,delta=10,sd=15,type="two.sample",
alternative="one.sided")
##
## Two-sample t test power calculation
##
## n = 22
## delta = 10
## sd = 15
## sig.level = 0.05
## power = 0.7020779
## alternative = one.sided
##
## NOTE: n is number in *each* group
Much better, power around 70%. Much likelier to detect this bigger difference.
Increasing the sample size
Making n bigger should improve power. For detecting difference of 5, increase sample size from 22 in each group to 40:
power.t.test(n=40,delta=5,sd=15,type="two.sample",alternative="one.sided")
##
## Two-sample t test power calculation
##
## n = 40
## delta = 5
## sd = 15
## sig.level = 0.05
## power = 0.4336523
## alternative = one.sided
##
## NOTE: n is number in *each* group
Power went up from 29% to 43%. Often need much bigger sample size to make much difference.
Making a graph of power
I “Power curve”.
I Two ways:
I Keep sample size fixed, change true difference between means
I Keep difference fixed, change sample size.
I Feed power.t.test a vector of true differences or sample sizes. Get back vector of power values:
my.del=seq(0,20,2)
mypower=power.t.test(n=22,delta=my.del,sd=15,
type="two.sample",alternative="one.sided")
mypower$power
## [1] 0.0500000 0.1131924 0.2192775 0.3670836 0.5380092 0.7020779 0.8328057
## [8] 0.9192728 0.9667502 0.9883915 0.9965808
Fixing up “list”; making scatterplot
I To plot the power and my.del values, we need to make a data frame of them:
power.df=data.frame(diff=my.del,power=mypower$power)
I I called the difference in means diff to remind me what it was.
I Our plot will be a scatterplot. Set up ggplot in the usual way, then:
I geom_point() will plot the x-points against the y-points.
I geom_line() joins points by lines (helpful here).
The power curve
ggplot(power.df,aes(x=diff,y=power))+geom_point()+geom_line()
[Scatterplot: power rises along diff from 0.05 near diff = 0 towards 1 at diff = 20, points joined by lines.]
Embellishing the power curve
I The largest the power can be is 1 (the test correctly rejects the null every time).
I If two means actually equal (diff is 0), the null hypothesis actually correct, and then the prob of rejecting it is 0.05. So power should be 0.05 at diff 0, bigger elsewhere.
I Add these as horizontal lines to graph.
I Uses geom_hline(yintercept=y).
I Construct graph first, then print (over):
gr=ggplot(power.df,aes(x=diff,y=power))+
geom_point()+geom_line()+
geom_hline(yintercept=0.05,linetype="dashed")+
geom_hline(yintercept=1,linetype="dashed")
The improved power curve
gr
[Same power curve, now with dashed horizontal reference lines at power = 0.05 and power = 1.]
Power curve for sample size, fixed difference of 5
Sample sizes, 10 to 100 in steps of 5:
my.n=seq(10,100,5)
mypower=power.t.test(n=my.n,delta=5,sd=15,
type="two.sample",alternative="one.sided")
mypower$power
## [1] 0.1768868 0.2254334 0.2710950 0.3145661 0.3560927 0.3957733 0.4336523
## [8] 0.4697559 0.5041065 0.5367294 0.5676549 0.5969194 0.6245648 0.6506381
## [15] 0.6751905 0.6982766 0.7199535 0.7402799 0.7593159
Setting up the power curve
my.df=data.frame(n=my.n,power=mypower$power)
gr=ggplot(my.df,aes(x=n,y=power))+
geom_point()+geom_line()+
geom_hline(yintercept=0.05,linetype="dashed")+
geom_hline(yintercept=1,linetype="dashed")
The power curve
gr
[Scatterplot: power against n from 10 to 100, rising from about 0.18 towards 0.76, with dashed reference lines at 0.05 and 1.]
Finding sample size
I Instead of being given a sample size and finding a power, we can be given a power and asked for the sample size needed to achieve it.
I E.g. to detect a difference of 5 between means, how big a sample do I need to get 80% power?
I Leave out n= and put in power=0.80 instead; solves for n:
power.t.test(power=0.80,delta=5,sd=15,type="two.sample",
alternative="one.sided")
##
## Two-sample t test power calculation
##
## n = 111.9686
## delta = 5
## sd = 15
## sig.level = 0.05
## power = 0.8
## alternative = one.sided
##
## NOTE: n is number in *each* group
Comments
I To have 80% chance of correctly detecting difference of 5 in means, need 112 children in each group.
I Sample size calculations apt to give depressing results!
SAS
SAS has proc power which does all the same stuff (a little more flexibly):
I Two-sample t for means: twosamplemeans.
I Satterthwaite test: diff satt
I One-sided.
I Desired difference in means 5.
I SD in each group 15
I Total sample size 44 (22 in each group)
I Want to find power for this: power=..
SAS code
proc power;
twosamplemeans
test=diff_satt
sides=1
meandiff=5
stddev=15
ntotal=44
power=.;
Note punctuation: twosamplemeans to power really all one line, so semicolon only after power=..
SAS output
The POWER Procedure
Two-Sample t Test for Mean Difference with Unequal Variances
Fixed Scenario Elements
Distribution Normal
Method Exact
Number of Sides 1
Mean Difference 5
Standard Deviation 15
Total Sample Size 44
Null Difference 0
Nominal Alpha 0.05
Group 1 Weight 1
Group 2 Weight 1
Computed Power
Actual
Alpha Power
0.0498 0.288
Power is 0.29, as with R. Note that we didn’t need any data.
Power for several alternatives and sample sizes
If you put several values on a line, SAS does all combinations of them, e.g. mean differences 5 and 10, sample sizes 22 and 40 per group:
proc power;
twosamplemeans
test=diff_satt
sides=1
meandiff=5 10
stddev=15
ntotal=44 80
power=.;
Output
The POWER Procedure
Two-Sample t Test for Mean Difference with Unequal Variances
Fixed Scenario Elements
Distribution Normal
Method Exact
Number of Sides 1
Standard Deviation 15
Null Difference 0
Nominal Alpha 0.05
Group 1 Weight 1
Group 2 Weight 1
Computed Power
Mean N Actual
Index Diff Total Alpha Power
1 5 44 0.0498 0.288
2 5 80 0.0499 0.433
3 10 44 0.0498 0.701
4 10 80 0.0499 0.905
Comments
I Consistent with R results before.
I Increasing sample size increases power a bit.
I Increasing true mean difference increases power dramatically.
Sample size calculation
To calculate sample size required, leave ntotal missing (put in a dot) and fill in power. Here, mean difference 5, desired power 0.80:
proc power;
twosamplemeans
test=diff_satt
sides=1
meandiff=5
stddev=15
ntotal=.
power=0.80;
Results
The POWER Procedure
Two-Sample t Test for Mean Difference with Unequal Variances
Fixed Scenario Elements
Distribution Normal
Method Exact
Number of Sides 1
Mean Difference 5
Standard Deviation 15
Nominal Power 0.8
Null Difference 0
Nominal Alpha 0.05
Group 1 Weight 1
Group 2 Weight 1
Computed N Total
Actual Actual N
Alpha Power Total
0.05 0.800 224
Need 224 children total, or 112 in each group. As R.
Or, more accurately. . .
I Didn’t actually have same-size groups or equal population SDs. Assumed them for R.
I SAS will allow different values per group.
I We had sample sizes 21 and 23, sample SDs 11 and 17 (use as population SDs).
I Unequal sample sizes usually decrease power, but smaller sample size with smaller SD actually better. Overall effect unclear.
I SAS: use groupstddevs and groupns, and vertical bars.
proc power;
twosamplemeans
test=diff_satt
sides=1
meandiff=5
groupstddevs=11|17
groupns=21|23
power=.;
Results
The POWER Procedure
Two-Sample t Test for Mean Difference with Unequal Variances
Fixed Scenario Elements
Distribution Normal
Method Exact
Number of Sides 1
Mean Difference 5
Group 1 Standard Deviation 11
Group 2 Standard Deviation 17
Group 1 Sample Size 21
Group 2 Sample Size 23
Null Difference 0
Nominal Alpha 0.05
Computed Power
Actual
Alpha Power
0.0499 0.309
Power actually went up a tiny bit.
Unequal sample sizes
To show effect of unequal sample sizes, go back to SDs both being 15, but have very unequal sample sizes. What effect does that have on power?
proc power;
twosamplemeans
test=diff_satt
sides=1
meandiff=5
stddev=15
groupns=10|34
power=.;
Results
Power for 22 in each group was 29%:
The POWER Procedure
Two-Sample t Test for Mean Difference with Unequal Variances
Fixed Scenario Elements
Distribution Normal
Method Exact
Number of Sides 1
Mean Difference 5
Standard Deviation 15
Group 1 Sample Size 10
Group 2 Sample Size 34
Null Difference 0
Nominal Alpha 0.05
Computed Power
Actual
Alpha Power
0.0505 0.225
Unequal sample sizes bring power down to 22.5%.
Return to actual data: CI in SAS. . .
For difference between mean reading scores:
proc ttest;
var score;
class group;
The TTEST Procedure
Variable: score
group Method Mean 95% CL Mean Std Dev
c 41.5217 34.1061 48.9374 17.1487
t 51.4762 46.4657 56.4867 11.0074
Diff (1-2) Pooled -9.9545 -18.8176 -1.0913 14.5512
Diff (1-2) Satterthwaite -9.9545 -18.6759 -1.2330
group Method 95% CL Std Dev
c 13.2627 24.2715
t 8.4213 15.8954
Diff (1-2) Pooled 11.9981 18.4947
Diff (1-2) Satterthwaite
Comments
I Same code as for test, look at different part of output.
I Look under 95% CL Mean and Satterthwaite line, interval from −18.68 to −1.23.
I Control minus treatment, so new reading method better on average by between 1 and 19 points.
I Would a 90% CI be longer or shorter than a 95% one? Code next page. Put appropriate alpha on proc ttest line.
90% CI
proc ttest alpha=0.10;
var score;
class group;
The TTEST Procedure
Variable: score
group Method Mean 90% CL Mean Std Dev
c 41.5217 35.3816 47.6618 17.1487
t 51.4762 47.3334 55.6190 11.0074
Diff (1-2) Pooled -9.9545 -17.3414 -2.5675 14.5512
Diff (1-2) Satterthwaite -9.9545 -17.2176 -2.6913
group Method 90% CL Std Dev
c 13.8098 22.8992
t 8.7834 14.9440
Diff (1-2) Pooled 12.3693 17.7758
Diff (1-2) Satterthwaite
Comparison
I 95%: −18.68 to −1.23.
I 90%: −17.22 to −2.69.
I 90% CI shorter, but has more chance to be wrong.
I Both CIs wide: we don’t know much about effect that new reading method has. (Solution: more data.)
Same thing in R
Default is 95% CI, get via t.test as before:
t.test(score~group,data=kids)
##
## Welch Two Sample t-test
##
## data: score by group
## t = -2.3109, df = 37.855, p-value = 0.02638
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.67588 -1.23302
## sample estimates:
## mean in group c mean in group t
## 41.52174 51.47619
Same as SAS.
90% CI in R
Add conf.level, which is actual confidence level, unlike SAS:
t.test(score~group,data=kids,conf.level=0.90)
##
## Welch Two Sample t-test
##
## data: score by group
## t = -2.3109, df = 37.855, p-value = 0.02638
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
## -17.217609 -2.691293
## sample estimates:
## mean in group c mean in group t
## 41.52174 51.47619
Again, same as SAS.
Duality between confidence intervals and hypothesis tests
I Tests and CIs really do the same thing, if you look at them the right way. They are both telling you something about a parameter, and they use the same information from the data.
I Illustrate with R and SAS, R first.
95% CI (default)
group1=c(10,11,11,13,13,14,14,15,16)
group2=c(13,13,14,17,18,19)
t.test(group1,group2)
##
## Welch Two Sample t-test
##
## data: group1 and group2
## t = -2.0937, df = 8.7104, p-value = 0.0668
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.5625675 0.2292342
## sample estimates:
## mean of x mean of y
## 13.00000 15.66667
90% CI
t.test(group1,group2,conf.level=0.90)
##
## Welch Two Sample t-test
##
## data: group1 and group2
## t = -2.0937, df = 8.7104, p-value = 0.0668
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
## -5.010308 -0.323025
## sample estimates:
## mean of x mean of y
## 13.00000 15.66667
Comparing results
Recall null here is H0 : µ1 − µ2 = 0. P-value 0.0668.
I 95% CI from −5.6 to 0.2, includes 0.
I 90% CI from −5.0 to −0.3, does not include 0.
I At α = 0.05, would not reject H0 since P-value > 0.05.
I At α = 0.10, would reject H0 since P-value < 0.10.
Not just coincidence. Following always true.
I Reject H0 at level α if and only if H0 value outside corresponding CI.
I Fail to reject H0 at level α if and only if H0 value inside corresponding CI.
I Idea: “Plausible” parameter value inside CI, not rejected; “Implausible” parameter value outside CI, rejected.
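The same arithmetic can be seen in a few lines of Python. This sketch uses made-up numbers for a z-based estimate (standard library only; it illustrates the duality, it is not the reading-scores analysis): the P-value falls between 0.05 and 0.10, so the 95% interval contains 0 while the 90% interval does not.

```python
from statistics import NormalDist

est, se = 1.8, 1.0                               # hypothetical estimate and SE
z = est / se
p = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided P-value, about 0.072

for conf in (0.95, 0.90):
    alpha = 1 - conf
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = est - zcrit * se, est + zcrit * se
    # the test rejects at level alpha exactly when 0 falls outside the CI
    print(conf, p < alpha, not (lo <= 0 <= hi))
# prints: 0.95 False False (P > 0.05, 95% CI contains 0)
#   then: 0.9 True True    (P < 0.10, 90% CI excludes 0)
```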
In SAS
I Some trickery involved because of data layout.
I Put all the data values in one column, with a second column labelling which group they are from:
data dual;
infile '/folders/myfolders/duality.txt';
input y group;
proc print;
Output over.
The data, small
Obs y group
1 10 1
2 11 1
3 11 1
4 13 1
5 13 1
6 14 1
7 14 1
8 15 1
9 16 1
10 13 2
11 13 2
12 14 2
13 17 2
14 18 2
15 19 2
Test and CI at default α = 0.05
proc ttest;
var y;
class group;
The TTEST Procedure
Variable: y
group Method Mean 95% CL Mean Std Dev
1 13.0000 11.4627 14.5373 2.0000
2 15.6667 12.8769 18.4564 2.6583
Diff (1-2) Pooled -2.6667 -5.2580 -0.0754 2.2758
Diff (1-2) Satterthwaite -2.6667 -5.5626 0.2292
group Method 95% CL Std Dev
1 1.3509 3.8315
2 1.6593 6.5198
Diff (1-2) Pooled 1.6499 3.6665
Diff (1-2) Satterthwaite
Method Variances DF t Value Pr > |t|
Pooled Equal 13 -2.22 0.0446
Satterthwaite Unequal 8.7104 -2.09 0.0668
Getting 90% CI
proc ttest alpha=0.10;
var y;
class group;
The TTEST Procedure
Variable: y
group Method Mean 90% CL Mean Std Dev
1 13.0000 11.7603 14.2397 2.0000
2 15.6667 13.4798 17.8535 2.6583
Diff (1-2) Pooled -2.6667 -4.7909 -0.5425 2.2758
Diff (1-2) Satterthwaite -2.6667 -5.0103 -0.3230
group Method 90% CL Std Dev
1 1.4365 3.4220
2 1.7865 5.5539
Diff (1-2) Pooled 1.7352 3.3806
Diff (1-2) Satterthwaite
Method Variances DF t Value Pr > |t|
Pooled Equal 13 -2.22 0.0446
Satterthwaite Unequal 8.7104 -2.09 0.0668
The value of this
I If you have a test procedure but no corresponding CI:
I you make a CI by including all the parameter values that would not be rejected by your test.
I Use:
I α = 0.01 for a 99% CI,
I α = 0.05 for a 95% CI,
I α = 0.10 for a 90% CI,
and so on.
The sign test
I Test for population median.
I Data:
x=c(3, 4, 5, 5, 6, 8, 11, 14, 19, 37)
I Let M denote pop. median. Test H0 : M = 12 vs. Ha : M ≠ 12.
I If H0 true, each data value either above or below 12, prob. 0.5 of each (ignore exactly 12’s).
I Test statistic: # values above 12.
I If H0 true, test statistic has binomial distribution, n = 10, p = 0.5.
I Observed 3 values above 12 (and 7 below).
I P-value comes from prob. of 3 successes or less in that binomial distribution.
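That binomial calculation is easy to reproduce by hand. A quick check in Python (standard library; the doubling gives the two-sided P-value used further on):

```python
from math import comb

# P(X <= 3) for X ~ Binomial(n = 10, p = 0.5), then doubled for two sides
p_low = sum(comb(10, k) * 0.5 ** 10 for k in range(4))
p_value = 2 * p_low
print(p_low, p_value)   # 0.171875 0.34375
```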
Probability distribution calculator
I R knows about many distributions, e.g. norm, binom.
I All have prefixes:
I d: probability (or density) of exactly given value
I p: probability of given value or less (CDF)
I q: value that has that much probability less than it (inverse CDF).
pnorm(1.96)
## [1] 0.9750021
qnorm(0.95)
## [1] 1.644854
dbinom(1,4,0.3)
## [1] 0.4116
pbinom(1,4,0.3)
## [1] 0.6517
qbinom(0.9,4,0.3)
## [1] 2
Standard normal: prob of less than 1.96; value with prob 0.95 less than it.
Binomial, n = 4, p = 0.3: prob of exactly 1 success; prob of 1 success or less; value with prob (at least) 0.9 less than it.
Sign test
I Data:
x
## [1] 3 4 5 5 6 8 11 14 19 37
I Hypotheses about median M: H0 : M = 12, Ha : M ≠ 12.
I Test statistic: number of values above 12.
table(x>12)
##
## FALSE TRUE
## 7 3
3 values above.
I P-value: prob. of 3 successes or less in binomial n = 10, p = 0.5, times 2 (two-sided test):
2*pbinom(3,10,0.5)
## [1] 0.34375
I No evidence that the median differs from 12.
Sign test in SAS
Lives in proc univariate. Note how null median specified:
data x;
input x @@;
cards;
3 4 5 5 6 8 11 14 19 37
;
proc univariate mu0=12;
SAS sign test results
The UNIVARIATE Procedure
Variable: x
Tests for Location: Mu0=12
Test -Statistic- -----p Value------
Student's t t -0.24399 Pr > |t| 0.8127
Sign M -2 Pr >= |M| 0.3438
Signed Rank S -9.5 Pr >= |S| 0.3730
P-value 0.3438, same as R.
Making R sign test into function
I Get CI for median as “what values of M would sign test not reject for?”
I Trying lots of values of M, do sign test for each.
I Efficiency: write R function to do sign test and return P-value:
sign.test=function(med,mydata)
{tbl=table(mydata<med)
stat=min(tbl)
n=length(mydata)
pval=2*pbinom(stat,n,0.5)
return(pval)
}
I Function takes null median and data as input.
I Test statistic is #values greater or less than M, whichever is smaller.
I Returns two-sided P-value.
Using function to get CI
Idea: call function repeatedly on same data but different null M. Values of M with P-value > 0.05 inside 95% CI, others outside.
First test on actual problem:
sign.test(12,x)
## [1] 0.34375
All right. Try some values, making sure not to use any data values:
sign.test(18.5,x)
## [1] 0.109375
sign.test(19.5,x)
## [1] 0.02148438
18.5 inside, 19.5 outside.
Completing the interval
Lower end:
sign.test(4.5,x)
## [1] 0.109375
sign.test(3.5,x)
## [1] 0.02148438
4.5 inside, 3.5 outside.
So 95% CI for pop. median goes from 4.5 to 18.5.
Not a very good CI, for two reasons:
I Sample size small.
I Sign test not very powerful.
But it is a confidence interval.
Sign-testing a whole bunch of medians at once
options(width=60)
my.meds=seq(3.5,20.5,1)
pvals=sapply(my.meds,sign.test,x)
rbind(my.meds,pvals)
## [,1] [,2] [,3] [,4] [,5]
## my.meds 3.50000000 4.500000 5.5000000 6.500000 7.500000
## pvals 0.02148438 0.109375 0.7539063 1.246094 1.246094
## [,6] [,7] [,8] [,9] [,10]
## my.meds 8.5000000 9.5000000 10.5000000 11.50000 12.50000
## pvals 0.7539063 0.7539063 0.7539063 0.34375 0.34375
## [,11] [,12] [,13] [,14] [,15]
## my.meds 13.50000 14.500000 15.500000 16.500000 17.500000
## pvals 0.34375 0.109375 0.109375 0.109375 0.109375
## [,16] [,17] [,18]
## my.meds 18.500000 19.50000000 20.50000000
## pvals 0.109375 0.02148438 0.02148438
Comments
I sapply says: for each of the values in first thing (my.meds), run the function named in the second thing (sign.test), feeding anything else (x) into the function.
I Runs sign.test(3.5,x), sign.test(4.5,x), . . . , sign.test(20.5,x) and glues results together.
I P-value “1.25” is where 5 values above H0 median and 5 below, no evidence at all for rejecting null. (1 would be a better value).
I For 95% CI, look at where P-values cross 0.05 (twice).
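The “better value” fix is a one-liner: cap the doubled tail probability at 1. A sketch of the same binomial arithmetic (in Python, for illustration only; the notes’ own version is the R function sign.test above):

```python
# Sign test P-value, capped at 1. Mirrors the R sign.test in these notes:
# "below" plays the role of table(mydata < med), and the values not below
# the null median (including ties) form the other count.
from math import comb

def sign_test(med, data):
    below = sum(v < med for v in data)
    stat = min(below, len(data) - below)
    n = len(data)
    # doubled lower tail of Binomial(n, 1/2), capped at 1
    tail = sum(comb(n, k) for k in range(stat + 1)) / 2 ** n
    return min(1.0, 2 * tail)

x = [3, 4, 5, 5, 6, 8, 11, 14, 19, 37]
print(sign_test(12, x))    # 0.34375, matching the R output
print(sign_test(7.5, x))   # 1.0 now, instead of 1.246
```

With this cap, the “1.25” entries in the table of P-values would simply read 1.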
Some more data
Take a look at these data (12 rows of 3 columns):
Case Drug A Drug B
1 2.0 3.5
2 3.6 5.7
3 2.6 2.9
4 2.6 2.4
5 7.3 9.9
6 3.4 3.3
7 14.9 16.7
8 6.6 6.0
9 2.3 3.8
10 2.0 4.0
11 6.8 9.1
12 8.5 20.9
Matched pairs
I Data are comparison of 2 drugs for effectiveness at reducing pain.
I 12 subjects (cases) were arthritis sufferers.
I Response is #hours of pain relief from each drug.
I In reading example, each child tried only one reading method.
I But here, each subject tried out both drugs, giving us two measurements.
I Possible because, if you wait long enough, one drug has no influence over effect of other.
I Advantage: focused comparison of drugs. Compare one drug with another on same person, removes a lot of random variability.
I Matched pairs requires a different analysis.
I Design: randomly choose 6 of 12 subjects to get drug A first, other 6 get drug B first.
Analysis, in SAS
Uses a t-test. Note paired line.
data analgesic;
infile "/folders/myfolders/analgesic.txt";
input subject druga drugb;
proc ttest;
paired druga*drugb;
Output of matched pairs t-test
N Mean Std Dev Std Err Minimum Maximum
12 -2.1333 3.4092 0.9841 -12.4000 0.6000
Mean 95% CL Mean Std Dev 95% CL Std Dev
-2.1333 -4.2994 0.0327 3.4092 2.4150 5.7884
DF t Value Pr > |t|
11 -2.17 0.0530
Comments
I P-value 0.0530.
I At α = 0.05, cannot quite reject null of no difference, though result is very close to significance.
I “Hand-calculation” way of doing this is to find the 12 differences, one for each subject, and do 1-sample t-test on those differences. Shown on next page.
Alternative way to do matched pairs
Define differences in data step:
data analgesic;
infile "/folders/myfolders/analgesic.txt";
input subject druga drugb;
diff=druga-drugb;
proc ttest h0=0;
var diff;
t-test is an ordinary 1-sample test on diff. Note that null-hypothesis mean has to be given with only one sample.
Results
N Mean Std Dev Std Err Minimum Maximum
12 -2.1333 3.4092 0.9841 -12.4000 0.6000
Mean 95% CL Mean Std Dev 95% CL Std Dev
-2.1333 -4.2994 0.0327 3.4092 2.4150 5.7884
DF t Value Pr > |t|
11 -2.17 0.0530
Paired test in R (1)
I forgot to give the variables names in the data file. Use what we have.
pain=read.table("analgesic.txt",header=F)
pain
## V1 V2 V3
## 1 1 2.0 3.5
## 2 2 3.6 5.7
## 3 3 2.6 2.9
## 4 4 2.6 2.4
## 5 5 7.3 9.9
## 6 6 3.4 3.3
## 7 7 14.9 16.7
## 8 8 6.6 6.0
## 9 9 2.3 3.8
## 10 10 2.0 4.0
## 11 11 6.8 9.1
## 12 12 8.5 20.9
Paired test in R (2)
attach(pain)
t.test(V2,V3,paired=T)
##
## Paired t-test
##
## data: V2 and V3
## t = -2.1677, df = 11, p-value = 0.05299
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.29941513 0.03274847
## sample estimates:
## mean of the differences
## -2.133333
P-value as before. Likewise, you can calculate the differences yourself and do a 1-sample t-test on them.
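That equivalence is quick to check by hand: the paired t statistic is just the one-sample t computed on the differences. A sketch (in Python, to show only the arithmetic; the notes do this in R):

```python
# One-sample t statistic on the twelve drug A minus drug B differences,
# values copied from the data listing above.
from math import sqrt
from statistics import mean, stdev

a = [2.0, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2.0, 6.8, 8.5]
b = [3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6.0, 3.8, 4.0, 9.1, 20.9]
d = [u - v for u, v in zip(a, b)]
t = mean(d) / (stdev(d) / sqrt(len(d)))
print(round(t, 4))   # -2.1677 on 11 df, matching the R output
```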
Assessing normality
I Analyses assume (theoretically) that differences normally distributed.
I Though we know that t-tests generally behave well even without normality.
I How to assess normality? A normal quantile plot.
I Idea: scatter of points should follow the straight line, without curving.
I Outliers show up at bottom left or top right of plot as points off the line.
The normal quantile plot
diff=V2-V3 ; qqnorm(diff) ; qqline(diff)
[Normal Q–Q plot of the differences: Theoretical Quantiles vs. Sample Quantiles]
Points should follow the straight line. Bottom left one way off, so normality questionable here.
And using “ggplot”, but without line
ggplot(pain,aes(sample=diff))+stat_qq()
[ggplot normal quantile plot: theoretical vs. sample quantiles]
And in SAS . . .
proc univariate noprint;
qqplot diff;
Getting a line on SAS normal quantile plot
I SAS doesn’t automatically provide a line, but even without one, you see that these data are not normal because of the outlier bottom left.
I SAS can draw lines, but requires you to give a mean and SD to make the line with.
I Simplest way is to have SAS estimate them from the data, but the line is usually not very good.
I Or we can estimate them another way from IQR.
Having SAS estimate them
proc univariate noprint;
qqplot diff / normal(mu=est sigma=est);
Another way to estimate µ and σ
I Problem above is that SD was grossly inflated by outlier.
I On standard normal, quartiles about ±0.675:
qnorm(0.25); qnorm(0.75)
## [1] -0.6744898
## [1] 0.6744898
I So IQR of standard normal about 2(0.675) = 1.35.
I Thus IQR of any normal about 1.35σ.
I Idea: estimate σ by taking sample IQR and dividing by 1.35. Not affected by outliers.
I Here, IQR is 1.95, so estimate of σ is 1.455.
I In similar spirit, estimate µ by median, −1.65.
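These estimates can be checked numerically. A sketch (in Python, for illustration; the notes work in R) that reproduces the 1.35 constant and the robust estimates from the twelve differences; q7 is a hypothetical helper mimicking R’s default quantile rule (type 7):

```python
# Robust mu and sigma estimates from the IQR, as described above.
from statistics import NormalDist, median

diff = [-1.5, -2.1, -0.3, 0.2, -2.6, 0.1, -1.8, 0.6, -1.5, -2.0, -2.3, -12.4]

# the notes' 1.35 is twice the standard-normal upper quartile
print(round(2 * NormalDist().inv_cdf(0.75), 3))   # 1.349

def q7(p, xs):
    # linear interpolation between order statistics, as R's quantile() does
    h = (len(xs) - 1) * p
    lo = int(h)
    return xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])

s = sorted(diff)
iqr = q7(0.75, s) - q7(0.25, s)
print(round(iqr, 2))              # 1.95, as in the notes
print(iqr / 1.35, median(diff))   # robust sigma and mu estimates
```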
Using improved µ and σ
proc univariate noprint;
qqplot diff / normal(mu=-1.65 sigma=1.455);
Dealing with non-normality
I One approach: do nothing, since the t-test takes care of non-normality.
I Another: do a sign test, testing whether the median difference is zero. In SAS, just ask for it.
I Another: do a randomization test.
Sign test in SAS
proc univariate;
var diff;
P-value 0.1460:
The UNIVARIATE Procedure
Variable: diff
Tests for Location: Mu0=0
Test -Statistic- -----p Value------
Student's t t -2.16771 Pr > |t| 0.0530
Sign M -3 Pr >= |M| 0.1460
Signed Rank S -32 Pr >= |S| 0.0088
Look back at the data
I Data (differences drug A minus drug B):
diff
## [1] -1.5 -2.1 -0.3 0.2 -2.6 0.1 -1.8
## [8] 0.6 -1.5 -2.0 -2.3 -12.4
I See the big outlier (at end of list).
Sign test in R
I In R, either use the sign.test function that we wrote before:
sign.test(0,diff)
## [1] 0.1459961
I Or, count the number of positive and negative differences:
table(diff>0)
##
## FALSE TRUE
## 9 3
I and calculate the P-value as
pbinom(3,12,0.5)*2
## [1] 0.1459961
Randomization test idea
I Another way of doing a test when you’re worried about assumptions is to use “randomization” idea.
I Ask “what could be switched around if H0 true”?
I In matched pairs, each of the matched observations could have come from either treatment (e.g. drug).
I In two independent samples, each observation could have come from either sample.
I In ANOVA (multiple groups), each observation could have come from any group.
I Randomization test: randomly shuffle treatments/samples/groups according to experimental design.
Randomization test for matched pairs
I Randomly allocate results for each person to drug A or B.
I Observed sample mean difference:
mean(diff)
## [1] -2.133333
I Switching drugs (from observed data) means switching sign of difference A minus B. So generate random sample of −1 and 1 of size 12 (with replacement), like this:
pm=c(1,-1)
random.pm=sample(pm,12,replace=T)
random.pm
## [1] -1 1 1 -1 1 -1 1 1 -1 -1 -1 1
Generating a randomization sample and mean
I Then take observed differences and generate randomized ones by multiplying diff by these random signs:
random.diff=diff*random.pm
random.diff
## [1] 1.5 -2.1 -0.3 -0.2 -2.6 -0.1 -1.8
## [8] 0.6 1.5 2.0 2.3 -12.4
mean(random.diff)
## [1] -0.9666667
I This randomization gave a sample mean difference close to −1, while our observed value was more negative, −2.13.
One randomization sample
I Write a function to do randomization and return statistic, once.
rand.diff=function(x) {
  pm=c(-1,1)
  random.pm=sample(pm,length(x),replace=T)
  random.diff=x*random.pm
  return(mean(random.diff))
}
I Try it a few times:
rand.diff(diff)
## [1] -0.5833333
rand.diff(diff)
## [1] -1.083333
rand.diff(diff)
## [1] 0.7
rand.diff(diff)
## [1] 1.133333
I Each answer different, because of randomization.
Repeating many times
I Function replicate repeats something as many times as you like.
I In this case, repeat rand.diff(diff) many times:
replicate(5,rand.diff(diff))
## [1] 0.2666667 -0.7833333 1.1000000 -0.7166667
## [5] 1.1000000
rand.mean=replicate(1000,rand.diff(diff))
I Collection of replicated values called randomization distribution.
I Use this instead of normal, t, . . . to get P-value from.
Histogram of randomization distribution
r=data.frame(mean=rand.mean)
ggplot(r,aes(x=mean))+geom_histogram(binwidth=0.5)
[Histogram of the 1000 randomized mean differences, running from about −3 to 3]
The randomization distribution
I If t-test were plausible, this would look normal. Does it?
I Why not? The outlier difference, −12.4. If counted as positive, mean probably positive; if counted as negative, mean probably negative.
I This true almost regardless of other values (whether plus or minus).
I Randomization test can be trusted, though.
I P-value: how many of randomized means came out less than observed −2.13, doubled:
tab=table(rand.mean<=mean(diff))
tab
##
## FALSE TRUE
## 994 6
2*tab[2]/1000
## TRUE
## 0.012
I Randomization P-value small; no doubt now that mean pain relief for two drugs different.
Randomization test for median difference
I With outlier, should we be using mean?
I Randomization test flexible; change mean to median:
rand.diff.med=function(x) {
  pm=c(-1,1)
  random.pm=sample(pm,length(x),replace=T)
  random.diff=x*random.pm
  return(median(random.diff))
}
replicate(5,rand.diff.med(diff))
## [1] 1.65 -0.15 0.15 0.35 0.05
rand.median=replicate(1000,rand.diff.med(diff))
I Change mean to median everywhere.
Histogram of randomization distribution for median
r=data.frame(med=rand.median)
ggplot(r,aes(x=med))+
geom_histogram(binwidth=0.5)
[Histogram of the randomized medians, running from about −1 to 1]
P-value for randomization test for median
I Randomization distribution has only one peak now.
I Sample median difference was −1.65 (median(diff)).
I P-value is proportion of these at least as extreme:
tab=table(rand.median<=median(diff))
tab
##
## FALSE TRUE
## 989 11
2*tab[2]/1000
## TRUE
## 0.022
I Again, reject hypothesis that two drugs equally good.
Comparison
Test results seem to be very different:

Test           Statistic  P-value
t-test         mean       0.053
randomization  mean       0.012
sign test      median     0.146
randomization  median     0.022

I Not that much agreement, though 2 randomization tests have similar results.
I Sign test P-value much larger than others (lack of power?)
I t-test probably not trustworthy (very non-normal dist. of differences).
I Using mean probably not wise (outlier difference)
I Randomization test for median probably best.
The kids’ reading data, again
kids[1:15,]
## group score
## 1 t 24
## 2 t 61
## 3 t 59
## 4 t 46
## 5 t 43
## 6 t 44
## 7 t 52
## 8 t 43
## 9 t 58
## 10 t 67
## 11 t 62
## 12 t 57
## 13 t 71
## 14 t 49
## 15 t 54
kids[16:30,]
## group score
## 16 t 43
## 17 t 53
## 18 t 57
## 19 t 49
## 20 t 56
## 21 t 33
## 22 c 42
## 23 c 33
## 24 c 46
## 25 c 37
## 26 c 43
## 27 c 41
## 28 c 10
## 29 c 42
## 30 c 55
kids[31:44,]
## group score
## 31 c 19
## 32 c 17
## 33 c 55
## 34 c 26
## 35 c 54
## 36 c 60
## 37 c 28
## 38 c 62
## 39 c 20
## 40 c 53
## 41 c 48
## 42 c 37
## 43 c 85
## 44 c 42
I 21 kids in “treatment”, new reading method; 23 in “control”, standard reading method.
Randomization test for two-sample data
I Each observation might, under H0, have come from either treatment or control.
I Means for data:
aggregate(score~group,kids,mean)
## group score
## 1 c 41.52174
## 2 t 51.47619
Differ by 9.95.
I If we assign Treatment and Control to the observations at random, how likely would we be to see a difference in means of 9.95 or bigger?
Randomization test
I Ingredients:
I sample to shuffle treatment groups
I aggregate to calculate means for shuffled groups
I little bit of calculation to get difference in sample means.
attach(kids)
myshuffle=sample(group)
myshuffle
## [1] c t t c t c c t c c c t t t c t t c c c t t t
## [24] c t t c c t t c c t c t t c t t c c c c c
## Levels: c t
I This makes shuffled groups.
Randomization difference in means
I Now find group means for shuffled groups:
themeans=aggregate(score~myshuffle,data=kids,mean)
themeans
## myshuffle score
## 1 c 44.39130
## 2 t 48.33333
I Difference in means is second thing in second column minus first thing in second column:
meandiff=themeans[2,2]-themeans[1,2]
meandiff
## [1] 3.942029
I Simulated treatment > simulated control.
Making into function
I We will repeat shuffling process many times, so put it in function.
I Input: data frame.
I Output: difference in means of shuffled groups.
shuffle.means=function(x) {
  myshuffle=sample(x$group)
  themeans=aggregate(score~myshuffle,data=x,mean)
  meandiff=themeans[2,2]-themeans[1,2]
  return(meandiff)
}
Again, using function!
shuffle.means(kids)
## [1] 0.8447205
shuffle.means(kids)
## [1] 4.124224
shuffle.means(kids)
## [1] 3.304348
replicate(5,shuffle.means(kids))
## [1] -7.2629400 1.0269151 7.3126294 -2.6169772
## [5] 0.4803313
I Each run does different shuffle, so gives different result.
I Last line does whole thing five times, collects results.
I Looks like a difference in means of 10 is improbably large, by chance.
1000 times
nsim=1000
ans=replicate(nsim,shuffle.means(kids))
I Investigate randomization by looking at ans.
Histogram of randomized mean differences
ggplot(data.frame(ans=ans),aes(x=ans))+
geom_histogram(binwidth=2)+geom_vline(xintercept=10)
[Histogram of randomized mean differences, with a vertical line at the observed 10]
How many randomizations gave mean difference > 10?
I By chance, mean difference of 10 or bigger appears unlikely.
I Use a logical vector to pick out the elements of ans that we want, then make a table of them:
isbig=(ans>=10)
table(isbig)
## isbig
## FALSE TRUE
## 983 17
I 17 out of 1000.
I Our randomization test gives a P-value of 17/1000 = 0.017, so would reject null hypothesis “new reading program has no effect” in favour of alternative “new reading program helps”.
I P-value similar to 0.013 from two-sample t.
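For comparison, the Welch t statistic behind that 0.013 can be recomputed from the data listing (a sketch, in Python to show only the arithmetic; R's t.test uses this same Welch form by default, and the quoted 0.013 appears to be the one-sided tail, matching the directional alternative, on roughly 38 df):

```python
# Two-sample Welch t statistic for treatment vs. control reading scores,
# values copied from the kids data listing above.
from math import sqrt
from statistics import mean, variance

t_scores = [24, 61, 59, 46, 43, 44, 52, 43, 58, 67, 62,
            57, 71, 49, 54, 43, 53, 57, 49, 56, 33]
c_scores = [42, 33, 46, 37, 43, 41, 10, 42, 55, 19, 17, 55,
            26, 54, 60, 28, 62, 20, 53, 48, 37, 85, 42]
se = sqrt(variance(t_scores) / len(t_scores) +
          variance(c_scores) / len(c_scores))
t_stat = (mean(t_scores) - mean(c_scores)) / se
print(round(mean(t_scores) - mean(c_scores), 2))   # 9.95, the observed difference
print(round(t_stat, 2))   # about 2.31
```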
Why 1000 randomizations?
I No uniquely good answer.
I More randomizations is better.
I But didn’t want to wait too long for it to finish!
I If you think 10,000 (or a million) better, replace nsim in my code by desired value and run again.
Jumping rats
I Recall jumping rats.
I See whether larger amount of exercise (jumping) went with higher bone density.
I 30 rats randomly assigned to one of 3 treatment groups:
I Control (no jumping)
I Low jumping
I High jumping
I Random assignment: rats in each group similar in all important ways.
I So entitled to draw conclusions about cause and effect.
I Recall boxplots, over:
Boxplots
rats=read.table("jumping.txt",header=T)
ggplot(rats,aes(y=density,x=group))+geom_boxplot()
[Boxplots of density (roughly 550–675) for the Control, Highjump, and Lowjump groups]
Testing: ANOVA (comparing several means)
rats.aov=aov(density~group,data=rats)
summary(rats.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 7434 3717 7.978 0.0019 **
## Residuals 27 12579 466
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I Usual ANOVA table, small P-value: significant result.
I Null hypothesis: all 3 groups have same mean.
I Alternative: “not all 3 means the same”; at least one is different from the others.
I Reject null, but not very useful finding.
141 / 166
Same thing in SAS
Read in data and do ANOVA:
data rats;
infile '/folders/myfolders/jumping.txt' firstobs=2;
input group $ density;
proc anova;
class group;
model density=group;
142 / 166
Results (some)
The ANOVA Procedure
Dependent Variable: density
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 7433.86667 3716.93333 7.98 0.0019
Error 27 12579.50000 465.90741
Corrected Total 29 20013.36667
Source DF Anova SS Mean Square F Value Pr > F
group 2 7433.866667 3716.933333 7.98 0.0019
143 / 166
Which groups are different from which?
I ANOVA really only answers half our questions: it says “there are differences”, but doesn’t tell us which groups differ.
I One possibility (not the best): compare all possible pairs of groups, via two-sample t.
I First pick out each group:
detach(kids)
controls=filter(rats,group=="Control")
lows=filter(rats,group=="Lowjump")
highs=filter(rats,group=="Highjump")
144 / 166
Control vs. low
t.test(controls$density,lows$density)
##
## Welch Two Sample t-test
##
## data: controls$density and lows$density
## t = -1.0761, df = 16.191, p-value = 0.2977
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -33.83725 11.03725
## sample estimates:
## mean of x mean of y
## 601.1 612.5
No sig. difference here.
145 / 166
Control vs. high
t.test(controls$density,highs$density)
##
## Welch Two Sample t-test
##
## data: controls$density and highs$density
## t = -3.7155, df = 14.831, p-value = 0.002109
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -59.19139 -16.00861
## sample estimates:
## mean of x mean of y
## 601.1 638.7
These are different.
146 / 166
Low vs. high
t.test(lows$density,highs$density)
##
## Welch Two Sample t-test
##
## data: lows$density and highs$density
## t = -3.2523, df = 17.597, p-value = 0.004525
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -43.15242 -9.24758
## sample estimates:
## mean of x mean of y
## 612.5 638.7
These are different too.
147 / 166
But...
I We just did 3 tests instead of 1.
I So we have given ourselves 3 chances to reject H0 : all means equal, instead of 1.
I Thus α for this combined test is not 0.05.
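A rough sense of how much the error rate inflates: if the three tests were independent (they are not quite, since they share data, so this is only a ballpark), the chance of at least one false rejection would be:

```r
# Familywise error rate for 3 independent tests, each at alpha=0.05
# (approximation: our three t-tests share data, so not truly independent)
alpha=0.05
1-(1-alpha)^3
## [1] 0.142625
```

So the combined α is closer to 0.14 than to 0.05.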
148 / 166
John W. Tukey
I American statistician, 1915–2000
I Big fan of exploratory data analysis
I Invented boxplot
I Invented “honestly significant differences”
I Invented jackknife estimation
I Coined computing term “bit”
I Co-inventor of Fast Fourier Transform
149 / 166
Honestly Significant Differences
I Compare several groups with one test, telling you which groups differ from which.
I Idea: if all population means equal, find distribution of highest sample mean minus lowest sample mean.
I Any means unusually different compared to that are declared significantly different.
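A sketch of the yardstick this uses: the critical value comes from the studentized range distribution, available in R as qtukey. Plugging in the error mean square (465.9 on 27 df) and group size (10) from our ANOVA:

```r
# Tukey's honestly significant difference: any two sample means
# further apart than this are declared significantly different
mse=465.9   # error mean square from the ANOVA table
q=qtukey(0.95,nmeans=3,df=27)
q*sqrt(mse/10)
## roughly 23.9
```

This matches the half-width of the confidence intervals in the TukeyHSD output (e.g. 37.6 - 13.67 is about 23.9).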
150 / 166
Tukey on rat data
rats.aov=aov(density~group,data=rats)
TukeyHSD(rats.aov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = density ~ group, data = rats)
##
## $group
## diff lwr upr p adj
## Highjump-Control 37.6 13.66604 61.533957 0.0016388
## Lowjump-Control 11.4 -12.53396 35.333957 0.4744032
## Lowjump-Highjump -26.2 -50.13396 -2.266043 0.0297843
Again conclude that bone density for highjump group is significantly higher than for the other two groups.
151 / 166
Why Tukey’s procedure better than all t-tests
Look at P-values for the two tests:
Comparison Tukey t-tests
----------------------------------
Highjump-Control 0.0016 0.0021
Lowjump-Control 0.4744 0.2977
Lowjump-Highjump 0.0298 0.0045
I Tukey P-values (mostly) higher.
I Proper adjustment for doing three t-tests at once, not just one in isolation.
I lowjump-highjump comparison no longer significant at α = 0.01.
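Another standard fix, shown here as a sketch (not part of these notes' workflow), is to keep the t-tests but adjust their P-values, which pairwise.t.test does in one go:

```r
# All three t-tests at once, P-values adjusted by Holm's method;
# pool.sd=FALSE gives Welch-style tests as on the earlier slides
pairwise.t.test(rats$density,rats$group,
                p.adjust.method="holm",pool.sd=FALSE)
```

Holm's adjustment, like Tukey's procedure, controls the overall error rate across the three comparisons.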
152 / 166
Same stuff in SAS
proc anova;
class group;
model density=group;
means group / tukey;
153 / 166
Tukey output (some)
The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for density
Means with the same letter are not significantly different.
Tukey Grouping Mean N group
A 638.700 10 Highjump
B 612.500 10 Lowjump
B
B 601.100 10 Control
154 / 166
Checking assumptions
ggplot(rats,aes(y=density,x=group))+geom_boxplot()
(Same boxplot of density by group as before.)
Assumptions:
I Normally distributed data within each group
I with equal group SDs.
155 / 166
The assumptions
I Normally-distributed data within each group
I Equal group SDs.
These are shaky here because:
I control group has outliers
I highjump group appears to have less spread than others.
Possible remedies:
I Transformation of response (usually works best when SD increases with mean)
I Different test, e.g. randomization test.
I Can also try Mood’s Median Test (if you know about chi-squared tests). Based on medians, like a generalization of the sign test. Not always very powerful. See over.
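One more option along these lines, offered here as a sketch (it is not in the slides): Welch's ANOVA drops the equal-SD assumption while still comparing means:

```r
# Welch-style one-way ANOVA: does not assume equal group SDs
# (still assumes roughly normal data within each group)
oneway.test(density~group,data=rats)
```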
156 / 166
Mood’s median test
I Find median of all bone densities, regardless of group:
attach(rats)
m=median(density) ; m
## [1] 621.5
I Count up how many observations in each group are above or below overall median:
tab=table(group,density>m)
tab
##
## group FALSE TRUE
## Control 9 1
## Highjump 0 10
## Lowjump 6 4
I All Highjump obs above overall median.
I Most Control obs below overall median.
I Suggests medians differ by group.
157 / 166
Mood’s median test 2/2
I Test whether association between group and being above/below overall median is significant using chi-squared test for association:
chisq.test(tab)
##
## Pearson's Chi-squared test
##
## data: tab
## X-squared = 16.8, df = 2, p-value = 0.0002249
I Very small P-value says that being above/below overall median depends on group.
I That is, groups do not all have same median.
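With only 10 rats per group, the chi-squared approximation can be shaky (small expected counts); a sketch of a simulation-based alternative, for comparison:

```r
# P-value by Monte Carlo simulation (B random tables)
# rather than the chi-squared approximation
chisq.test(tab,simulate.p.value=TRUE,B=10000)
```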
158 / 166
Randomization test
I Like two-sample randomization test: randomly shuffle group memberships.
I Choices about test statistic, e.g.
I usual ANOVA F -statistic
I Tukey-like highest sample mean minus lowest
I I go with highest minus lowest.
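If you prefer the F-statistic, a sketch of that version of one shuffle (shuffleF is a made-up name, parallel to the shuffle1 function defined over):

```r
# One shuffle using the ANOVA F-statistic as the test statistic
# (alternative to highest-minus-lowest)
shuffleF=function(mydata=rats) {
dens=mydata$density
shuffled.groups=sample(mydata$group)
# pull the F value out of the ANOVA table for the shuffled groups
summary(aov(dens~shuffled.groups))[[1]]$"F value"[1]
}
```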
159 / 166
Observed means and test statistic
I Calculate observed value of test statistic:
group.means=aggregate(density~group,data=rats,mean)
group.means
## group density
## 1 Control 601.1
## 2 Highjump 638.7
## 3 Lowjump 612.5
Actual means in second column of group.means:
z=group.means[,2]
obs=max(z)-min(z)
obs
## [1] 37.6
160 / 166
Randomly shuffling groups
Actual groups are:
rats$group
## [1] Control Control Control Control Control Control
## [7] Control Control Control Control Lowjump Lowjump
## [13] Lowjump Lowjump Lowjump Lowjump Lowjump Lowjump
## [19] Lowjump Lowjump Highjump Highjump Highjump Highjump
## [25] Highjump Highjump Highjump Highjump Highjump Highjump
## Levels: Control Highjump Lowjump
Shuffle like this:
shuffled.groups=sample(rats$group) ; shuffled.groups
## [1] Highjump Control Control Lowjump Lowjump Lowjump
## [7] Control Control Lowjump Lowjump Highjump Control
## [13] Control Highjump Highjump Highjump Lowjump Highjump
## [19] Lowjump Highjump Control Highjump Control Lowjump
## [25] Highjump Highjump Lowjump Control Lowjump Control
## Levels: Control Highjump Lowjump
161 / 166
Calculate means and highest minus lowest
Same idea as for actual data, but using shuffled.groups instead of group:
my.group.means=aggregate(density~shuffled.groups,
data=rats,mean)
my.group.means
## shuffled.groups density
## 1 Control 623.2
## 2 Highjump 613.0
## 3 Lowjump 616.1
z=my.group.means[,2]
max(z)-min(z)
## [1] 10.2
Need to do this a bunch of times. Idea: write a function to do it once, then run the function many times.
162 / 166
Function to do it once
I Input: original data.
I Obtain shuffled groups
I Calculate mean for each group
I Return largest mean minus smallest mean.
I Function:
shuffle1=function(mydata=rats) {
shuffled.groups=sample(mydata$group)
means=aggregate(density~shuffled.groups,
data=mydata,mean)
z=means[,2]
max(z)-min(z)
}
163 / 166
Obtaining randomization distribution
I Test (randomizes, so look for “sane” answer):
shuffle1(rats)
## [1] 19.4
I Use replicate to do a few times:
replicate(6,shuffle1(rats))
## [1] 9.4 12.8 25.0 15.8 6.6 1.9
I Or many times, like 1000:
test.stat=replicate(1000,shuffle1(rats))
164 / 166
Randomization distribution of test.stat
vals=data.frame(ts=test.stat)
ggplot(vals,aes(x=ts))+geom_histogram(binwidth=5)+
geom_vline(xintercept=obs,colour="red")
(Histogram of ts, right-skewed, with a red vertical line at the observed value 37.6.)
#hist(test.stat)
#abline(v=obs,col="red")
165 / 166
Comments and P-value
I Distribution shape skewed to right (lower limit of 0).
I Red line marks observed highest minus lowest: seems unusually high.
I If group means really different (anyhow), highest minus lowest will be large, so one-tailed test.
I (Same logic as one-sidedness of F -test in ANOVA.)
I P-value the usual way:
table(test.stat>=obs)
##
## FALSE TRUE
## 998 2
I P-value 2/1000 = 0.0020. Despite outliers, still reject “all means equal”, with P-value very similar to ANOVA’s (0.0019).
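A small refinement sometimes used for randomization P-values (not in the slides): count the observed arrangement as one of the randomizations, so the estimate can never be exactly zero:

```r
# "+1" correction: observed arrangement counts as one randomization
(sum(test.stat>=obs)+1)/(length(test.stat)+1)
## here (2+1)/(1000+1), about 0.003
```

Either way, the conclusion is the same.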
166 / 166