The t distribution - University of Toronto
fisher.utstat.toronto.edu/~hadas/stab22/lecture...

week11 1

The t distribution
• Suppose that an SRS of size n is drawn from a N(μ, σ) population. Then the one-sample t statistic

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$

has a t distribution with n − 1 degrees of freedom.

• The t distribution is symmetric about 0, and its mean is 0 (whenever the degrees of freedom exceed 1).

• There is a different t distribution for each sample size. A particular t distribution is specified by its degrees of freedom, which come from the sample standard deviation s.
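As a quick check of the formula, the one-sample t statistic can be computed with Python's standard library. This is a minimal sketch: the function name `one_sample_t` is our own, and the Cd measurements are borrowed from the lettuce example later in these notes.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) for t = (xbar - mu0) / (s / sqrt(n)) with df = n - 1."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (divisor n - 1)
    se = s / math.sqrt(n)           # standard error of the mean
    return (xbar - mu0) / se, n - 1

# Cd concentrations (mg/kg) from the lettuce example
cd = [21, 38, 12, 15, 14, 8]
t, df = one_sample_t(cd, 12)
print(round(t, 2), df)  # 1.38 5
```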

week11 2

Tests for the population mean μ when σ is unknown

• Suppose that an SRS of size n is drawn from a population having unknown mean μ and unknown standard deviation σ. To test the hypothesis H0: μ = μ0, we first estimate σ by s, the sample standard deviation, then compute the one-sample t statistic given by

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

• In terms of a random variable T having the t(n − 1) distribution, the P-value for the test of H0 against
  Ha: μ > μ0 is P(T ≥ t)
  Ha: μ < μ0 is P(T ≤ t)
  Ha: μ ≠ μ0 is 2·P(T ≥ |t|)
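Without Table D at hand, the three P-values above can be approximated numerically. The sketch below integrates the t density with Simpson's rule and uses the symmetry of the t distribution about 0; the helper names, the `alternative` labels and the step count are our own choices, and Table D or a statistics package remains the usual tool.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)   # lgamma avoids overflow for large df
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for T ~ t(df); Simpson's rule gives P(0 <= T <= |t|)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # symmetry about 0: P(T >= 0) = 1/2
    return 0.5 - area if t >= 0 else 0.5 + area

def t_test_p_value(t, df, alternative):
    """P-value for H0: mu = mu0 against the given alternative."""
    if alternative == "greater":      # Ha: mu > mu0  ->  P(T >= t)
        return t_upper_tail(t, df)
    if alternative == "less":         # Ha: mu < mu0  ->  P(T <= t)
        return 1 - t_upper_tail(t, df)
    if alternative == "two-sided":    # Ha: mu != mu0 ->  2 P(T >= |t|)
        return 2 * t_upper_tail(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Cd example: t = 1.3765 with 5 df, Ha: mu > 12
print(round(t_test_p_value(1.3765, 5, "greater"), 3))   # about 0.114
```

MINITAB reports this P-value rounded to 0.11, which agrees with the Table D bracket 0.10 < P < 0.15.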

week11 3

Example
• In a metropolitan area, the concentration of cadmium (Cd) in leaf lettuce was measured in 6 representative gardens where sewage sludge was used as fertilizer. The following measurements (in mg/kg of dry weight) were obtained:

Cd: 21 38 12 15 14 8

Is there strong evidence that the mean concentration of Cd is higher than 12?

Descriptive Statistics

Variable  N   Mean  Median  TrMean  StDev  SE Mean
Cd        6  18.00   14.50   18.00  10.68     4.36

• The hypotheses to be tested are: H0: μ = 12 vs Ha: μ > 12.

week11 4

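• The test statistic is:

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{18 - 12}{10.68/\sqrt{6}} = 1.38 $$

The degrees of freedom are df = 6 − 1 = 5. From Table D, the critical value for a one-sided test at the 5% level with 5 df is t* = 2.015. Since t = 1.38 < 2.015, we cannot reject H0 at the 5% level, so there is no strong evidence that the mean Cd concentration exceeds 12. Equivalently, Table D gives 0.10 < P(T(5) ≥ 1.38) < 0.15, so the P-value is greater than 0.05, indicating a nonsignificant result.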

week11 5

CIs for the population mean μ when σ is unknown

• Suppose that an SRS of size n is drawn from a population having unknown mean μ. A level C confidence interval for μ when σ is unknown is an interval of the form

$$ \left( \bar{x} - t^* \frac{s}{\sqrt{n}},\ \bar{x} + t^* \frac{s}{\sqrt{n}} \right) $$

where t* is the value for the t(n − 1) density curve with area C between −t* and t*.

• Example: Give a 95% CI for the mean Cd concentration.
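Once t* is known, the interval is a one-line calculation. In this sketch t* = 2.571 is read from Table D (95% confidence, df = 5) rather than computed, and the function name `t_interval` is hypothetical.

```python
import math
import statistics

def t_interval(data, t_star):
    """Level C t interval xbar ± t* s/sqrt(n); t_star must match C and df = n - 1."""
    n = len(data)
    xbar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(n)
    return xbar - margin, xbar + margin

cd = [21, 38, 12, 15, 14, 8]
low, high = t_interval(cd, t_star=2.571)   # Table D: C = 95%, df = 5
print(f"({low:.2f}, {high:.2f})")          # (6.79, 29.21), matching the MINITAB output
```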

week11 6

• MINITAB commands: Stat > Basic Statistics > 1-Sample t
• MINITAB outputs for the above problem:

T-Test of the Mean

Test of mu = 12.00 vs mu > 12.00

Variable  N   Mean  StDev  SE Mean     T     P
Cd        6  18.00  10.68     4.36  1.38  0.11

T Confidence Intervals

Variable  N   Mean  StDev  SE Mean  95.0 % CI
Cd        6  18.00  10.68     4.36  (6.79, 29.21)

week11 7

Question 3 Final exam Dec 2000

• In order to test H0: μ = 60 vs Ha: μ ≠ 60, a random sample of 9 observations (normally distributed) is obtained, yielding x̄ = 55 and s = 5. What is the P-value of the test for this sample?

a) greater than 0.10.
b) between 0.05 and 0.10.
c) between 0.025 and 0.05.
d) between 0.01 and 0.025.
e) less than 0.01.
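With Table D, a question like this can only bracket the P-value. The sketch below locates |t| within one row of the table; the df = 8 critical values are transcribed from a standard t table, and the helper name `bracket_upper_tail` is our own.

```python
import math

# Excerpt of a standard t table (Table D), row df = 8:
# (upper-tail probability p, critical value t* with P(T >= t*) = p)
TABLE_D_DF8 = [(0.10, 1.397), (0.05, 1.860), (0.025, 2.306),
               (0.01, 2.896), (0.005, 3.355), (0.001, 4.501)]

def bracket_upper_tail(t_abs, row):
    """Return (lo, hi) with lo < P(T >= t_abs) <= hi, using one row of Table D."""
    lo, hi = 0.0, 0.5                # for t_abs >= 0 the tail area is at most 1/2
    for p, t_star in row:            # critical values increase along the row
        if t_abs >= t_star:
            hi = p                   # t is past this critical value: tail area <= p
        else:
            lo = p                   # t has not reached it: tail area > p
            break
    return lo, hi

xbar, mu0, s, n = 55, 60, 5, 9
t = (xbar - mu0) / (s / math.sqrt(n))        # -3.0, with df = n - 1 = 8
lo, hi = bracket_upper_tail(abs(t), TABLE_D_DF8)
# Ha is two-sided, so the P-value is 2 * P(T >= |t|)
print(f"{2 * lo:.3f} < P-value < {2 * hi:.3f}")   # 0.010 < P-value < 0.020
```

Since 0.01 < P < 0.02, the P-value lies in the range given in option d).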

week11 8

Question
A manufacturing company claims that its new floodlight will last 1000 hours. After collecting a simple random sample of size ten, you determine that a 95% confidence interval for the true mean number of hours that the floodlights will last, μ, is (970, 995). Which of the following are true? (Assume all tests are two-sided.)

I) At any α < .05, we can reject the null hypothesis that the true mean is 1000.

II) If a 99% confidence interval for the mean were determined here, the numerical value 972 would certainly lie in this interval.

III) If we wished to test the null hypothesis H0: μ = 988, we could say that the p-value must be < 0.05.
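The link between two-sided tests and confidence intervals can be made concrete by working backwards from the interval. This sketch assumes that (970, 995) is a one-sample t interval with n − 1 = 9 df, so its half-width is t* · s/√n with t* = 2.262; the 99% critical value 3.250 is also taken from Table D, and the names are our own.

```python
# Assumption: (970, 995) is a one-sample 95% t interval from n = 10 observations (df = 9).
T_STAR_95 = 2.262   # Table D, df = 9, C = 95%
T_STAR_99 = 3.250   # Table D, df = 9, C = 99%

low, high = 970, 995
xbar = (low + high) / 2               # the interval is centred at the sample mean: 982.5
se = (high - low) / 2 / T_STAR_95     # margin = t* * s/sqrt(n), so s/sqrt(n) = 12.5 / 2.262

# 99% interval from the same sample: same centre, larger t*
ci99 = (xbar - T_STAR_99 * se, xbar + T_STAR_99 * se)

def t_stat(mu0):
    """One-sample t statistic for H0: mu = mu0 from the recovered xbar and standard error."""
    return (xbar - mu0) / se

print("99% CI:", tuple(round(v, 1) for v in ci99))
for mu0 in (1000, 988):
    t = t_stat(mu0)
    # two-sided test: reject at level alpha when |t| exceeds the matching t*
    print(mu0, round(t, 2),
          "reject at 5%" if abs(t) > T_STAR_95 else "not rejected at 5%",
          "reject at 1%" if abs(t) > T_STAR_99 else "not rejected at 1%")
```

Under this assumption, μ0 = 1000 is rejected at the 5% level but not at the 1% level, and μ0 = 988 (inside the 95% interval) is not rejected at 5%. Whatever the sample details, a 99% t interval from the same data contains the 95% interval, so 972 lies in it.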

week11 9

Questions
1. Alpha (level of significance α) is
a) the probability of rejecting H0 when H0 is true.
b) the probability of supporting H0 when H0 is false.
c) supporting H0 when H0 is true.
d) rejecting H0 when H0 is false.

2. Confidence intervals can be used to do hypothesis tests for
a) left-tail tests.
b) right-tail tests.
c) two-tailed tests.

3. The Type II error is supporting a null hypothesis that is false. T/F

week11 10

Robustness of the t procedures

• Robust procedures: a statistical inference procedure is called robust if the probability calculations required are insensitive to violations of the assumptions made.

• t-procedures are quite robust against nonnormality of the population except in the case of outliers or strong skewness.

week11 11

Simulation study
• Let's generate 100 samples of size 10 from a moderately skewed distribution (the chi-square distribution with 5 df) and calculate the 95% t intervals to see how many of them contain the true mean μ = 5.

• First let's have a look at the histogram of the 1000 values generated from this distribution.

Variable     N    Mean  Median  TrMean   StDev
C1        1000  4.9758  4.2788  4.7329  3.1618

[Histogram of the 1000 simulated chi-square(5) values in C1 (x-axis: C1; y-axis: Frequency)]

week11 12

T Confidence Intervals

Variable   N   Mean  StDev  SE Mean  95.0 % CI
C1        10   5.21   3.89     1.23  (2.43, 7.99)
. . .
C4        10  4.449  1.593    0.504  (3.309, 5.589)
C5        10   5.33   4.23     1.34  (2.31, 8.36)
C6        10  3.267  2.312    0.731  (1.612, 4.921) *
C7        10  4.981  2.988    0.945  (2.844, 7.118)
C8        10  3.725  1.520    0.481  (2.638, 4.812) *
C9        10  4.487  2.332    0.738  (2.819, 6.155)
. . .
C14       10  4.650  1.854    0.586  (3.324, 5.977)
C15       10  2.973  2.163    0.684  (1.425, 4.520) *
C16       10  4.685  2.254    0.713  (3.072, 6.297)
C26       10  5.594  2.984    0.944  (3.459, 7.728)
C27       10  3.468  2.078    0.657  (1.982, 4.955) *
C28       10   5.59   3.84     1.22  (2.84, 8.34)
. . .
C62       10  5.689  3.113    0.984  (3.462, 7.916)
C63       10  3.724  1.741    0.551  (2.479, 4.970) *
C64       10  4.387  2.157    0.682  (2.843, 5.930)
. . .
C87       10   7.01   3.44     1.09  (4.55, 9.47)
C88       10  3.281  2.265    0.716  (1.661, 4.902) *
C89       10   4.78   3.20     1.01  (2.49, 7.06)
. . .
C99       10   6.52   4.24     1.34  (3.49, 9.56)
C100      10  3.614  2.198    0.695  (2.042, 5.186)

* interval does not contain the true mean μ = 5

The number of intervals not capturing the true mean (μ = 5) is 6/100.
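A simulation like this one can be sketched with Python's `random` module instead of MINITAB. It relies on the fact that a chi-square variable with k df is a gamma variable with shape k/2 and scale 2; the seed, the number of repetitions and the function name `count_misses` are arbitrary choices, so the counts will not reproduce the MINITAB run exactly.

```python
import math
import random
import statistics

def count_misses(sampler, n, mu, t_star, reps, rng):
    """Build `reps` t intervals xbar ± t* s/sqrt(n) from fresh samples of size n
    and count how many fail to contain the true mean mu."""
    misses = 0
    for _ in range(reps):
        sample = [sampler(rng) for _ in range(n)]
        xbar = statistics.mean(sample)
        margin = t_star * statistics.stdev(sample) / math.sqrt(n)
        if not (xbar - margin <= mu <= xbar + margin):
            misses += 1
    return misses

def chisq5(rng):
    # chi-square with k df = Gamma(shape k/2, scale 2); here k = 5, so the mean is 5
    return rng.gammavariate(2.5, 2.0)

rng = random.Random(2024)     # fixed seed so the run is reproducible
T_STAR = 2.262                # Table D: 95% confidence, df = 9

print("misses in 100 intervals:", count_misses(chisq5, 10, 5.0, T_STAR, 100, rng))
reps = 5000
miss_rate = count_misses(chisq5, 10, 5.0, T_STAR, reps, rng) / reps
print("estimated miss rate:", round(miss_rate, 3))   # nominal rate is 0.05
```

Repeating the experiment many times gives a steadier estimate of the miss rate than a single batch of 100 intervals.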

week11 13

Example
• 100 samples of size 15 were drawn from a very skewed distribution (the chi-square distribution with 1 df).

Variable     N    Mean  Median  TrMean   StDev
C1        1500  0.9947  0.4766  0.8059  1.3647

• The 95% CIs (t intervals) for these 100 samples are given below.

[Histogram of the 1500 simulated chi-square(1) values in C1 (x-axis: C1; y-axis: Frequency)]

week11 14

T Confidence Intervals

Variable   N   Mean  StDev  SE Mean  95.0 % CI
C1        15  0.773  0.939    0.242  (0.253, 1.293)
C2        15  1.093  1.491    0.385  (0.268, 1.919)
C3        15  0.553  0.735    0.190  (0.146, 0.960) *
C4        15  0.387  0.732    0.189  (-0.019, 0.792) *
C5        15  1.239  2.146    0.554  (0.051, 2.427)
...
C23       15  0.491  0.619    0.160  (0.148, 0.834) *
C24       15  0.582  1.088    0.281  (-0.020, 1.184)
C25       15  0.550  0.660    0.170  (0.184, 0.915) *
C26       15  0.634  0.769    0.199  (0.208, 1.060)
C27       15  0.508  0.528    0.136  (0.216, 0.800) *
...
C51       15  1.122  1.292    0.334  (0.406, 1.837)
C52       15  0.519  0.664    0.171  (0.151, 0.887) *
C53       15  1.666  2.028    0.524  (0.543, 2.789)
...
C59       15  1.208  2.297    0.593  (-0.065, 2.480)
C60       15  0.644  0.525    0.136  (0.353, 0.935) *
C61       15  1.088  1.122    0.290  (0.466, 1.709)

week11 15

T Confidence Intervals (continuation)

Variable   N    Mean   StDev  SE Mean  95.0 % CI
...
C79       15   0.895   0.931    0.240  (0.379, 1.411)
C80       15   0.391   0.767    0.198  (-0.034, 0.816) *
C81       15   1.038   0.992    0.256  (0.488, 1.587)
C82       15   0.952   1.407    0.363  (0.173, 1.732)
C83       15  0.2763  0.2999   0.0774  (0.1102, 0.4424) *
C84       15   1.237   1.999    0.516  (0.130, 2.345)
...
C99       15   0.921   0.865    0.223  (0.442, 1.400)
C100      15   0.813   1.437    0.371  (0.018, 1.609)

* interval does not contain the true mean μ = 1

The number of intervals not capturing the true mean (μ = 1) is 9/100.
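The same coverage experiment can be sketched for the very skewed chi-square(1) distribution (gamma with shape 0.5 and scale 2). This is a sketch under stated assumptions: the seed, the 5000 repetitions and the function name are arbitrary, and t* = 2.145 is the Table D value for 95% confidence with df = 14. One would expect the estimated miss rate to sit above the nominal 5%, in line with the 9/100 above, though the exact figure depends on the random draws.

```python
import math
import random
import statistics

def t_interval_miss_rate(sampler, n, mu, t_star, reps, seed):
    """Fraction of `reps` t intervals xbar ± t* s/sqrt(n) that miss the true mean mu."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        sample = [sampler(rng) for _ in range(n)]
        xbar = statistics.mean(sample)
        margin = t_star * statistics.stdev(sample) / math.sqrt(n)
        if not (xbar - margin <= mu <= xbar + margin):
            misses += 1
    return misses / reps

def chisq1(rng):
    # chi-square with 1 df = Gamma(shape 0.5, scale 2): mean 1, strongly right-skewed
    return rng.gammavariate(0.5, 2.0)

miss_rate = t_interval_miss_rate(chisq1, n=15, mu=1.0, t_star=2.145, reps=5000, seed=7)
print("estimated miss rate:", round(miss_rate, 3))   # nominal rate is 0.05
```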

week11 16

Matched pairs t-test
• In a matched pairs study, subjects are matched in pairs and the outcomes are compared within each matched pair. The experimenter can toss a coin to assign the two treatments to the two subjects in each pair. Matched pairs are also common when randomization is not possible. One situation calling for matched pairs is when observations are taken on the same subjects under different conditions.

• A matched pairs analysis is needed when there are two measurements or observations on each individual and we want to examine the difference.

• For each individual (pair), we find the difference d between the two measurements from that pair. Then we treat the differences di as one sample and use the one-sample t statistic to test the hypothesis of no difference between the treatment effects.

• Example: similar to exercise 7.41 on page 446 in IPS.

week11 17

Data Display

Row  Student  Pretest  Posttest  improvement
  1        1       30        29           -1
  2        2       28        30            2
  3        3       31        32            1
  4        4       26        30            4
  5        5       20        16           -4
  6        6       30        25           -5
  7        7       34        31           -3
  8        8       15        18            3
  9        9       28        33            5
 10       10       20        25            5
 11       11       30        32            2
 12       12       29        28           -1
 13       13       31        34            3
 14       14       29        32            3
 15       15       34        32           -2
 16       16       20        27            7
 17       17       26        28            2
 18       18       25        29            4
 19       19       31        32            1
 20       20       29        32            3

week11 18

• One-sample t-test for the improvement

T-Test of the Mean

Test of mu = 0.000 vs mu > 0.000

Variable    N   Mean  StDev  SE Mean     T      P
improvem   20  1.450  3.203    0.716  2.02  0.029

• MINITAB commands for the paired t-test: Stat > Basic Statistics > Paired t

Paired T-Test and Confidence Interval

Paired T for Posttest - Pretest

              N   Mean  StDev  SE Mean
Posttest     20  28.75   4.74     1.06
Pretest      20  27.30   5.04     1.13
Difference   20  1.450  3.203    0.716

95% CI for mean difference: (-0.049, 2.949)
T-Test of mean difference = 0 (vs > 0): T-Value = 2.02  P-Value = 0.029
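The paired analysis is a one-sample t procedure applied to the differences, so the MINITAB numbers can be recomputed from the data table. In this sketch t* = 2.093 is the Table D value for 95% confidence with df = 19, and the variable names are our own.

```python
import math
import statistics

pretest  = [30, 28, 31, 26, 20, 30, 34, 15, 28, 20, 30, 29, 31, 29, 34, 20, 26, 25, 31, 29]
posttest = [29, 30, 32, 30, 16, 25, 31, 18, 33, 25, 32, 28, 34, 32, 32, 27, 28, 29, 32, 32]

# improvement = Posttest - Pretest for each student
d = [post - pre for pre, post in zip(pretest, posttest)]
n = len(d)
dbar = statistics.mean(d)
se = statistics.stdev(d) / math.sqrt(n)
t = dbar / se                    # H0: mean difference = 0, Ha: mean difference > 0
t_star = 2.093                   # Table D: 95% confidence, df = n - 1 = 19
ci = (dbar - t_star * se, dbar + t_star * se)
print(round(dbar, 3), round(t, 2), [round(v, 3) for v in ci])   # 1.45 2.02 [-0.049, 2.949]
```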

week11 19

Character Stem-and-Leaf Display

Stem-and-leaf of improvement  N = 20
Leaf Unit = 1.0

   2   -0  54
   4   -0  32
   6   -0  11
   8    0  11
  (7)   0  2223333
   5    0  4455
   1    0  7

[Histogram of improvement (x-axis: improvement; y-axis: Frequency)]

week11 20

Two-sample problems
• The goal of inference is to compare the responses in two groups.
• Each group is considered to be a sample from a distinct population.
• The responses in each group are independent of those in the other group.
• A two-sample problem can arise from a randomized comparative experiment or from comparing random samples separately selected from two populations.

• Example: A medical researcher is interested in the effect of added calcium in our diet on blood pressure. She conducted a randomized comparative experiment in which one group of subjects received a calcium supplement and a control group received a placebo.

week11 21

Comparing two means (with two independent samples)
• Here we look at the problem of comparing two population means when the population variances are known or the sample sizes are large. Suppose that an SRS of size n1 is drawn from an N(μ1, σ1) population and that an independent SRS of size n2 is drawn from an N(μ2, σ2) population. Then the two-sample z statistic

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$

has the standard normal N(0, 1) sampling distribution. To test the null hypothesis H0: μ1 = μ2, we set μ1 − μ2 = 0 in this formula.

• Using the standard normal tables, the P-value for the test of H0 against
  Ha: μ1 > μ2 is P(Z ≥ z)
  Ha: μ1 < μ2 is P(Z ≤ z)
  Ha: μ1 ≠ μ2 is 2·P(Z ≥ |z|)

week11 22

Example
• A regional IRS auditor runs a test on a sample of returns filed by March 15 to determine whether the average return this year is larger than last year. The sample data are shown here for a random sample of returns from each year.

• Assume that the standard deviation of returns is known to be about 100 for both years. Test whether the average return is larger this year than last year.

               Last Year  This Year
Mean                 380        410
Sample size          100        120

week11 23

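Solution
• Let μ1 be the mean return last year and μ2 the mean return this year. The hypotheses to be tested are: H0: μ1 = μ2 vs Ha: μ1 < μ2.
• The test statistic is:

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{380 - 410 - 0}{\sqrt{\frac{100^2}{100} + \frac{100^2}{120}}} = -2.22 < -1.645 $$

where −1.645 is the critical value for a one-sided test at the 5% level.

• The P-value = P(Z < −2.22) = 0.0132 < 0.05, therefore we reject H0 and conclude, at the 5% significance level, that the average return is larger this year than last year.

• A 95% CI for the difference μ2 − μ1 (this year minus last year) is given by:

$$ (\bar{x}_2 - \bar{x}_1) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 30 \pm 1.96 \sqrt{\frac{100^2}{100} + \frac{100^2}{120}} = 30 \pm 26.5 = (3.5, 56.5) $$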
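The z test and interval for the IRS example can be checked numerically. The sketch computes the standard normal CDF from `math.erf`, so it uses the unrounded z (about −2.216) and its P-value can differ slightly in the fourth decimal from a table lookup at z = −2.22; σ = 100 is the assumed known standard deviation from the example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

sigma = 100                      # assumed known for both years
xbar1, n1 = 380, 100             # last year
xbar2, n2 = 410, 120             # this year

se = math.sqrt(sigma**2 / n1 + sigma**2 / n2)
z = (xbar1 - xbar2) / se         # H0: mu1 = mu2, Ha: mu1 < mu2
p_value = normal_cdf(z)          # P(Z <= z)

diff = xbar2 - xbar1             # estimate of mu2 - mu1
ci = (diff - 1.96 * se, diff + 1.96 * se)
print(round(z, 2), round(p_value, 4), [round(v, 1) for v in ci])
```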

week11 24

Comparing two population means (unknown std. deviations)

• Suppose that an SRS of size n1 is drawn from a normal population with unknown mean μ1 and that an independent SRS of size n2 is drawn from another normal population with unknown mean μ2. To test the null hypothesis H0: μ1 = μ2, we compute the two-sample t statistic

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

• This statistic does not have an exact t distribution. A conservative approach is to compute the P-value from the t distribution with df equal to the smaller of n1 − 1 and n2 − 1; statistical software usually uses a more accurate, often fractional, df instead.

week11 25

Example
• The weight gains for n1 = n2 = 8 rats tested on diets 1 and 2 are summarized here. Test whether diet 2 has greater mean weight gain. Use the 5% significance level.

            Diet 1  Diet 2
n                8       8
Std dev.     0.033   0.070
Mean           3.1     3.2

• The hypotheses to be tested are: H0: μ1 = μ2 vs Ha: μ1 < μ2.
• The test statistic is

$$ t = \frac{3.1 - 3.2 - 0}{\sqrt{\frac{0.033^2}{8} + \frac{0.070^2}{8}}} = -3.65 $$

week11 26

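• With df = 8 − 1 = 7, the P-value is P(T(7) ≤ −3.65) = P(T(7) ≥ 3.65). In Table D, 3.65 lies between the critical values 3.499 (upper-tail probability 0.005) and 4.029 (upper-tail probability 0.0025), so 0.0025 < P-value < 0.005. We therefore reject H0 and conclude that the mean weight gain from diet 2 is significantly greater than that from diet 1 (at both the 5% and 1% significance levels).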

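• A level C CI for the difference μ1 − μ2 between the two means is given by

$$ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

where t* comes from the t distribution with df equal to the smaller of n1 − 1 and n2 − 1.

• For this example, with t* = 2.365 (df = 7), the 95% CI for μ2 − μ1 (diet 2 minus diet 1) is

$$ (3.2 - 3.1) \pm 2.365 \sqrt{\frac{0.033^2}{8} + \frac{0.070^2}{8}} = (0.0353, 0.165) $$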
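The calculations for the rat example can be sketched the same way. This uses the conservative df = min(n1, n2) − 1 described above and t* = 2.365 from Table D (95%, df = 7); the function name `two_sample_t` is our own, and software that uses a fractional df would give a somewhat narrower interval.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t statistic for H0: mu1 = mu2, its standard error,
    and the conservative df = min(n1, n2) - 1."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2) / se, se, min(n1, n2) - 1

t, se, df = two_sample_t(3.1, 0.033, 8, 3.2, 0.070, 8)
t_star = 2.365                    # Table D: 95% confidence, df = 7
diff = 3.2 - 3.1                  # estimate of mu2 - mu1 (diet 2 minus diet 1)
ci = (diff - t_star * se, diff + t_star * se)
print(round(t, 2), df, [round(v, 4) for v in ci])   # -3.65 7 [0.0353, 0.1647]
```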
