Chapter 7 Inference for Distributions


Page 1:

Chapter 7

Inference for Distributions

Page 2:

Inference for the mean of a population: So far, we have assumed that σ was known.

If σ is unknown, we can use the sample standard deviation, s, to estimate σ.

But this adds more variability to our test statistic and/or confidence interval (therefore, we will use the t table).

If σ is known, then σ/√n is the standard deviation of x̄.

If σ is not known, then s/√n is the standard error of x̄.

When σ is not known, we use the t table (Table D) instead of the Normal table (Table A, i.e. the Z table).

s = √[ Σ(xᵢ − x̄)² / (n − 1) ]
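As a quick side note (not part of the original slides), the sample standard deviation and the standard error of x̄ can be computed with a few lines of code; a minimal sketch assuming numpy is installed and using made-up data:

import numpy as np

# Made-up sample, just to illustrate s and the standard error of x-bar.
x = np.array([12.1, 9.8, 14.3, 11.0, 10.5, 13.2])
n = len(x)
s = x.std(ddof=1)           # sample standard deviation (divides by n - 1)
se = s / np.sqrt(n)         # standard error of x-bar when sigma is unknown
print(n, round(s, 3), round(se, 3))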

Page 3:

When n is very large, s is a very good estimate of σ and the corresponding t distributions are very close to the normal distribution.

The t distributions become wider for smaller sample sizes, reflecting the lack of precision in estimating σ from s.

The t-distribution

It needs a degrees-of-freedom parameter (df). In the one-sample problem with sample size n, df = n − 1.

As df increases, the t distribution gets closer to the standard normal.

The table for the t distribution is Table D; on the TI-83, use tcdf(start, end, df).
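The convergence of the t distribution to the standard normal can also be checked numerically; a minimal sketch assuming scipy is available (t.ppf plays the role of the Table D lookup, and t.cdf(end, df) − t.cdf(start, df) corresponds to tcdf(start, end, df)):

from scipy.stats import norm, t

# 97.5th-percentile critical values: t* approaches z* = 1.96 as df grows.
for df in (5, 15, 30, 100, 1000):
    print(df, round(t.ppf(0.975, df), 3))
print("normal", round(norm.ppf(0.975), 3))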

Page 4:

How to find the p-value for a t distribution with the TI-83? Press [2nd] [VARS] and select [6:tcdf] to get tcdf(start, end, df), where df = degrees of freedom.

Left-tailed test (H1: μ < some number)
1. Let our test statistic be −2.05 and n = 16, so df = 15.
2. The p-value is the area to the left of −2.05, i.e. P(t < −2.05).
3. The p-value is .0291; typing tcdf(-E99, -2.05, 15) gives the same value.

Right-tailed test (H1: μ > some number)
1. Let our test statistic be 2.05 and n = 16, so df = 15.
2. The p-value is the area to the right of 2.05, i.e. P(t > 2.05).
3. The p-value is .0291; typing tcdf(2.05, E99, 15) gives the same value.

Two-tailed test (H1: μ ≠ some number)
1. Let our test statistic be −2.05 and n = 16, so df = 15.
2. The p-value is double the area to the left of −2.05, i.e. 2·P(t < −2.05).
3. The p-value is .0582; typing 2*tcdf(-E99, -2.05, 15) gives the same value.
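The same three p-values can be reproduced without the calculator; a small sketch assuming scipy:

from scipy.stats import t

df = 15
left  = t.cdf(-2.05, df)       # P(t < -2.05), like tcdf(-E99, -2.05, 15)
right = t.sf(2.05, df)         # P(t > 2.05),  like tcdf(2.05, E99, 15)
two   = 2 * t.cdf(-2.05, df)   # two-tailed,   like 2*tcdf(-E99, -2.05, 15)
print(round(left, 4), round(right, 4), round(two, 4))   # ~0.0291, 0.0291, 0.0582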

Page 5:

Review: Sampling distribution of a sample mean = distribution of X̄

Distribution of X (n = 1): exactly N(µ, σ) when the population is normal.

Sampling distribution of X̄ (n > 1): exactly N(µ, σ/√n) for a normal population; otherwise not exactly normal, but approximately N(µ, σ/√n) in mean and SD (by the Central Limit Theorem).

Standardize (Z-score of X̄): Z = (X̄ − µ) / (σ/√n).  Reverse: X̄ = µ + z*·σ/√n.

Page 6:

Confidence intervals contain the population mean in C% of samples.

Different areas under the curve give different confidence levels C.

Example: For an 80% confidence level C, 80% of the normal curve’s

area is contained in the interval.


Review: Confidence levels when σ is known

z*: z* is related to the chosen

confidence level C.

C is the area under the standard

normal curve between −z* and z*.

The confidence interval is thus:  x̄ ± z*·σ/√n
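A small sketch of this z interval, assuming scipy and using invented summary numbers (x̄ = 50, σ = 8, n = 25):

from math import sqrt
from scipy.stats import norm

xbar, sigma, n, C = 50.0, 8.0, 25, 0.80     # invented values for illustration
z_star = norm.ppf(1 - (1 - C) / 2)          # ~1.282 for C = 80%
margin = z_star * sigma / sqrt(n)
print(round(z_star, 3), (round(xbar - margin, 3), round(xbar + margin, 3)))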

Page 7:

Sampling distribution of a sample mean = distribution of X̄ when σ is unknown

Distribution of X (n = 1): exactly N(µ, σ).

Sampling distribution of X̄ (n > 1): if σ is unknown, the standardized sample mean follows a t distribution with degrees of freedom n − 1.

Standardize (t-score of X̄): t = (X̄ − µ) / (S/√n).  Reverse: X̄ = µ + t*·S/√n.

Page 8:

Confidence interval when σ is unknown

When σ is unknown, the confidence interval is given as

( x̄ − t*(n−1)·S/√n ,  x̄ + t*(n−1)·S/√n )

where t*(n−1) is the critical value of the t distribution with n − 1 degrees of freedom. In order to find t*, we need to use Table D. E.g.: find the t critical value with confidence level 95% and df = 25.

Key: t* = 2.060.
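Software can replace the Table D lookup for t*; a minimal sketch assuming scipy (df = 25, C = 95% should reproduce the key value 2.060):

from scipy.stats import t

def t_star(C, df):
    # upper (1 - C)/2 critical value of the t distribution with df degrees of freedom
    return t.ppf(1 - (1 - C) / 2, df)

print(round(t_star(0.95, 25), 3))   # ~2.060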

Page 9:

Confidence interval when σ is unknown

When σ is unknown, the confidence interval is given as

( x̄ − t*(n−1)·S/√n ,  x̄ + t*(n−1)·S/√n )

In order to find t*, we need to use Table D:

1. Find the t critical value with confidence level 90% and df = 10.
2. Find the t critical value with confidence level 95% and df = 15.
3. Find the t critical value with confidence level 99% and df = 20.

Key: 1. t* = 1.812;  2. t* = 2.131;  3. t* = 2.845.

Page 10:

Table D

When σ is known, we use the normal distribution and the standardized z-value.

When σ is unknown, we use a t distribution with n − 1 degrees of freedom (df) and the statistic t = (x̄ − µ) / (s/√n).

Table D shows the z-values and t-values corresponding to landmark P-values/confidence levels.

Page 11:

Example 1: Confidence intervals for µ

Ex2: A random sample of 16 school-age girls was selected; their average time per weekday spent on housework was 14 minutes, with sample SD 8.6 minutes. Construct a 95% CI for the average time spent on housework by school-age girls in the nation.

EX2: (9.418, 18.582).
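A sketch that reproduces this interval from the summary statistics, assuming scipy:

from math import sqrt
from scipy.stats import t

n, xbar, s = 16, 14.0, 8.6
t_star = t.ppf(0.975, n - 1)                 # df = 15, t* ~ 2.131
margin = t_star * s / sqrt(n)
print(round(xbar - margin, 3), round(xbar + margin, 3))   # ~ (9.42, 18.58)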

Page 12:

Example 2: Confidence intervals for µ

Ex3: The average lifetime of 9 randomly selected TVs of a certain brand is 20 years, with sample SD 2 years. Construct a 99% CI for the average lifetime of all TVs from this brand.

EX3: (17.763, 22.237).

Page 13:

Example 3: Red wine, in moderation

Drinking red wine in moderation may protect against heart attacks. The

polyphenols it contains act on blood cholesterol and thus are a likely cause.

To see if moderate red wine consumption increases the average blood level of

polyphenols, a group of nine randomly selected healthy men were assigned to

drink half a bottle of red wine daily for two weeks. Their blood polyphenol levels

were assessed before and after the study, and the percent change is presented

here:

Q: What is the 95% confidence interval for the average percent change?

Firstly: Are the data approximately normal?

0.7 3.5 4 4.9 5.5 7 7.4 8.1 8.4

[Figure: histogram of the percent change in polyphenol blood levels (frequency vs. percentage change) and a normal quantile plot of percent change vs. normal quantiles.]

There is a low value, but overall the data can be considered reasonably normal.

Page 14:

What is the 95% confidence interval for the average percent change?

Sample average = 5.5; s = 2.517; df = n − 1 = 8

(…)

The sampling distribution is a t distribution with n − 1 degrees of freedom.

For df = 8 and C = 95%, t* = 2.306.

95% CI for average percent change is:

x̄ ± t*·s/√n = 5.5 ± 2.306 × 2.517/√9 = [3.565, 7.435].
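The same interval can be computed directly from the nine data points; a sketch assuming scipy and numpy:

import numpy as np
from scipy.stats import t

x = np.array([0.7, 3.5, 4.0, 4.9, 5.5, 7.0, 7.4, 8.1, 8.4])
n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_star = t.ppf(0.975, n - 1)                      # df = 8, t* ~ 2.306
margin = t_star * s / np.sqrt(n)
print(round(xbar, 3), round(s, 3))                # ~ 5.5, 2.517
print(round(xbar - margin, 3), round(xbar + margin, 3))   # ~ (3.56, 7.44)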

Page 15:

The one-sample t-test (5 steps)

1. State H0 versus Ha.

2. Choose a significance level α.

3. Calculate t = (x̄ − µ0) / (s/√n) and df = n − 1 (ASSUMING THE NULL HYPOTHESIS IS TRUE).

4. Find the P-value in the direction of Ha: use tcdf(start, end, df) for a one-sided test, or 2*tcdf(start, end, df) for a two-sided test.

5. Draw conclusions: If P-value ≤ α, then we reject H0 (enough evidence…). If P-value > α, then we do not reject H0 (not enough evidence…).
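Steps 3–5 can be wrapped in a short helper; a sketch assuming scipy (the function name and arguments are illustrative only):

from math import sqrt
from scipy.stats import t as t_dist

def one_sample_t(xbar, s, n, mu0, alternative="two-sided", alpha=0.05):
    t_stat = (xbar - mu0) / (s / sqrt(n))     # step 3, computed assuming H0 is true
    df = n - 1
    if alternative == "less":
        p = t_dist.cdf(t_stat, df)            # like tcdf(-E99, t, df)
    elif alternative == "greater":
        p = t_dist.sf(t_stat, df)             # like tcdf(t, E99, df)
    else:
        p = 2 * t_dist.sf(abs(t_stat), df)    # like 2*tcdf(|t|, E99, df)
    return t_stat, df, p, (p <= alpha)        # step 5: reject H0 when p <= alpha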

Page 16:

The P-value is the probability, if H0 is true, of randomly drawing a sample like the one obtained or more extreme, in the direction of Ha.

The P-value is calculated as the corresponding area under the curve of t = (x̄ − µ0) / (s/√n), one-tailed or two-tailed depending on Ha.

[Figure: t curves shading the one-sided (one-tailed) and two-sided (two-tailed) P-value areas.]

Page 17:

(Chap6) -- The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128, with population SD 15. The medical director of a company looks at the medical records of 72 company executives in this age group and finds that the mean systolic blood pressure in this sample is 126.07. Is this evidence that executives' blood pressures are lower than the national average?

(Chap7) -- The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128. A simple random sample of 16 patients was tested, with average systolic blood pressure 120 and sample SD 12. Is this evidence that executives' blood pressures are lower than the national average?

One-sample test (shall we use a Z-test or a t-test?)

Page 18:

Answer to Example in Chap7:

(1) Hypothesis: H0 : µ = 128 vs. Ha : µ < 128.  (2) α = 5%

(3) One-sample t-Test statistics

(4) Draw the t(15) curve. Thus, P-value = tcdf(-999, -2.67, 15) = 0.0087

(5) (Statistical Conclusion) Since P-value < α, we reject H0.

(Non-statistical conclusion) That is, there is STRONG evidence that executives' blood pressures are lower than the national average.

t = (x̄ − µ0) / (s/√n) = (120 − 128) / (12/√16) = −2.67,  df = n − 1 = 16 − 1 = 15
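These numbers can be checked quickly in software; a sketch assuming scipy:

from math import sqrt
from scipy.stats import t

t_stat = (120 - 128) / (12 / sqrt(16))       # ~ -2.67
p_value = t.cdf(t_stat, 15)                  # one-sided, Ha: mu < 128
print(round(t_stat, 3), round(p_value, 4))   # ~ -2.667, 0.0087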

Page 19:

(Chap6)-- A new medicine treating cancer was introduced to the market

decades ago and the company claimed that on average it will prolong

a patient’s life for 5.2 years. Suppose the SD of all cancer patients is

2.52. In a 10-year study with 64 patients, the average prolonged

lifetime is 4.6 years. With normality assumption, do the 10-year

study’s data show a different average prolonged lifetime?

(Chap7)-- A new medicine treating cancer was introduced to the market

decades ago and the company claimed that on average it will prolong

a patient's life for 5.2 years. In a 10-year study with 20 patients, the

average prolonged lifetime is 4.7 years with sample SD 2.50. With

normality assumption, do the 10-year study’s data show a different

average prolonged lifetime?

One-sample test (shall we use a Z-test or a t-test?)

Page 20:

Answer to Example in Chap7:

(1) Hypothesis: H0 : µ = 5.2 year versus Ha : µ ≠ 5.2 year. (2) α = 5%

(3) One-sample t-Test statistics

(4) Draw the t(19) curve. Thus, P-value = 2*tcdf(-999, -0.894, 19) = 0.383.

(Statistical Conclusion) Since P-value > α, we do not reject H0.

(Non-Statistical Conclusion) There is NOT enough evidence to conclude that

10-year study’s data show a different average prolonged lifetime.

t = (x̄ − µ0) / (s/√n) = (4.7 − 5.2) / (2.5/√20) = −0.894,  df = n − 1 = 20 − 1 = 19

Page 21:

Example 3: Hypothesis testing

For the following data set:

5  8  7  10  12  17  12  13  9  6  14  11  10

x̄ = 10.308, s = 3.376

Q: Use Hypothesis Testing to test that the mean is significantly higher than 9.5.
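A sketch of this test on the raw data, assuming scipy ≥ 1.6 (for the alternative argument):

import numpy as np
from scipy.stats import ttest_1samp

# H0: mu = 9.5 versus Ha: mu > 9.5
x = np.array([5, 8, 7, 10, 12, 17, 12, 13, 9, 6, 14, 11, 10])
print(round(x.mean(), 3), round(x.std(ddof=1), 3))        # ~ 10.308, 3.376
result = ttest_1samp(x, popmean=9.5, alternative="greater")
print(round(result.statistic, 3), round(result.pvalue, 3))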

Page 22:

Exercises on Hypothesis Testing (t-test)

1. Because of variation in the manufacturing process, tennis balls produced by a particular machine do not have identical diameters; the diameter is supposed to be 3 in. If the average diameter of the first 36 balls made by the machine is 3.2 in with sample SD 0.15 in, shall we stop and calibrate the machine?

2. A new medicine treating cancer was introduced to the market decades ago and the company claimed that on average it will prolong a patient's life for 5 years. In a 10-year study with 81 patients, the average prolonged lifetime is 4.5 years with sample SD 0.4 years. With the normality assumption, shall we reject the original claim?

3. The registrar's office claims that the average SAT score of UNCW students is 1050. Suppose you randomly select 100 UNCW students; the SAT average of your sample is 1042 with sample SD 80. Do you agree with the claim?

4. National data show that, on average, college freshmen spend 7.5 hours a week going to parties. One administrator takes a random sample of 81 freshmen from her college and finds that her students' average hours spent on parties is 7.6 with sample SD 2 hours. Shall the administrator believe that the national data apply to her students?

Page 23:

Answer:

1. H0 : µ = 3, Ha : µ ≠ 3; α = 5%; T = (3.2 − 3)/(0.15/√36) = 8; df = 35; t critical value ≈ 2.04; P-value < 5%; we reject H0, and we shall stop and calibrate the machine.

2. H0 : µ = 5, Ha : µ ≠ 5; α = 5%; T = (4.5 − 5)/(0.4/√81) = −11.25; df = 80; t critical value ≈ 1.99; P-value < 5%; we reject H0, and we reject the claim that the average is 5 years.

3. H0 : µ = 1050, Ha : µ ≠ 1050; α = 5%; T = (1042 − 1050)/(80/√100) = −1.0; df = 99; t critical value ≈ 1.984; P-value > 5%; we do not reject H0, so we have no strong evidence against the registrar's claim.

4. H0 : µ = 7.5, Ha : µ ≠ 7.5; α = 5%; T = (7.6 − 7.5)/(2/√81) = 0.45; df = 80; t critical value ≈ 1.99; P-value > 5%; we do not reject H0, and the national data do apply.

Page 24:

Subjects are matched in “pairs” and

outcomes are compared within each unit

Example: Pre-test and post-test studies look at data

collected on the same sample elements before and after

some experiment is performed.

Example: Twin studies often try to sort out the influence of

genetic factors by comparing a variable between sets of

twins.

Matched pairs t procedures for dependent samples

We perform hypothesis testing on the difference within each unit.

Page 25:

The variable studied becomes X_difference = (X1 − X2). The null hypothesis is that there is NO difference between the two paired groups.

H0: µ_difference = 0 ;  Ha: µ_difference > 0 (or < 0, or ≠ 0)

When stating the alternative, be careful how you are calculating the difference (after – before or before – after).

Conceptually, this is not different from tests on one

population.

Matched pairs

Page 26:

Matched Pairs

If we take After − Before:

To show that the "After group" has increased over the "Before group", use Ha: µ_diff > 0.

To show that the "After group" has decreased, use Ha: µ_diff < 0.

To show that the two groups are different, use Ha: µ_diff ≠ 0.

The test statistic is t = x̄_diff / (s_diff / √n).

Page 27:

Example 4: Many people believe that the moon influences the actions of some individuals. A study of dementia patients in nursing homes recorded various types of disruptive behaviors every day for 12 weeks. Days were classified as moon days and other days. For each patient, the average number of disruptive behaviors was computed for moon days and for other days. The data for 5 subjects whose behavior was classified as aggressive are presented below:

Moon days Other days

3.33 0.27

3.67 0.59

2.67 0.32

3.33 0.19

3.33 1.26

We want to test whether there is any difference in aggressive behavior on moon days and other days.

Page 28:

Example 4 (continued): Many people believe that the moon influences the actions of some individuals. A study of dementia patients in nursing homes recorded various types of disruptive behaviors every day for 12 weeks. Days were classified as moon days and other days. For each patient, the average number of disruptive behaviors was computed for moon days and for other days. The data for 5 subjects whose behavior was classified as aggressive are presented below, now with the difference (moon days − other days):

Moon days Other days Difference

3.33 0.27 3.06

3.67 0.59 3.08

2.67 0.32 2.35

3.33 0.19 3.14

3.33 1.26 2.07

We want to test whether there is any difference in aggressive behavior on moon days and other days.

Page 29:

Answer to Example 4

Let difference = aggressive behavior on moon days − aggressive behavior on other days.

H0: µ_d = 0 versus Ha: µ_d ≠ 0, with α = 0.05.

t-statistic = 12.377, df = 5 − 1 = 4, p-value = 2.449 × 10^(−4). Reject H0 at the 5% level. There is enough evidence to conclude that there is a difference in aggressive behavior between moon days and other days.
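A sketch reproducing this matched-pairs result from the differences, assuming scipy:

import numpy as np
from scipy.stats import ttest_1samp

# Paired t-test = one-sample t-test on the moon-day minus other-day differences.
diff = np.array([3.06, 3.08, 2.35, 3.14, 2.07])
result = ttest_1samp(diff, popmean=0.0)           # two-sided by default
print(round(result.statistic, 3), result.pvalue)  # ~ 12.38, ~ 2.4e-4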

Page 30:

Does lack of caffeine increase depression? (matched pair t-test)

Individuals diagnosed as caffeine-dependent are deprived of caffeine-rich

foods and assigned to receive daily pills. Sometimes, the pills contain

caffeine and other times they contain a placebo. Depression was

assessed.

Q: Does lack of caffeine increase depression?

Subject   Depression with Caffeine   Depression with Placebo   Placebo − Caffeine
 1          5                          16                         11
 2          5                          23                         18
 3          4                           5                          1
 4          3                           7                          4
 5          8                          14                          6
 6          5                          24                         19
 7          0                           6                          6
 8          0                           3                          3
 9          2                          15                         13
10         11                          12                          1
11          1                           0                         −1

There are 2 data points for each subject, but we'll only look at the difference. The sample distribution appears appropriate for a t-test.

Page 31:

Does lack of caffeine increase depression?

For each individual in the sample, we have calculated a difference in depression

score (placebo minus caffeine).

There were 11 “difference” points, thus df = n − 1 = 10.

We calculate that x̄ = 7.36; s = 6.92.

H0: µ_difference = 0 ;  Ha: µ_difference > 0

t = (x̄ − 0) / (s/√n) = 7.36 / (6.92/√11) = 3.53

For df = 10, p-value = 0.0027. (1) Since p-value < 0.05, we reject H0.

(2) We have enough evidence to conclude that caffeine deprivation causes a significant increase in depression.
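A sketch reproducing this test from the paired scores, assuming scipy ≥ 1.6 (ttest_rel with a one-sided alternative):

import numpy as np
from scipy.stats import ttest_rel

caffeine = np.array([5, 5, 4, 3, 8, 5, 0, 0, 2, 11, 1])
placebo  = np.array([16, 23, 5, 7, 14, 24, 6, 3, 15, 12, 0])
# Ha: mean(placebo - caffeine) > 0, i.e. depression is higher without caffeine.
result = ttest_rel(placebo, caffeine, alternative="greater")
print(round(result.statistic, 3), round(result.pvalue, 4))   # ~ 3.53, 0.0027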

Page 32:

Comparing two independent samples

We often compare two

treatments used on

independent samples.

Is the difference between both

treatments due to a true

difference in population means?

Independent samples: Subjects in one sample are

completely unrelated to subjects in the other sample.

[Diagram: Sample 1 is drawn from Population 1; Sample 2 is drawn from Population 2.]

Page 33:

Sec 7.2 Two independent samples t distribution

We have two independent SRSs (simple random samples), possibly coming from two distinct populations, with (µ1, σ1) and (µ2, σ2) unknown. We use (x̄1, s1) and (x̄2, s2) to estimate (µ1, σ1) and (µ2, σ2), respectively.

To compare the means, both populations should be normally distributed. However, in practice, it is enough that the two distributions have similar shapes and that the sample data contain no strong outliers.

Page 34:

The two-sample t statistic follows approximately a t distribution, with a standard error SE (spread) reflecting variation from both samples:

SE = √( s1²/n1 + s2²/n2 )

Conservatively, the degrees of freedom are equal to the smaller of (n1 − 1) and (n2 − 1).

Page 35:

Two-sample t-test

The null hypothesis is that the two population means µ1 and µ2 are equal, thus their difference is equal to zero:

H0: µ1 = µ2, i.e. (µ1 − µ2) = 0

with either a one-sided or a two-sided alternative hypothesis.

We find how many standard errors (SE) away from (µ1 − µ2) the observed difference (x̄1 − x̄2) is, by standardizing:

t = [ (x̄1 − x̄2) − (µ1 − µ2) ] / SE

Because in a two-sample test H0 assumes (µ1 − µ2) = 0, we simply use

t = (x̄1 − x̄2) / √( s1²/n1 + s2²/n2 )

with df = smallest of (n1 − 1, n2 − 1).
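A sketch of this two-sample statistic with the conservative df, assuming scipy (the helper name is illustrative only):

from math import sqrt
from scipy.stats import t as t_dist

def two_sample_t(x1, s1, n1, x2, s2, n2, alternative="two-sided"):
    se = sqrt(s1**2 / n1 + s2**2 / n2)         # standard error from both samples
    t_stat = (x1 - x2) / se
    df = min(n1 - 1, n2 - 1)                   # conservative degrees of freedom
    if alternative == "less":
        p = t_dist.cdf(t_stat, df)
    elif alternative == "greater":
        p = t_dist.sf(t_stat, df)
    else:
        p = 2 * t_dist.sf(abs(t_stat), df)
    return t_stat, df, p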

Page 36:

Does smoking damage the lungs of children exposed

to parental smoking?

Forced vital capacity (FVC) is the volume (in milliliters) of

air that an individual can exhale in 6 seconds.

FVC was obtained for a sample of children not exposed to

parental smoking and a group of children exposed to

parental smoking.

We want to know whether parental smoking decreases

children’s lung capacity as measured by the FVC test.

Is the mean FVC lower in the population of children

exposed to parental smoking?

Parental smoking   x̄ (FVC)   s      n
Yes                 75.5      9.3    30
No                  88.2     15.1    30

Page 37:

Parental smoking   x̄ (FVC)   s      n
Yes                 75.5      9.3    30
No                  88.2     15.1    30

The difference in sample averages

follows approximately the t distribution

with 29 df:

We calculate the t statistic:

t = (x̄_smoke − x̄_no) / √( s_smoke²/n_smoke + s_no²/n_no )
  = (75.5 − 88.2) / √( 9.3²/30 + 15.1²/30 )
  = −12.7 / 3.24 ≈ −3.9

p-value=tcdf(-E99, -3.919, 29)=2.491*10^(-4),

So p-value < 5%. It’s a very significant

difference, we reject H0.

H0: µ_smoke = µ_no   ⇔   (µ_smoke − µ_no) = 0

Ha: µ_smoke < µ_no   ⇔   (µ_smoke − µ_no) < 0 (one-sided)

Therefore, we have enough evidence to conclude that lung capacity is

significantly impaired in children of smoking parents.
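These numbers can be checked with the same formula in software; a sketch assuming scipy:

from math import sqrt
from scipy.stats import t

se = sqrt(9.3**2 / 30 + 15.1**2 / 30)
t_stat = (75.5 - 88.2) / se                 # ~ -3.9
p_value = t.cdf(t_stat, 29)                 # one-sided, Ha: mu_smoke < mu_no
print(round(t_stat, 3), p_value)            # ~ -3.92, ~ 2.5e-4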


Page 38:

Example 1.

A clinical dietician wants to compare two different diets, A and B, for diabetic patients. She gets a random sample of 60 diabetic patients and randomly assigns them into two equal-sized groups. At the end of the experiment, a blood glucose test is conducted on each patient.

The average difference in blood glucose measure from group A is 100 mg/dl with sample SD 10, and the average difference in blood glucose measure from group B is 106 mg/dl with sample SD 12.

Q: Does this indicate that diet B has higher blood glucose than diet A?

Two-sample t-test

Page 39:

1. H0 : µA = µB, Ha : µA < µB; α = 5%;

T=(100-106)/(10^2/30+12^2/30)^.5=-2.104; df=29;

p-value=tcdf(-E99, -2.104, 29)=0.022; P-value <5%;

Therefore we reject H0, and we have enough evidence to

conclude that diet B has higher blood glucose than A.

Two-sample t-test

Page 40:

Example 2.

An experiment is conducted to determine whether intensive tutoring (covering a great deal of material in a fixed amount of time) is more effective than paced tutoring (covering less material in the same amount of time). Two randomly chosen groups are tutored separately and then administered proficiency tests.

The sample size of the intensive group is 10 with sample average 76 and sample SD 6; The sample size of the paced group is 12 with sample average 70 and sample SD 8.

Q: May we conclude that the intensive group is doing better?

Two-sample t-test

Page 41:

2. H0 : µint = µpac, Ha : µint > µpac; α = 5%;

T=(76-70)/(6^2/10+8^2/12)^.5=2.007; df=min(9, 11)=9;

p-value=tcdf(2.007, E99, 9)=0.038; P-value <5%;

Therefore we reject H0 and we have enough evidence to

conclude that the intensive group is doing better.

Two-sample t-test

Page 42:

Two-sample t confidence interval

The general form of the confidence interval for the population difference (µ1 − µ2) is:

(x̄1 − x̄2) ± t* √( s1²/n1 + s2²/n2 )

We find t* from Table D with df = smallest of (n1 − 1, n2 − 1).
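A sketch of this interval as a reusable helper, assuming scipy (the function name is illustrative only):

from math import sqrt
from scipy.stats import t

def two_sample_ci(x1, s1, n1, x2, s2, n2, C=0.95):
    df = min(n1 - 1, n2 - 1)                   # conservative degrees of freedom
    t_star = t.ppf(1 - (1 - C) / 2, df)
    margin = t_star * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin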

Page 43:

EX 7.14: Can directed reading activities in the classroom help

improve reading ability? A class of 21 third-graders participates

in these activities for 8 weeks while a control classroom of 23

third-graders follows the same curriculum without the activities.

After 8 weeks, all children take a reading test (scores in table).

Q: Find the 95% confidence interval for (µ1 − µ2).

Page 44:

EX 7.14: Can directed reading activities in the classroom help

improve reading ability? A class of 21 third-graders participates

in these activities for 8 weeks while a control classroom of 23

third-graders follows the same curriculum without the activities.

After 8 weeks, all children take a reading test (scores in table).

95% confidence interval for (µ1 − µ2), with the conservative df = 20 and t* = 2.086:

With 95% confidence, (µ1 − µ2) falls within 9.96 ± 8.987, or (0.973, 18.947).

Page 45:

1. The average lifetime of 36 randomly selected TVs from brand A is 20 years with sample SD 2 years. The average lifetime of 25 randomly selected TVs from brand B is 18 years with sample SD 4 years. Construct a 95% CI for the difference of the average lifetimes between brand A and brand B.

2. In a clinical study, a new medicine is used in the treatment group with 64 patients. The new medicine on average prolongs life by 4 years with sample SD 0.75 years. As a comparison, the placebo group with 60 patients has an average prolonged life of 3 years with sample SD 1.2 years. Construct a 90% CI for the difference of the average prolonged lifetimes between the treatment group and the placebo group.

Two sample t-confidence interval