tests of hypotheses based on a single sample

8Tests of Hypotheses

Based on a Single Sample

8.1-8.2 Hypotheses Tests About a Population Mean

Overview of Inference

Methods for drawing conclusions about a population from sample data are called statistical inference

Methods Confidence Intervals - estimating a value of a population parameter Tests of hypotheses - assess evidence for a claim about a population

Inference is appropriate when data are produced by either a random sample or a randomized experiment

Stating hypotheses

A test of hypothesis tests a specific hypothesis using sample data to

decide on the validity of the hypothesis.

In statistics, a hypothesis is an assumption or a theory about the

characteristics of one or more variables in one or more populations.

What you want to know: Does the calibrating machine that sorts cherry

tomatoes into packs need revision?

The same question reframed statistically: Is the population mean µ for the

distribution of weights of cherry tomato packages equal to 227 g (i.e., half

a pound)?

The null hypothesis is a very specific statement about a parameter of

the population(s). It is labeled H0.

The alternative hypothesis is a more general statement about a

parameter of the population(s) that is exclusive of the null hypothesis. It

is labeled Ha.

Weight of cherry tomato packs:

H0 : µ = 227 g (µ is the average weight of the population of packs)

Ha : µ ≠ 227 g (µ is either larger or smaller)

One-sided and two-sided tests A two-tail or two-sided test of the population mean has these null

and alternative hypotheses:

H0 : µ = [a specific number] Ha : µ [a specific number]

A one-tail or one-sided test of a population mean has these null and

alternative hypotheses:

H0 : µ = [a specific number] Ha : µ < [a specific number] OR

H0 : µ = [a specific number] Ha : µ > [a specific number]

The FDA tests whether a generic drug has an absorption extent similar to the

known absorption extent of the brand-name drug it is copying. Higher or lower

absorption would both be problematic, thus we test:

H0 : µgeneric = µbrand Ha : µgeneric µbrand two-sided

Test Statistic

A test of significance is based on a statistic that estimates the parameter that appears in the hypotheses. When H0 is true, we expect the estimate to take a value near the parameter value specified in H0.

Values of the estimate far from the parameter value specified by H0 give evidence against H0.

A test statistic calculated from the sample data measures how far the data diverge from what we would expect if the null hypothesis H0 were true.

Large values of the statistic show that the data are not consistent with H0.

A test statistic calculated from the sample data measures how far the data diverge from what we would expect if the null hypothesis H0 were true.

Large values of the statistic show that the data are not consistent with H0.

z estimate - hypothesized value

standard deviation of the estimate

P-Value

The null hypothesis H0 states the claim that we are seeking evidence against. The probability that measures the strength of the evidence against a null hypothesis is called a P-value.

The probability, computed assuming H0 is true, that the statistic would take a value as extreme as or more extreme than the one actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence against H0 provided by the data.

Small P-values are evidence against H0 because they say that the observed result is unlikely to occur when H0 is true.

Large P-values fail to give convincing evidence against H0 because they say that the observed result is likely to occur by chance when H0 is true.

Statistical SignificanceThe final step in performing a significance test is to draw a conclusion about the competing claims you were testing. We will make one of two decisions based on the strength of the evidence against the null hypothesis (and in favor of the alternative hypothesis)―reject H0 or fail to reject H0.

If our sample result is too unlikely to have happened by chance assuming H0 is true, then we’ll reject H0.

Otherwise, we will fail to reject H0.

Note: A fail-to-reject H0 decision in a significance test doesn’t mean that H0 is true. For that reason, you should never “accept H0” or use language implying that you believe H0 is true.

In a nutshell, our conclusion in a significance test comes down to:

P-value small → reject H0 → conclude Ha (in context)

P-value large → fail to reject H0 → cannot conclude Ha (in context)

There is no rule for how small a P-value we should require in order to reject H0 — it’s a matter of judgment and depends on the specific circumstances. But we can compare the P-value with a fixed value that we regard as decisive, called the significance level. We write it as , the Greek letter alpha. When our P-value is less than the chosen , we say that the result is statistically significant.

If the P-value is smaller than alpha, we say that the data are statistically significant at level . The quantity is called the significance level or the level of significance.

If the P-value is smaller than alpha, we say that the data are statistically significant at level . The quantity is called the significance level or the level of significance.

When we use a fixed level of significance to draw a conclusion in a significance test,

P-value < → reject H0 → conclude Ha (in context)

P-value ≥ → fail to reject H0 → cannot conclude Ha (in context)

Statistical Significance

Tests for a Population Mean

Four Steps of Tests of Significance

1. State the null and alternative hypotheses.

2. Calculate the value of the test statistic.

3. Find the P-value for the observed data.

4. State a conclusion.

1. State the null and alternative hypotheses.

2. Calculate the value of the test statistic.

3. Find the P-value for the observed data.

4. State a conclusion.

Tests of Significance: Four StepsTests of Significance: Four Steps

We will learn the details of many tests of significance in the following chapters. The proper test statistic is determined by the hypotheses and the data collection design.

Does the packaging machine need revision?

H0 : µ = 227 g versus Ha : µ ≠ 227 g

What is the probability of drawing a random sample such

as yours if H0 is true?

245

227222

n

xz

From table A, the area under the standard

normal curve to the left of z is 0.0228.

Thus, P-value = 2*0.0228 = 4.56%.

4g5 g222 nx

2.28%2.28%

217 222 227 232 237

Sampling distribution

σ/√n = 2.5 g

µ (H0)2

,z

x

The probability of getting a random

sample average so different from

µ is so low that we reject H0.

The machine does need recalibration.

The significance level:

The significance level, α, is the largest P-value tolerated for rejecting a

true null hypothesis (how much evidence against H0 we require). This

value is decided arbitrarily before conducting the test.

If the P-value is equal to or less than α (P ≤ α), then we reject H0.

If the P-value is greater than α (P > α), then we fail to reject H0.

Does the packaging machine need revision?

Two-sided test. The P-value is 4.56%.

* If α had been set to 5%, then the P-value would be significant.

* If α had been set to 1%, then the P-value would not be significant.

Two-Sided Significance Tests and Confidence IntervalsBecause a two-sided test is symmetrical, you can also use a 1 –

confidence interval to test a two-sided hypothesis at level .

α /2 α /2

In a two-sided test,

C = 1 –

C confidence level

significance level

Sweetening colas

Cola manufacturers want to test how much the sweetness of a new cola drink is affected by storage. The sweetness loss due to storage was evaluated by 10 professional tasters (by comparing the sweetness before and after storage):

Taster Sweetness loss 1 2.0 2 0.4 3 0.7 4 2.0 5 −0.4 6 2.2 7 −1.3 8 1.2 9 1.1 10 2.3

Obviously, we want to test if storage results in a loss of sweetness, thus:

H0: = 0 versus Ha: > 0

This looks familiar. However, here we do not know the population parameter . The population of all cola drinkers is too large. Since this is a new cola recipe, we have no population data.

This situation is very common with real data.

When is unknown

When the sample size is large,

the sample is likely to contain

elements representative of the

whole population. Then s is a

good estimate of .

Populationdistribution

Small sampleLarge sample

But when the sample size is

small, the sample contains only

a few individuals. Then s is a

mediocre estimate of .

The sample standard deviation s provides an estimate of the population standard

deviation .

Standard deviation s – standard error s/√n

For a sample of size n, the sample standard deviation s is:

The value s/√n is called the standard error of the mean .

2)(1

1xx

ns i

x

The t distributions

Suppose that an SRS of size n is drawn from an N(µ, σ) population.

When is known, the sampling distribution is N(/√n).

When is estimated from the sample standard deviation s, the

sampling distribution follows a t distribution t(, s/√n) with degrees

of freedom n − 1.

is the one-sample t statistic.

t x s n

When n is very large, s is a very good estimate of , and the

corresponding t distributions are very close to the normal distribution.

The t distributions become wider for smaller sample sizes, reflecting

the lack of precision in estimating from s.

Standardizing the data before using t-table

t

t x s n

As with the normal distribution, the first step is to standardize the data.

Then we can use t-table to obtain the area under the curve.

s/√n

t(,s/√n)df = n − 1

t()df = n − 1

x 0

1

T-table

When σ is known, we use the normal distribution and the standardized z-value.

When σ is unknown,

we use a t distribution

with “n−1” degrees of

freedom (df).

Table shows the

z-values and t-values

corresponding to

landmark P-values/

confidence levels.

t x s n

z-table vs. t-table

Z-table gives the area to the LEFT of hundreds of z-values.

It should only be used for Normal distributions.

(…)

(…)

t-table gives the area to the RIGHT of a dozen t or z-values.

It can be used for t distributions of a given df and for the Normal distribution.

T-table also gives the middle area under a t or normal distribution comprised between the negative and positive value of t or z.

Table D

ns

xt 0

One-sided (one-tailed)

Two-sided (two-tailed)

The P-value is the probability, if H0 is true, of randomly drawing a

sample like the one obtained or more extreme, in the direction of Ha.

The P-value is calculated as the corresponding area under the curve,

one-tailed or two-tailed depending on Ha:

T-table

For df = 9 we only look into the corresponding row.

For a one-sided Ha, this is the P-value (between 0.01 and 0.02);

for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).

2.398 < t = 2.7 < 2.821thus

0.02 > upper tail p > 0.01

The calculated value of t is 2.7. We find the 2 closest t values.

Sweetening colas (continued)

Is there evidence that storage results in sweetness loss for the new cola

recipe at the 0.05 level of significance ( = 5%)?

H0: = 0 versus Ha: > 0 (one-sided test)

The critical value t = 1.833.

t > t thus the result is significant.

2.398 < t = 2.70 < 2.821 thus 0.02 > p > 0.01.

p < thus the result is significant.

The t-test has a significant p-value. We reject H0.

There is a significant loss of sweetness, on average, following storage.

Taster Sweetness loss 1 2.0 2 0.4 3 0.7 4 2.0 5 -0.4 6 2.2 7 -1.3 8 1.2 9 1.110 2.3___________________________Average 1.02Standard deviation 1.196Degrees of freedom n − 1 = 9

0 1.02 02.70

1.196 10

xt

s n

The one-sample t-test

As in the previous chapter, a test of hypotheses requires a few steps:

1. Stating the null and alternative hypotheses (H0 versus Ha)

2. Deciding on a one-sided or two-sided test

3. Choosing a significance level

4. Calculating t and its degrees of freedom

5. Finding the area under the curve with t-table

6. Stating the P-value and interpreting the result

The one-sample t-confidence intervalThe level C confidence interval is an interval with probability C of containing the true population parameter.

We have a data set from a population with both and unknown. We use to estimate and s to estimate using a t distribution (df n−1).

C

t*−t*

m m

m t * s n

Practical use of t : t*

C is the area between −t* and t*.

We find t* in the line of Table D for df = n−1 and confidence level C.

The margin of error m is:

x

Red wine, in moderation

Drinking red wine in moderation may protect against heart attacks. The

polyphenols it contains act on blood cholesterol, likely helping to prevent heart

attacks.

To see if moderate red wine consumption increases the average blood level of

polyphenols, a group of nine randomly selected healthy men were assigned to

drink half a bottle of red wine daily for two weeks. Their blood polyphenol levels

were assessed before and after the study, and the percent change is presented

here:

Firstly: Are the data approximately normal?

0.7 3.5 4 4.9 5.5 7 7.4 8.1 8.4

Histogram

0

1

2

3

4

2.5 5 7.5 9 More

Percentage change in polyphenol blood levels

Fre

quen

cy

There is a low

value, but overall

the data can be

considered

reasonably normal.

What is the 95% confidence interval for the average percent change?

Sample average = 5.5; s = 2.517; df = n − 1 = 8

(…)

The sampling distribution is a t distribution with n − 1 degrees of freedom.

For df = 8 and C = 95%, t* = 2.306.

The margin of error m is: m = t*s/√n = 2.306*2.517/√9 ≈ 1.93.

Therefore, the confidence interval is (5.5-1.93, 5.5+1.93).

With 95% confidence, the population average percent increase in

polyphenol blood levels of healthy men drinking half a bottle of red wine

daily is between 3.6% and 7.4%.

Type I and II errors

When we draw a conclusion from a significance test, we hope our conclusion will be correct. But sometimes it will be wrong. There are two types of mistakes we can make.

If we reject H0 when H0 is true, we have committed a Type I error.

If we fail to reject H0 when H0 is false, we have committed a Type II error.

If we reject H0 when H0 is true, we have committed a Type I error.

If we fail to reject H0 when H0 is false, we have committed a Type II error.

Truth about the population

H0 trueH0 false(Ha true)

Conclusion based on sample

Reject H0 Type I errorCorrect

conclusion

Fail to reject H0

Correct conclusion

Type II error

Type I and II errors

A Type I error is made when we reject the null hypothesis and the

null hypothesis is actually true (incorrectly reject a true H0).

The probability of making a Type I error is the significance level .

A Type II error is made when we fail to reject the null hypothesis

and the null hypothesis is false (incorrectly keep a false H0).

The probability of making a Type II error is labeled .

The power of a test is 1 − .

The Common Practice of Testing Hypotheses

1. State H0 and Ha as in a test of significance.

2. Think of the problem as a decision problem, so the probabilities of Type I and Type II errors are relevant.

3. Consider only tests in which the probability of a Type I error is no greater than .

4. Among these tests, select a test that makes the probability of a Type II error as small as possible.

1. State H0 and Ha as in a test of significance.

2. Think of the problem as a decision problem, so the probabilities of Type I and Type II errors are relevant.

3. Consider only tests in which the probability of a Type I error is no greater than .

4. Among these tests, select a test that makes the probability of a Type II error as small as possible.

Steps for Tests of Significance

1. Assumptions/Conditions

Specify variable, parameter, method of data collection, shape of population.

2. State hypotheses

Null hypothesis Ho and alternative hypothesis Ha.

3. Calculate value of the test statistic

A measure of “difference” between hypothesized value and its estimate.

4. Determine the P-value

Probability, assuming Ho true that the test statistic takes the observed value

or a more extreme value.

5. State the decision and conclusion

Interpret P-value, make decision about Ho.

8.3 Tests Concerning a Population Proportion

Sampling distribution of sample proportion The sampling distribution of a sample proportion is approximately

normal (normal approximation of a binomial distribution) when the

sample size is large enough.

p̂

Conditions for inference on pAssumptions:

1. The data used for the estimate are an SRS from the population

studied.

2. The population is at least 10 times as large as the sample used for

inference.

3. The sample size n is large enough that the sampling distribution

can be approximated with a normal distribution. Otherwise, rely on

the binomial distribution.

Large-sample confidence interval for p

Use this method when the number of

successes and the number of

failures are both at least 15.

C

Z*−Z*

m m

Confidence intervals contain the population proportion p in C% of

samples. For an SRS of size n drawn from a large population, and with

sample proportion calculated from the data, an approximate level C

confidence interval for p is:

C is the area under the standard

normal curve between −z* and z*.

nppzSEzm

mmp

)ˆ1(ˆ**

error ofmargin theis ,ˆ

p̂

Medication side effects

Arthritis is a painful, chronic inflammation of the joints.

An experiment on the side effects of pain relievers

examined arthritis patients to find the proportion of

patients who suffer side effects.

What are some side effects of ibuprofen?Serious side effects (seek medical attention immediately):

Allergic reaction (difficulty breathing, swelling, or hives)Muscle cramps, numbness, or tinglingUlcers (open sores) in the mouthRapid weight gain (fluid retention)SeizuresBlack, bloody, or tarry stoolsBlood in your urine or vomitDecreased hearing or ringing in the earsJaundice (yellowing of the skin or eyes)Abdominal cramping, indigestion, or heartburn

Less serious side effects (discuss with your doctor):Dizziness or headacheNausea, gaseousness, diarrhea, or constipationDepressionFatigue or weaknessDry mouthIrregular menstrual periods

Upper tail probability P0.25 0.2 0.15 0.1 0.05 0.03 0.02 0.01

z* 0.67 0.841 1.036 1.282 1.645 1.960 2.054 2.32650% 60% 70% 80% 90% 95% 96% 98%

Confidence level C

Let’s calculate a 90% confidence interval for the population proportion of arthritis patients who suffer some “adverse symptoms.”

What is the sample proportion ?

))1( ,( ˆ npppNp

0174.00106.0*645.1

440/)052.01(052.0*645.1

)ˆ1(ˆ*

m

m

nppzm

052.0440

23ˆ p

What is the sampling distribution for the proportion of arthritis patients with

adverse symptoms for samples of 440?

For a 90% confidence level, z* = 1.645.

Using the large sample method, we

calculate a margin of error m:

With a 90% confidence level, between 3.5% and 6.9% of arthritis patients

taking this pain medication experience some adverse symptoms.

0174.0052.0or

ˆ:forCI%90

mpp

p̂

Significance test for pThe sampling distribution for is approximately normal for large sample sizes and its shape depends solely on p and n.

Thus, we can easily test the null hypothesis:

H0: p = p0 (a given value we are testing).

n

pp

ppz

)1(

ˆ

00

0

If H0 is true, the sampling distribution is known

The likelihood of our sample proportion given the null hypothesis depends on how far from p0 our

is in units of standard deviation.

This is valid when both expected counts—expected successes np0 and

expected failures n(1 − p0)—are each 10 or larger.

p0(1 p0)

n

p0

p̂

p̂

p̂

A national survey by the National Institute for Occupational Safety and Health on

restaurant employees found that 75% said that work stress had a negative impact

on their personal lives.

You investigate a restaurant chain to see if the proportion of all their employees

negatively affected by work stress differs from the national proportion p0 = 0.75.

H0: p = p0 = 0.75 vs. Ha: p ≠ 0.75 (2 sided alternative)

In your SRS of 100 employees, you find that 68 answered “Yes” when asked,

“Does work stress have a negative impact on your personal life?”

The expected counts are 100 × 0.75 = 75 and 25.

Both are greater than 10, so we can use the z-test.

The test statistic is:

62.1

100)25.0)(75.0(

75.068.0

)1(

ˆ

00

0

npp

ppz

From Table A we find the area to the left of z = -1.62 is 0.0526.

Thus P(Z ≤ -1.62) = 0.0526. Since the alternative hypothesis is two-sided, the P-

value is the area in both tails, and therefore the p-value = 2 × 0.0526 = 0.1052.

The chain restaurant data

are not significantly different

from the national survey results

( = 0.68, z = -1.62,

p-value = 0.11).

p̂

tests of hypotheses based on a single sample

Documents

specific hypothesis

pvaluethe null hypothesis

alternative hypothesis

specific number ha

hypotheses tests

specific statement

parameter value

alternative hypotheses