hypothesis testing - comp 245 statistics · dr n a heard (room 545 huxley building) hypothesis...

47
Hypothesis Testing COMP 245 STATISTICS Dr N A Heard Room 545 Huxley Building Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 1 / 47

Upload: others

Post on 23-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Hypothesis TestingCOMP 245 STATISTICS

Dr N A Heard

Room 545 Huxley Building

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 1 / 47

Page 2: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Hypotheses

Suppose we are going to obtain a random i.i.d. sampleX = (X1, . . . , Xn) of a random variable X with an unknowndistribution PX. To proceed with modelling the underlyingpopulation, we might hypothesise probability models for PX and thentest whether such hypotheses seem plausible in light of the realiseddata x = (x1, . . . , xn).Or, more specifically, we might fix upon a parametric family PX|θ withunknown parameter θ and then hypothesise values for θ. Genericallylet θ0 denote a hypothesised value for θ. Then after observing the data,we wish to test whether we can indeed reasonably assume θ = θ0. Forexample, if X ∼ N(µ, σ2) we may wish to test whether µ = 0 isplausible in light of the data x.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 2 / 47

Page 3: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Formally, we define a null hypothesis H0 as our hypothesised modelof interest, and also specify an alternative hypothesis H1 of rivalmodels against which we wish to test H0.Most often we simply test H0 : θ = θ0 against H1 : θ 6= θ0. This isknown as a two-sided test. In some situations it may be moreappropriate to consider alternatives of the form H1 : θ > θ0 orH1 : θ < θ0, known as one-sided tests.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 3 / 47

Page 4: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Rejection Region for a Test Statistic

To test the validity of H0, we first choose a test statistic T(X) of thedata for which we can find the distribution, PT, under H0.Then, we identify a rejection region R ⊂ R of low probability valuesof T under the assumption that H0 is true, so

P(T ∈ R|H0) = α

for some small probability α (typically 5%).A well chosen rejection region will have relatively high probabilityunder H1, whilst retaining low probability under H0.Finally, we calculate the observed test statistic t(x) for our observeddata x. If t ∈ R we “reject the null hypothesis at the 100α% level”.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47

Page 5: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

p-Values

For each possible significance level α ∈ (0, 1), a hypothesis test at the100α% level will result in either rejecting or not rejecting H0.As α→ 0 it becomes less and less likely that the null hypothesis will berejected, as the rejection region is becoming smaller and smaller.Similarly, as α→ 1 it becomes more and more likely that the nullhypothesis will be rejected.For a given data set and resulting test statistic, we might, therefore, beinterested in identifying the critical significance level which marks thethreshold between us rejecting and not rejecting the null hypothesis.This is known as the p-value of the data.Smaller p-values suggest stronger evidence against H0.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 5 / 47

Page 6: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Test Errors

There are two types of error in the outcome of a hypothesis test:Type I: Rejecting H0 when in fact H0 is true. By construction, thishappens with probability α. For this reason, the significance levelof a hypothesis test is also referred to as the Type I error rate.Type II: Not rejecting H0 when in fact H1 is true. The probabilitywith which this type of error will occur depends on the unknowntrue value of θ, so to calculate values we plug-in a plausiblealternative value for θ 6= θ0, θ1 say, and let β = P(T /∈ R|θ = θ1) bethe probability of a Type II error.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 6 / 47

Page 7: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Power

We define the power of a hypothesis test by

Defn.

1− β = P(T ∈ R|θ = θ1).

For a fixed significance level α, a well chosen test statistic T andrejection region R will have high power - that is, maximise theprobability of rejecting the null hypothesis when the alternative is true.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 7 / 47

Page 8: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

N(µ, σ2) - σ2 Known

Suppose X1, . . . , Xn are i.i.d. N(µ, σ2) with σ2 known and µ unknown.We may wish to test if µ = µ0 for some specific value µ0 (e.g. µ0 = 0,µ0 = 9.8).

Then we can state our null and alternative hypotheses as

H0 :µ = µ0;H1 :µ 6= µ0.

Under H0 : µ = µ0, we then know both µ and σ2. So for the samplemean X we have a known distribution for the test statistic

Z =X− µ0

σ/√

n∼ Φ.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 8 / 47

Page 9: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

So if we define our rejection region R to be the 100α% tails of thestandard normal distribution distribution,

R =(−∞,−z1− α

2

)∪(

z1− α2, ∞)

≡{

z∣∣∣|z| > z1− α

2

},

we have P(Z ∈ R) = α under H0.

We thus reject H0 at the 100α% significance level ⇐⇒ our observed

test statistic z =x− µ0

σ/√

n∈ R.

The p-value is given by 2× {1−Φ(|z|)}.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 9 / 47

Page 10: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Ex. 5% Rejection Region for N(0, 1) Statistic

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

z

φφ((z))

R

Jump to t5 5% rejection region

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 10 / 47

Page 11: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

There is a strong connection in this context between hypothesis testingand confidence intervals.Suppose we have constructed a 100(1− α)% confidence interval for µ .Then this is precisely the set of values {µ0} for which there would beinsufficient evidence to reject a null hypothesis H0 : µ = µ0 at the100α%-level.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 11 / 47

Page 12: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Example

A company makes packets of snack foods. The bags are labelled asweighing 454g; of course they won’t all be exactly 454g, and let’ssuppose the variance of bag weights is known to be 70g². Thefollowing data show the mass in grams of 50 randomly sampledpackets.

464, 450, 450, 456, 452, 433, 446, 446, 450, 447, 442, 438, 452,447, 460, 450, 453, 456, 446, 433, 448, 450, 439, 452, 459, 454,456, 454, 452, 449, 463, 449, 447, 466, 446, 447, 450, 449, 457,464, 468, 447, 433, 464, 469, 457, 454, 451, 453, 443

Are these data consistent with the claim that the mean weight ofpackets is 454g?

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 12 / 47

Page 13: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

1 We wish to test H0 : µ = 454 vs. H1 : µ 6= 454. So set µ0 = 454.2 Although we have not been told that the packet weights are

individually normally distributed, by the CLT we still have thatthe mean weight of the sample of packets is approximatelynormally distributed, and hence we still approximately have

Z =X− µ0

σ/√

n∼ Φ.

3 x = 451.22 and n = 50⇒ z =x− µ0

σ/√

n= −2.350.

4 For a 5%-level significance test, we compare the statisticz = −2.350 with the rejection regionR = (−∞,−z0.975) ∪ (z0.975, ∞) = (−∞,−1.96) ∪ (1.96, ∞). Clearlywe have z ∈ R, and so at the 5%-level we reject the null hypothesisthat the mean packet weight is 454g.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 13 / 47

Page 14: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

5 At which significance levels would we have not rejected the nullhypothesis?

I For a 1%-level significance test, the rejection region would havebeen

R = (−∞,−z0.995) ∪ (z0.995, ∞) = (−∞,−2.576) ∪ (2.576, ∞).

In which case z /∈ R, and so at the 1%-level we would not haverejected the null hypothesis.

I The p-value is

2× {1−Φ(|z|)} = 2× {1−Φ(| − 2.350|)} ≈ 2(1− 0.9906)= 0.019,

and so we would only reject the null hypothesis for α > 1.9%.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 14 / 47

Page 15: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

N(µ, σ2) - σ2 Unknown

Similarly, if σ2 in the above example were unknown, we still have that

T =X− µ0

sn−1/√

n∼ tn−1.

So for a test of H0 : µ = µ0 vs. H1 : µ 6= µ0 at the α level, the rejection

region of our observed test statistic t =x− µ0

sn−1/√

nis

R =(−∞,−tn−1,1− α

2

)∪(

tn−1,1− α2, ∞)

≡{

t∣∣∣|t| > tn−1,1− α

2

}.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 15 / 47

Page 16: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Ex. 5% Rejection Region for t5 Statistic

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

t

f(t)

R

Jump to Φ 5% rejection region

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 16 / 47

Page 17: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Example 1

Consider again the snack food weights example. There, we assumedthe variance of bag weights was known to be 70. Without this, wecould have estimated the variance by

s2n−1 =

1n− 1

n

∑i=1

(xi − x)2 = 70.502.

Then the corresponding t-statistic becomes

t =x− µ0

sn−1/√

n= −2.341,

very similar to the z-statistic of before.And since n = 50, we compare with the t49 distribution which isapproximately N(0, 1). So the hypothesis test results and p-valuewould be practically identical.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 17 / 47

Page 18: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Example 2

A particular piece of code takes a random time to run on a computer,but the average time is known to be 6 seconds. The programmer triesan alternative optimisation in compilation and wishes to knowwhether the mean run time has changed. To explore this, he runs there-optimised code 16 times, obtaining a sample mean run time of 5.8seconds and bias-corrected sample standard deviation of 1.2 seconds.Is the code any faster?

1 We wish to test H0 : µ = 6 vs. H1 : µ 6= 6. So set µ0 = 6.2 Assuming the run times are approximately normal,

T =X− µ0

sn−1/√

n∼ tn−1. That is,

X− 6sn−1/

√16∼ t15. So we reject H0 at

the 100α% level if |T| > t15,1−α/2.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 18 / 47

Page 19: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

3 x = 5.8, sn−1 = 1.2 and n = 16⇒ t =x− µ0

sn−1/√

n=−23

.

4 We have |t| = 2/3 << 2.13 = t15,.975, so we have insufficientevidence to reject H0 at the 5% level.

5 In fact, the p-value for these data is 51.51%, so there is very littleevidence to suggest the code is now any faster.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 19 / 47

Page 20: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Samples from 2 Populations

Suppose, as before, we have a random sample X = (X1, . . . , Xn1) froman unknown population distribution PX.But now, suppose we have a further random sample Y = (Y1, . . . , Yn2)from a second, different population PY.

Then we may wish to test hypotheses concerning the similarity of thetwo distributions PX and PY.

In particular, we are often interested in testing whether PX and PYhave equal means. That is, to test

H0 : µX = µY vs H1 : µX 6= µY.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 20 / 47

Page 21: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Paired Data

A special case is when the two samples X and Y are paired. That is, ifn1 = n2 = n and the data are collected as pairs (X1, Y1), . . . , (Xn, Yn) sothat, for each i, Xi and Yi are possibly dependent.

For example, we might have a random sample of n individuals and Xirepresents the heart rate of the ith person before light exercise and Yithe heart rate of the same person afterwards.

In this special case, for a test of equal means we can consider thesample of differences Z1 = X1 − Y1, . . . , Zn = Xn − Yn and testH0 : µZ = 0 using the single sample methods we have seen. In theabove example, this would test whether light exercise causes a changein heart rate.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 21 / 47

Page 22: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

N(µX, σ2X), N(µY, σ2

Y)

SupposeX = (X1, . . . , Xn1) are i.i.d. N(µX, σ2

X) with µX unknown;Y = (Y1, . . . , Yn2) are i.i.d. N(µY, σ2

Y) with µY unknown;the two samples X and Y are independent.

Then we still have that, independently,

X ∼ N(

µX,σ2

Xn1

)Y ∼ N

(µY,

σ2Y

n2

)

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 22 / 47

Page 23: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

From this it follows that the difference in sample means,

X− Y ∼ N(

µX − µY,σ2

Xn1

+σ2

Yn2

),

and hence(X− Y)− (µX − µY)√

σ2X/n1 + σ2

Y/n2

∼ Φ.

So under the null hypothesis H0 : µX = µY, we have

Z =X− Y√

σ2X/n1 + σ2

Y/n2

∼ Φ.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 23 / 47

Page 24: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

N(µX, σ2X), N(µY, σ2

Y) - σ2X, σ2

Y Known

So if σ2X and σ2

Y are known, we immediately have a test statistic

z =x− y√

σ2X/n1 + σ2

Y/n2

which we can compare against the quantiles of a standard normal.That is,

R ={

z∣∣∣|z| > z1− α

2

},

gives a rejection region for a hypothesis test of H0 : µX = µY vs.H1 : µX 6= µY at the 100α% level.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 24 / 47

Page 25: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

N(µX, σ2), N(µY, σ2) - σ2 Unknown

On the other hand, suppose σ2X and σ2

Y are unknown.

Then if we know σ2X = σ2

Y = σ2 but σ2 is unknown, we can stillproceed.

We have(X− Y)− (µX − µY)

σ√

1/n1 + 1/n2∼ Φ,

and so, under H0 : µX = µY,

X− Yσ√

1/n1 + 1/n2∼ Φ.

but with σ unknown.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 25 / 47

Page 26: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Pooled Estimate of Population Variance

We need an estimator for the variance using samples from twopopulations with different means. Just combining the samplestogether into one big sample would over-estimate the variance, sincesome of the variability in the samples would be due to the differencein µX and µY.

So we define the bias-corrected pooled sample variance

Defn.

S2n1+n2−2 =

∑n1i=1(Xi − X)2 + ∑n2

i=1(Yi − Y)2

n1 + n2 − 2,

which is an unbiased estimator for σ2.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 26 / 47

Page 27: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

We can immediately see that s2n1+n2−2 is indeed an unbiased estimate of

σ2 by noting

S2n1+n2−2 =

n1 − 1n1 + n2 − 2

S2n1−1 +

n2 − 1n1 + n2 − 2

S2n2−1;

That is, s2n1+n2−2 is a weighted average of the bias-corrected sample

variances for the individual samples x and y, which are both unbiasedestimates for σ2.

Then substituting Sn1+n2−2 in for σ we get

(X− Y)− (µX − µY)

Sn1+n2−2√

1/n1 + 1/n2∼ tn1+n2−2,

and so, under H0 : µX = µY,

T =X− Y

Sn1+n2−2√

1/n1 + 1/n2∼ tn1+n2−2.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 27 / 47

Page 28: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

So we have a rejection region for a hypothesis test of H0 : µX = µY vs.H1 : µX 6= µY at the 100α% level given by

R ={

t∣∣∣|t| > tn1+n2−2,1− α

2

},

for the statistic

t =x− y

sn1+n2−2√

1/n1 + 1/n2.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 28 / 47

Page 29: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Example

The same piece of C code was repeatedly run after compilation undertwo different C compilers, and the run times under each compilerwere recorded. The sample mean and bias-corrected sample standarddeviation for Compiler 1 were 114s and 310s respectively, and thecorresponding figures for Compiler 2 were 94s and 290s. Both sets ofdata were each based on 15 runs.Suppose that Compiler 2 is a refined version of Compiler 1, and so ifµ1, µ2 are the expected run times of the code under the twocompilations, we might fairly assume µ2 ≤ µ1.Conduct a hypothesis test of H0 : µ1 = µ2 vs H1 : µ1 > µ2 at the 5%level.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 29 / 47

Page 30: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Until now we have mostly considered two-sided tests. That is tests ofthe form H0 : θ = θ0 vs H1 : θ 6= θ0.

Here we need to consider one-sided tests, which differ by thealternative hypothesis being of the form H1 : θ < θ0 or H1 : θ > θ0.

This presents no extra methodological challenge and requires only aslight adjustment in the construction of the rejection region.

We still use the t-statistic

t =x− y

sn1+n2−2√

1/n1 + 1/n2,

where x, y are the sample mean run times under Compilers 1 and 2respectively. But now the one-sided rejection region becomes

R = {t |t > tn1+n2−2,1−α } .

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 30 / 47

Page 31: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

First calculating the bias-corrected pooled sample variance, we get

s2n1+n2−2 =

14× 310 + 14× 29028

= 300.

(Note that since the sample sizes n1 and n2 are equal, the pooledestimate of the variance is the average of the individual estimates.)

So t =x− y

sn1+n2−2√

1/n1 + 1/n2=

114− 94√300√

1/15 + 1/15

=√

10 = 3.162.

For a 1-sided test we compare t = 3.162 with t28,0.95 = 1.701 andconclude that we reject the null hypothesis at the 5% level; the secondcompilation is significantly faster.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 31 / 47

Page 32: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Count Data

The results in the previous sections relied upon the data being eithernormally distributed, or at least through the CLT having the samplemean being approximately normally distributed. Tests were thendeveloped for making inference on population means under thoseassumptions. These tests were very much model-based.

Another important but very different problem concerns model checking,which can be addressed through a more general consideration of countdata for simple (discrete and finite) distributions.

The following ideas can then be trivially extended to infinite rangediscrete and continuous r.v.s by binning observed samples into a finitecollection of predefined intervals.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 32 / 47

Page 33: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Samples from a Simple Random Variable

Let X be a simple random variable taking values in the range{x1, . . . , xk}, with probability mass function pj = P(X = xj),j = 1, . . . , k.A random sample of size n from the distribution of X can besummarised by the observed frequency counts O = (O1, . . . , Ok) at thepoints x1, . . . , xk (so ∑k

j=1 Oj = n).

Suppose it is hypothesised that the true pmf {pj} is from a particularparametric model pj = P(X = xj|θ), j = 1, . . . , k for some unknownparameter p-vector θ.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 33 / 47

Page 34: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

To test this hypothesis about the model, we first need to estimate theunknown parameters θ so that we are able to calculate the distributionof any statistic under the null hypothesis H0 : pj = P(X = xj|θ),j = 1, . . . , k. Let θ be such an estimator, obtained using the sample O.Then under H0 we have estimated probabilities for the pmfpj = P(X = xj|θ), j = 1, . . . , k and so we are able to calculate estimatedexpected frequency counts E = (E1, . . . , Ek) by Ej = npj. (Note againwe have ∑k

j=1 Ej = n.)

We then seek to compare the observed frequencies with the expectedfrequencies to test for goodness of fit.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 34 / 47

Page 35: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Chi-Square Test

To test H0 : pj = P(X = xj|θ) vs. H1 : 0 ≤ pj ≤ 1, ∑ pj = 1 we use thechi-square statistic

Defn.

X2 =k

∑i=1

(Oi − Ei)2

Ei.

If H0 were true, then the statistic X2 would approximately follow achi-square distribution with ν = k− p− 1 degrees of freedom.

k is the number of values (categories) the simple r.v. X can take.p is the number of parameters being estimated (dim(θ)).For the approximation to be valid, we should have ∀j, Ej ≥ 5. Thismay require some merging of categories.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 35 / 47

Page 36: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

χ2ν pdf for ν = 1, 2, 3, 5, 10

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

x

χχ2 νν((x))

χχ21

χχ22

χχ23

χχ25

χχ210

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 36 / 47

Page 37: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Rejection Region

Clearly larger values of X2 correspond to larger deviations from thenull hypothesis model. That is, if X2 = 0 the observed counts exactlymatch those expected under H0.

For this reason, we always perform a one-sided goodness of fit testusing the χ2 statistic, looking only at the upper tail of the distribution.

Hence the rejection region for a goodness of fit hypothesis test at the atthe 100α% level is given by

R ={

x2∣∣∣x2 > χ2

k−p−1,1−α

}.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 37 / 47

Page 38: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Example

Each year, around 1.3 million people in the USA suffer adverse drugeffects (ADEs). A study in the Journal of the American MedicalAssociation (July 5, 1995) gave the causes of 95 ADEs below.

Cause Number of ADEsLack of knowledge of drug 29Rule violation 17Faulty dose checking 13Slips 9Other 27

Test whether the true percentages of ADEs differ across the 5 causes.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 38 / 47

Page 39: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Under the null hypothesis that the 5 causes are equally likely, wewould have expected counts of 95

5 = 19 for each cause.So our χ2 statistic becomes

x2 =(29− 19)2

19+

(17− 19)2

19+

(13− 19)2

19+

(9− 19)2

19+

(27− 19)2

19

=10019

+419

+3619

+10019

+6419

=30419

= 16.

We have not estimated any parameters from the data, so we comparex2 with the quantiles of the χ2

5−1 = χ24 distribution.

Well 16 > 9.49 = χ24,0.95, so we reject the null hypothesis at the 5%

level; we have reason to suppose that there is a difference in the truepercentages across the different causes.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 39 / 47

Page 40: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Example - Fitting a Poisson Distribution to Data

Recall the example from the Discrete Random Variables chapter, wherethe number of particles emitted by a radioactive substance whichreached a Geiger counter was measured for 2608 time intervals, eachof length 7.5 seconds.

We fitted a Poisson(λ) distribution to the data by plugging in thesample mean number of counts (3.870) for the rate parameter λ.(Which we now know to be the MLE!)

x 0 1 2 3 4 5 6 7 8 9 ≥10O(nx) 57 203 383 525 532 408 273 139 45 27 16E(nx) 54.4 210.5 407.4 525.5 508.4 393.5 253.8 140.3 67.9 29.2 17.1

(O=Observed, E=Expected).

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 40 / 47

Page 41: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Whilst the fitted Poisson(3.87) expected frequencies looked quiteconvincing to the eye, at that time we had no formal method ofquantitatively assessing the fit. However, we now know how toproceed.

x 0 1 2 3 4 5 6 7 8 9 ≥10O 57 203 383 525 532 408 273 139 45 27 16E 54.4 210.5 407.4 525.5 508.4 393.5 253.8 140.3 67.9 29.2 17.1O− E 2.6 -7.5 -24.4 -0.5 23.6 14.5 19.2 -1.3 22.9 2.2 1.1(O−E)2

E 0.124 0.267 1.461 0.000 1.096 0.534 1.452 0.012 7.723 0.166 0.071

The statistic x2 = ∑ (O−E)2

E = 12.906 should be compared with aχ2

11−1−1 = χ29 distribution.

Well χ29,0.95 = 16.91, so at the 5% level we do not reject the null

hypothesis of a Poisson(3.87) model for the data.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 41 / 47

Page 42: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Contingency Tables

Suppose we have two simple random variables X and Y which arejointly distributed with unknown probability mass function pXY.

We are often interested in trying to ascertain whether X and Y areindependent. That is, determine whether pXY(x, y) = pX(x)pY(y).

Let the ranges of the r.v.s X and Y be {x1, . . . , xk} and {y1, . . . , y`}respectively.

Then an i.i.d. sample of size n from the joint distribution of (X, Y) canbe represented by a list of counts nij (1 ≤ i ≤ k; 1 ≤ j ≤ `) of thenumber of times we observe the pair (xi, yj).

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 42 / 47

Page 43: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Tabulating these data in the following way gives what is known as ak× ` contingency table.

y1 y2 . . . y`x1 n11 n12 n1` n1·x2 n21 n22 n2` n2·...

xk nk1 nk2 nk` nk·n·1 n·2 . . . n·` n

Note the row sums (n1·, n2·, . . . , nk·) represent the frequencies ofx1, x2, . . . , xk in the sample (that is, ignoring the value of Y). Similarlyfor the column sums (n·1, n·2, . . . , n·`) and y1, . . . , y`.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 43 / 47

Page 44: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Under the null hypothesis

H0 : X and Y are independent,

the expected values of the entries of the contingency table, conditionalon the row and column sums, can be estimated by

nij =ni· × n·j

n, 1 ≤ i ≤ k, 1 ≤ j ≤ `.

To see this, consider the marginal distribution of X; we could estimate

pX(xi) by pi· =ni·n

. Similarly for pY(yj) we get p·j =n·jn

.Then under the null hypothesis of independencepXY(xi, yj) = pX(xi)pY(yj), and so we can estimate pXY(xi, yj) by

pij = pi· × p·j =ni· × n·j

n2 .

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 44 / 47

Page 45: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Now that we have a set of expected frequencies to compare againstour k× ` observed frequencies, a χ2 test can be performed.

We are using both the row and column sums to estimate ourprobabilities, and there are k and ` of these respectively. So wecompare our calculated x2 statistic against a χ2 distribution withk`− {(k− 1) + (`− 1)} − 1 = (k− 1)(`− 1) degrees of freedom.

Hence the rejection region for a hypothesis test of independence in ak× ` contingency table at the at the 100α% level is given by

R ={

x2∣∣∣x2 > χ2

(k−1)(`−1),1−α

}.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 45 / 47

Page 46: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Example

An article in International Journal of Sports Psychology (July-Sept 1990)evaluated the relationship between physical fitness and stress. 549people were classified as good, average, or poor fitness, and were alsotested for signs of stress (yes or no). The data are shown in the tablebelow.

Poor Fitness Average Fitness Good FitnessStress 206 184 85 475No stress 36 28 10 74

242 212 95 549

Is there any relationship between stress and fitness?

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 46 / 47

Page 47: Hypothesis Testing - COMP 245 STATISTICS · Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 4 / 47. p-Values For each possible significance level a 2(0,1), a hypothesis

Under independence we would estimate the expected values to bePoor Fitness Average Fitness Good Fitness

Stress 209.4 183.4 82.2 475No stress 32.6 28.6 12.8 74

242 212 95 549

Hence the χ2 statistic is calculated to be

X2 = ∑i

(Oi − Ei)2

Ei=

(206− 209.4)2

209.4+ . . . +

(10− 12.88)2

12.8= 1.1323.

This should be compared with a χ2 distribution with(2− 1)× (3− 1) = 2 degrees of freedom.χ2

2,0.95 = 5.99, so we have no significant evidence to suggest there isany relationship between fitness and stress.

Dr N A Heard (Room 545 Huxley Building) Hypothesis Testing 47 / 47