hypothesis testing i. - universiteit leidenpub.math.leidenuniv.nl/~szabobt/stan/stan10_1.pdf ·...

Introduction Wald test p-values t-test Quiz

Hypothesis testing I.

Botond Szabo

Leiden University

Leiden, 26 March 2018


Outline

1 Introduction

2 Wald test

3 p-values

4 t-test

5 Quiz


Example

Suppose we want to know if exposure to asbestos isassociated with lung disease.

We take some rats and randomly divide them into two groups.

We expose one group to asbestos and then compare thedisease rate in two groups.

We consider the following two hypotheses:1 The null hypothesis: the disease rate is the same in both

groups.2 Alternative hypothesis: The disease rate is not the same in two

groups.

If the exposed group has a much higher disease rate, we willreject the null hypothesis and conclude that the evidencefavours the alternative hypothesis.


Null and alternative hypotheses

We partition the parameter space Θ into sets Θ0 and Θ1 andwish to test

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1.

We call H0 the null hypothesis and H1 the alternativehypothesis.

In our previous example θ is the difference between thedisease rates in the exposed and not exposed groups,Θ0 = {0}, and Θ1 = (0,∞).


Rejection region

Let T be a random variable and let T be the range of X . Wetest the hypothesis by finding a subset of outcomes R ⊂ X ,called the rejection region. If T ∈ R, we reject the nullhypothesis, otherwise we retain it:

T ∈ R ⇒ reject H0T /∈ R ⇒ retain H0.

In our previous example T is an estimate of the differencebetween the disease rates in the exposed and not exposedgroups. We reject the null hypothesis if T is large.


Test statistic and critical value

Usually, the rejection region R is of the form

R = {x : T (x) > c},

where T is a test statistic and c is critical value.

The problem in hypothesis testing is to find an appropriatetest statistic T and an appropriate critical value c .


Warning

There is a tendency to use hypothesis testing even when theyare not appropriate.

Point estimates and confidence intervals are at times a betteroption.

Use testing only with well-defined hypotheses and in generaltake care of formulating them: things can go wrong whentesting ill-defined or overly complex hypotheses.


Type I and type II errors

By rejecting H0 when H0 is true we commit type I error.

By retaining H0 when H1 is true we commit type II error.

Retain Null Reject Null

H0 true X type I errorH1 true type II error X


Power function

Definition

The power function of a test T with rejection region R is definedby

β(θ) = Pθ(T ∈ R).

The size of a test is defined to be

α = supθ∈Θ0

β(θ).

A test is said to be of level α, if its size is less or equal to α.


Comments

Let Θ0 = {θ0} and Θ1 = {θ1}.In this case β(θ0) is the probability of type I error.

β(θ1) gives 1 minus the probability of type II error.

In the ideal case we would like the probabilities of both typesof errors be equal to zero. Except trivial cases, this is notpossible. So we concentrate on tests of size or level α, whereα is taken to be small, e.g. α = 0.05.


Hypotheses and tests

A hypothesis of the form θ = θ0 is called a simple hypothesis.

A hypothesis of the form θ > θ0 or θ < θ0 (or both) is calleda composite hypothesis.

A test of the form

H0 : θ = θ0 versus H1 : θ 6= θ0

is called a two-sided test.

A test of the form

H0 : θ ≤ θ0 versus H1 : θ > θ0

orH0 : θ ≥ θ0 versus H1 : θ < θ0

is called a one-sided test.


Example

Example

Let X1, . . . ,Xn ∼ N(µ, σ), with σ known. We want to testH0 : µ ≤ 0 versus µ > 0. Hence Θ0 = (−∞, 0] and Θ1 = (0,∞).Let T = X n and conisder the test

reject H0 if T > c.

The rejection region is

R = {(x1, . . . , xn) : T (x1, . . . , xn) > c} .


Example (continued)

Example

The power function is

β(µ) = Pµ(X n > c)

= Pµ(√

n(X n − µ)σ

>

√n(c − µ)σ

)= P

(Z >

√n(c − µ)σ

)= 1− Φ

(√n(c − µ)σ

).

The power function is increasing in µ.


Example (continued)

Example

Hence

size = supµ≤0

β(µ) = β(0) = 1− Φ(√

nc

σ

).

For size α test we set this equal to α and solve for c to get

c =σΦ−1(1− α)√

n.

We reject the null hypothesis when X n > c . Equivalently, when

√nX nσ

> zα = Φ−1(1− α).


Example 2

Example

Let X be an observation from a Uniform(0, θ) distribution.Suppose one wants to test H0 : θ = 2 versus H1 : θ 6= 2. Supposeone rejects H0 if X ≥ 1.9. Compute the probability of type 1 error.


Most powerful test

It would be desirable to find the test with highest power underH1 among all size α tests.

Such a test, if it exists, is called the most powerful test.

Finding such tests is not easy, and often they don’t even exist.

We shall instead consider several widely used tests.


Wald test

Definition

Let θ be a scalar parameter and θ̂ an estimate of θ with ŝe itsestimated standard error. Consider the test

H0 : θ = θ0 versus H1 : θ 6= θ0.

Assume that θ̂ is asymptotically normal:

θ̂ − θ0ŝe

N(0, 1).

The size α Wald test is: reject H0 when |W | > zα/2, where

W =θ̂ − θ0

ŝe.


Size of the Wald test

Theorem

Asymptotically, the Wald test has size α, i.e.

Pθ0(|W | > zα/2)→ α

as n→∞.

Proof.

We have

Pθ0(|W | > zα/2) = Pθ0

(|θ̂ − θ0|

ŝe> zα/2

)→ P(|Z | > zα/2)= α.


Remark

An alternative version of the Wald test statistic is

W =θ̂ − θ0

se0,

where se0 is the standard error of θ̂ computed at θ0.

Both versions of the test are valid.


Power of the Wald test

Theorem

Suppose the true value of θ is θ∗ 6= θ0. The power β(θ∗) (theprobability of correctly rejecting the null hypothesis) isapproximately given by

1− Φ

(θ̂ − θ∗

ŝe+ zα/2

)+ Φ

(θ̂ − θ∗

ŝe− zα/2

).


Example

Example

Let X1, . . . ,Xm and Y1, . . . ,Yn be two independent samples fromdistributions with means µ1 and µ2, respectively. Let us test thenull hypothesis that µ1 = µ2. Write this as

H0 : δ = 0 versus H1 : δ 6= 0,

where δ = µ1 − µ2. The plug-in estimate of δ is δ̂ = Xm −Y n withestimated standard error ŝe =

√s21m +

s22n , where s

21 , s

22 are sample

variances. The size α Wald test rejects H0 when |W | > zα/2, where

W =δ̂ − 0

ŝe=

Xm − Y n√s21m +

s22n

.


Example 2

Example

Let us assume that we flip a coin 100 times and 25 out of the 100we got head. Do we have strong enough evidence to say that thecoin is not fair?


Relation to confidence intervals

Theorem

The size α Wald test rejects H0 : θ = θ0 versus H1 : θ 6= θ0 if andonly if θ0 /∈ C , where

C = (θ̂ − ŝezα/2, θ̂ + ŝezα/2).

In words, testing the hypothesis with Wald test isequivalent tochecking whether the null value θ0 is in the confidence interval.


Warning

When we reject H0, we say the result is statistically significant.

The result may be statistically significant without the effectsize being large.

In such a case we have a result that is statistically significant,but perhaps not scientifically or practically significant.

Any confidence interval that excludes θ0 corresponds torejecting θ0. But θ0 can be close to an endpoint of theconfidence interval (not scientifically significant), or far away(scientifically significant).


Heuristics

Consider a hypothesis test with test statistic T and therejection region Rα at level α.

Fix α. Upon observing the data X we evaluate the statisticT (X ) and ask whether the test rejects the null hypothesis atlevel α.

If yes, we gradually decrease α, while asking the samequestion.

Eventually we arrive at α at which the test does not reject thenull hypothesis.

This value of α is called the p-value. It measures the strengthof evidence against H0: when the p-value is small, the dataoffers little support in favour of H0.


Evidence scale

The p-value offers a ‘continuous’ scale of evidence against thenull hypothesis and can be more informative than a simple‘accept/reject’ of the hypothesis testing.

The following evidence scale is typically used:

p-value evidence

< 0.01 very strong evidence against H00.01− 0.05 strong evidence against H00.05− 0.10 weak evidence against H0> 0.1 no or little evidence against H0

Warning: a large p-value is not strong evidence in favour ofH0. The large p-value can occur for two reasons:

(i) H0 is true, or(ii) H0 is false, but the test has low power.


p-value

Definition

Suppose for every α ∈ (0, 1) we have a size α test with rejectionregion Rα. Then

p-value = inf{α : T (X ) ∈ Rα}.

That is, the p-value is the smallest level at which we can reject H0.


Computation

Theorem

Suppose that the size α test is of the form

reject H0 if and only if T (X ) ≥ cα.

Then, with x the observed value of X ,

p-value = supθ∈Θ0

Pθ(T (X ) ≥ T (x)).

Thus if Θ0 = {θ0},

p-value = Pθ0(T (X ) ≥ T (x)).

In other words, p-value is the probability under H0 of observing avalue of the test statistic the same or more extreme than what weactually observed.


Wald statistic

Theorem

Let w = (θ̂ − θ0)/ŝe denote the observed value of the Waldstatistic. The p-value is given by

p-value = Pθ0(|W | > |w |) ≈ Pθ0(|Z | > |w |) = 2Φ(−|w |).


Distribution of the p-value

Theorem

If the test statistic has a continuous distribution, then underH0 : θ = θ0 the p-value has a Uniform(0, 1) distribution.Therefore, if we reject H0 when the p-value is less than α, theprobability of type I error is α.


t-test

To test H0 : µ = µ0 versus µ 6= µ0, where µ = E[Xi ], we canuse the Wald test.

When X1, . . . ,Xn ∼ N(µ, σ2) (with θ = (µ, σ2) unknown) andn is small, it is more common to use the t-test.

A random variable has a t-distribution with k degrees offreedom, if its PDF is given by

f (t) =Γ(k+1

2

)√kπΓ

(k2

) (1 + t

2

k

)(k+1)/2 .When k →∞, this tends to the N(0, 1) distribution.


t-distribution


t-test (continued)

Let

T =

√n(X n − µ0)

Sn,

where S2n is the sample variance.

Under H0, T ∼ tn−1.Let tn−1,α/2 denote the upper α/2-quantile of thet-distribution.

If we reject the null hypothesis when |T | > tn−1,α/2, we get asize α test.

However, already for moderately large n the test is essentiallyidentical to the Wald test.


Example

Example

The bload pressure of 6 patients were measured before taking amedicine (124, 134, 117, 150, 160, 150) and after taking themedicine (135, 136, 120, 145, 155, 126), respectively. Do we havesufficient statistical evidence that the medicine was useful inlowering the bload pressure?


Question 1

What is our aim in hypothesis testing?

Answers:

1 We check whether we can accept our theory (null hypothesis).

2 We test whether the data provides sufficient evidence to rejectour theory (null hypothesis).

3 We check whether the null hypothesis or the alternativehypothesis is more likely.

4 We check whether the data is normally distributed?


Question 2

What is not true for the rejection region R?

Answers:

1 It is a random variable.

2 We reject the null hypothesis if the test statistics T is insideof R.

3 Typically is of the form R = {x : T (x) > c}.4 We make decision on the null hypothesis based on the

rejection region and the test statistics, which form a “pair”.


Question 3

Which of the following statements is not true for the error types inhypothesis?

Answers:

1 The type I error is when we reject the null hypothesis altoughit is true.

2 Type II error is when we retain the null hypothesis although itwas not true.

3 We make a type II error when we do not reject the nullhypothesis when the alternate is true.

4 We make a type II error if we accept the null hypothesis and itwas true.


Question 4

What is not true for the Wald test?

Answers:

1 The Wald test statistic is W = (θ̂ − θ0)/ŝe and the criticalregion |W | > zα/2.

2 Pθ0(|W | > zα/2)→ α.3 The data has to be normally distributed to apply it.

4 An alternative version of the Wald test is W = (θ̂ − θ0)/se0.


Question 5

What is a p value?

Answers:

1 The probability of the null hypothesis.

2 It measures the strength of evidence against H0.

3 It is equivalent to the first type of error.

4 It provides a strong evidence in favour of H0.


Question 5

What is a p-value?

Answers:

1 The probability of the null hypothesis.

2 It measures the strength of evidence against H0.

3 It is equivalent to the first type of error.

4 It provides a strong evidence in favour of H0.


Question 6

When does one use a t-test instead of a Wald test?

Answers:

1 For modorate sample size.

2 For small sample size

3 If the data is not normally distributed.

4 If the variance of the normal distribution is known.


Question 6

When does one use a t-test instead of a Wald test?

Answers:

1 For modorate sample size.

2 For small sample size.

3 If the data is not normally distributed.

4 If the variance of the normal distribution is known.

IntroductionWald testp-valuest-testQuiz

hypothesis testing i. - universiteit leidenpub.math.leidenuniv.nl/~szabobt/stan/stan10_1.pdf ·...

Documents