hypothesis testing i. - universiteit leidenpub.math.leidenuniv.nl/~szabobt/stan/stan10_1.pdf ·...

46
Introduction Wald test p-values t-test Quiz Hypothesis testing I. Botond Szabo Leiden University Leiden, 26 March 2018

Upload: others

Post on 18-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • Introduction Wald test p-values t-test Quiz

    Hypothesis testing I.

    Botond Szabo

    Leiden University

    Leiden, 26 March 2018

  • Introduction Wald test p-values t-test Quiz

    Outline

    1 Introduction

    2 Wald test

    3 p-values

    4 t-test

    5 Quiz

  • Introduction Wald test p-values t-test Quiz

    Example

    Suppose we want to know if exposure to asbestos isassociated with lung disease.

    We take some rats and randomly divide them into two groups.

    We expose one group to asbestos and then compare thedisease rate in two groups.

    We consider the following two hypotheses:1 The null hypothesis: the disease rate is the same in both

    groups.2 Alternative hypothesis: The disease rate is not the same in two

    groups.

    If the exposed group has a much higher disease rate, we willreject the null hypothesis and conclude that the evidencefavours the alternative hypothesis.

  • Introduction Wald test p-values t-test Quiz

    Null and alternative hypotheses

    We partition the parameter space Θ into sets Θ0 and Θ1 andwish to test

    H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1.

    We call H0 the null hypothesis and H1 the alternativehypothesis.

    In our previous example θ is the difference between thedisease rates in the exposed and not exposed groups,Θ0 = {0}, and Θ1 = (0,∞).

  • Introduction Wald test p-values t-test Quiz

    Rejection region

    Let T be a random variable and let T be the range of X . Wetest the hypothesis by finding a subset of outcomes R ⊂ X ,called the rejection region. If T ∈ R, we reject the nullhypothesis, otherwise we retain it:

    T ∈ R ⇒ reject H0T /∈ R ⇒ retain H0.

    In our previous example T is an estimate of the differencebetween the disease rates in the exposed and not exposedgroups. We reject the null hypothesis if T is large.

  • Introduction Wald test p-values t-test Quiz

    Test statistic and critical value

    Usually, the rejection region R is of the form

    R = {x : T (x) > c},

    where T is a test statistic and c is critical value.

    The problem in hypothesis testing is to find an appropriatetest statistic T and an appropriate critical value c .

  • Introduction Wald test p-values t-test Quiz

    Warning

    There is a tendency to use hypothesis testing even when theyare not appropriate.

    Point estimates and confidence intervals are at times a betteroption.

    Use testing only with well-defined hypotheses and in generaltake care of formulating them: things can go wrong whentesting ill-defined or overly complex hypotheses.

  • Introduction Wald test p-values t-test Quiz

    Type I and type II errors

    By rejecting H0 when H0 is true we commit type I error.

    By retaining H0 when H1 is true we commit type II error.

    Retain Null Reject Null

    H0 true X type I errorH1 true type II error X

  • Introduction Wald test p-values t-test Quiz

    Power function

    Definition

    The power function of a test T with rejection region R is definedby

    β(θ) = Pθ(T ∈ R).

    The size of a test is defined to be

    α = supθ∈Θ0

    β(θ).

    A test is said to be of level α, if its size is less or equal to α.

  • Introduction Wald test p-values t-test Quiz

    Comments

    Let Θ0 = {θ0} and Θ1 = {θ1}.In this case β(θ0) is the probability of type I error.

    β(θ1) gives 1 minus the probability of type II error.

    In the ideal case we would like the probabilities of both typesof errors be equal to zero. Except trivial cases, this is notpossible. So we concentrate on tests of size or level α, whereα is taken to be small, e.g. α = 0.05.

  • Introduction Wald test p-values t-test Quiz

    Hypotheses and tests

    A hypothesis of the form θ = θ0 is called a simple hypothesis.

    A hypothesis of the form θ > θ0 or θ < θ0 (or both) is calleda composite hypothesis.

    A test of the form

    H0 : θ = θ0 versus H1 : θ 6= θ0

    is called a two-sided test.

    A test of the form

    H0 : θ ≤ θ0 versus H1 : θ > θ0

    orH0 : θ ≥ θ0 versus H1 : θ < θ0

    is called a one-sided test.

  • Introduction Wald test p-values t-test Quiz

    Example

    Example

    Let X1, . . . ,Xn ∼ N(µ, σ), with σ known. We want to testH0 : µ ≤ 0 versus µ > 0. Hence Θ0 = (−∞, 0] and Θ1 = (0,∞).Let T = X n and conisder the test

    reject H0 if T > c.

    The rejection region is

    R = {(x1, . . . , xn) : T (x1, . . . , xn) > c} .

  • Introduction Wald test p-values t-test Quiz

    Example (continued)

    Example

    The power function is

    β(µ) = Pµ(X n > c)

    = Pµ(√

    n(X n − µ)σ

    >

    √n(c − µ)σ

    )= P

    (Z >

    √n(c − µ)σ

    )= 1− Φ

    (√n(c − µ)σ

    ).

    The power function is increasing in µ.

  • Introduction Wald test p-values t-test Quiz

    Example (continued)

    Example

    Hence

    size = supµ≤0

    β(µ) = β(0) = 1− Φ(√

    nc

    σ

    ).

    For size α test we set this equal to α and solve for c to get

    c =σΦ−1(1− α)√

    n.

    We reject the null hypothesis when X n > c . Equivalently, when

    √nX nσ

    > zα = Φ−1(1− α).

  • Introduction Wald test p-values t-test Quiz

    Example 2

    Example

    Let X be an observation from a Uniform(0, θ) distribution.Suppose one wants to test H0 : θ = 2 versus H1 : θ 6= 2. Supposeone rejects H0 if X ≥ 1.9. Compute the probability of type 1 error.

  • Introduction Wald test p-values t-test Quiz

    Most powerful test

    It would be desirable to find the test with highest power underH1 among all size α tests.

    Such a test, if it exists, is called the most powerful test.

    Finding such tests is not easy, and often they don’t even exist.

    We shall instead consider several widely used tests.

  • Introduction Wald test p-values t-test Quiz

    Wald test

    Definition

    Let θ be a scalar parameter and θ̂ an estimate of θ with ŝe itsestimated standard error. Consider the test

    H0 : θ = θ0 versus H1 : θ 6= θ0.

    Assume that θ̂ is asymptotically normal:

    θ̂ − θ0ŝe

    N(0, 1).

    The size α Wald test is: reject H0 when |W | > zα/2, where

    W =θ̂ − θ0

    ŝe.

  • Introduction Wald test p-values t-test Quiz

    Size of the Wald test

    Theorem

    Asymptotically, the Wald test has size α, i.e.

    Pθ0(|W | > zα/2)→ α

    as n→∞.

    Proof.

    We have

    Pθ0(|W | > zα/2) = Pθ0

    (|θ̂ − θ0|

    ŝe> zα/2

    )→ P(|Z | > zα/2)= α.

  • Introduction Wald test p-values t-test Quiz

    Remark

    An alternative version of the Wald test statistic is

    W =θ̂ − θ0

    se0,

    where se0 is the standard error of θ̂ computed at θ0.

    Both versions of the test are valid.

  • Introduction Wald test p-values t-test Quiz

    Power of the Wald test

    Theorem

    Suppose the true value of θ is θ∗ 6= θ0. The power β(θ∗) (theprobability of correctly rejecting the null hypothesis) isapproximately given by

    1− Φ

    (θ̂ − θ∗

    ŝe+ zα/2

    )+ Φ

    (θ̂ − θ∗

    ŝe− zα/2

    ).

  • Introduction Wald test p-values t-test Quiz

    Example

    Example

    Let X1, . . . ,Xm and Y1, . . . ,Yn be two independent samples fromdistributions with means µ1 and µ2, respectively. Let us test thenull hypothesis that µ1 = µ2. Write this as

    H0 : δ = 0 versus H1 : δ 6= 0,

    where δ = µ1 − µ2. The plug-in estimate of δ is δ̂ = Xm −Y n withestimated standard error ŝe =

    √s21m +

    s22n , where s

    21 , s

    22 are sample

    variances. The size α Wald test rejects H0 when |W | > zα/2, where

    W =δ̂ − 0

    ŝe=

    Xm − Y n√s21m +

    s22n

    .

  • Introduction Wald test p-values t-test Quiz

    Example 2

    Example

    Let us assume that we flip a coin 100 times and 25 out of the 100we got head. Do we have strong enough evidence to say that thecoin is not fair?

  • Introduction Wald test p-values t-test Quiz

    Relation to confidence intervals

    Theorem

    The size α Wald test rejects H0 : θ = θ0 versus H1 : θ 6= θ0 if andonly if θ0 /∈ C , where

    C = (θ̂ − ŝezα/2, θ̂ + ŝezα/2).

    In words, testing the hypothesis with Wald test isequivalent tochecking whether the null value θ0 is in the confidence interval.

  • Introduction Wald test p-values t-test Quiz

    Warning

    When we reject H0, we say the result is statistically significant.

    The result may be statistically significant without the effectsize being large.

    In such a case we have a result that is statistically significant,but perhaps not scientifically or practically significant.

    Any confidence interval that excludes θ0 corresponds torejecting θ0. But θ0 can be close to an endpoint of theconfidence interval (not scientifically significant), or far away(scientifically significant).

  • Introduction Wald test p-values t-test Quiz

    Heuristics

    Consider a hypothesis test with test statistic T and therejection region Rα at level α.

    Fix α. Upon observing the data X we evaluate the statisticT (X ) and ask whether the test rejects the null hypothesis atlevel α.

    If yes, we gradually decrease α, while asking the samequestion.

    Eventually we arrive at α at which the test does not reject thenull hypothesis.

    This value of α is called the p-value. It measures the strengthof evidence against H0: when the p-value is small, the dataoffers little support in favour of H0.

  • Introduction Wald test p-values t-test Quiz

    Evidence scale

    The p-value offers a ‘continuous’ scale of evidence against thenull hypothesis and can be more informative than a simple‘accept/reject’ of the hypothesis testing.

    The following evidence scale is typically used:

    p-value evidence

    < 0.01 very strong evidence against H00.01− 0.05 strong evidence against H00.05− 0.10 weak evidence against H0> 0.1 no or little evidence against H0

    Warning: a large p-value is not strong evidence in favour ofH0. The large p-value can occur for two reasons:

    (i) H0 is true, or(ii) H0 is false, but the test has low power.

  • Introduction Wald test p-values t-test Quiz

    p-value

    Definition

    Suppose for every α ∈ (0, 1) we have a size α test with rejectionregion Rα. Then

    p-value = inf{α : T (X ) ∈ Rα}.

    That is, the p-value is the smallest level at which we can reject H0.

  • Introduction Wald test p-values t-test Quiz

    Computation

    Theorem

    Suppose that the size α test is of the form

    reject H0 if and only if T (X ) ≥ cα.

    Then, with x the observed value of X ,

    p-value = supθ∈Θ0

    Pθ(T (X ) ≥ T (x)).

    Thus if Θ0 = {θ0},

    p-value = Pθ0(T (X ) ≥ T (x)).

    In other words, p-value is the probability under H0 of observing avalue of the test statistic the same or more extreme than what weactually observed.

  • Introduction Wald test p-values t-test Quiz

    Wald statistic

    Theorem

    Let w = (θ̂ − θ0)/ŝe denote the observed value of the Waldstatistic. The p-value is given by

    p-value = Pθ0(|W | > |w |) ≈ Pθ0(|Z | > |w |) = 2Φ(−|w |).

  • Introduction Wald test p-values t-test Quiz

    Distribution of the p-value

    Theorem

    If the test statistic has a continuous distribution, then underH0 : θ = θ0 the p-value has a Uniform(0, 1) distribution.Therefore, if we reject H0 when the p-value is less than α, theprobability of type I error is α.

  • Introduction Wald test p-values t-test Quiz

    t-test

    To test H0 : µ = µ0 versus µ 6= µ0, where µ = E[Xi ], we canuse the Wald test.

    When X1, . . . ,Xn ∼ N(µ, σ2) (with θ = (µ, σ2) unknown) andn is small, it is more common to use the t-test.

    A random variable has a t-distribution with k degrees offreedom, if its PDF is given by

    f (t) =Γ(k+1

    2

    )√kπΓ

    (k2

    ) (1 + t

    2

    k

    )(k+1)/2 .When k →∞, this tends to the N(0, 1) distribution.

  • Introduction Wald test p-values t-test Quiz

    t-distribution

  • Introduction Wald test p-values t-test Quiz

    t-test (continued)

    Let

    T =

    √n(X n − µ0)

    Sn,

    where S2n is the sample variance.

    Under H0, T ∼ tn−1.Let tn−1,α/2 denote the upper α/2-quantile of thet-distribution.

    If we reject the null hypothesis when |T | > tn−1,α/2, we get asize α test.

    However, already for moderately large n the test is essentiallyidentical to the Wald test.

  • Introduction Wald test p-values t-test Quiz

    Example

    Example

    The bload pressure of 6 patients were measured before taking amedicine (124, 134, 117, 150, 160, 150) and after taking themedicine (135, 136, 120, 145, 155, 126), respectively. Do we havesufficient statistical evidence that the medicine was useful inlowering the bload pressure?

  • Introduction Wald test p-values t-test Quiz

    Question 1

    What is our aim in hypothesis testing?

    Answers:

    1 We check whether we can accept our theory (null hypothesis).

    2 We test whether the data provides sufficient evidence to rejectour theory (null hypothesis).

    3 We check whether the null hypothesis or the alternativehypothesis is more likely.

    4 We check whether the data is normally distributed?

  • Introduction Wald test p-values t-test Quiz

    Question 1

    What is our aim in hypothesis testing?

    Answers:

    1 We check whether we can accept our theory (null hypothesis).

    2 We test whether the data provides sufficient evidence to rejectour theory (null hypothesis).

    3 We check whether the null hypothesis or the alternativehypothesis is more likely.

    4 We check whether the data is normally distributed?

  • Introduction Wald test p-values t-test Quiz

    Question 2

    What is not true for the rejection region R?

    Answers:

    1 It is a random variable.

    2 We reject the null hypothesis if the test statistics T is insideof R.

    3 Typically is of the form R = {x : T (x) > c}.4 We make decision on the null hypothesis based on the

    rejection region and the test statistics, which form a “pair”.

  • Introduction Wald test p-values t-test Quiz

    Question 2

    What is not true for the rejection region R?

    Answers:

    1 It is a random variable.

    2 We reject the null hypothesis if the test statistics T is insideof R.

    3 Typically is of the form R = {x : T (x) > c}.4 We make decision on the null hypothesis based on the

    rejection region and the test statistics, which form a “pair”.

  • Introduction Wald test p-values t-test Quiz

    Question 3

    Which of the following statements is not true for the error types inhypothesis?

    Answers:

    1 The type I error is when we reject the null hypothesis altoughit is true.

    2 Type II error is when we retain the null hypothesis although itwas not true.

    3 We make a type II error when we do not reject the nullhypothesis when the alternate is true.

    4 We make a type II error if we accept the null hypothesis and itwas true.

  • Introduction Wald test p-values t-test Quiz

    Question 3

    Which of the following statements is not true for the error types inhypothesis?

    Answers:

    1 The type I error is when we reject the null hypothesis altoughit is true.

    2 Type II error is when we retain the null hypothesis although itwas not true.

    3 We make a type II error when we do not reject the nullhypothesis when the alternate is true.

    4 We make a type II error if we accept the null hypothesis and itwas true.

  • Introduction Wald test p-values t-test Quiz

    Question 4

    What is not true for the Wald test?

    Answers:

    1 The Wald test statistic is W = (θ̂ − θ0)/ŝe and the criticalregion |W | > zα/2.

    2 Pθ0(|W | > zα/2)→ α.3 The data has to be normally distributed to apply it.

    4 An alternative version of the Wald test is W = (θ̂ − θ0)/se0.

  • Introduction Wald test p-values t-test Quiz

    Question 4

    What is not true for the Wald test?

    Answers:

    1 The Wald test statistic is W = (θ̂ − θ0)/ŝe and the criticalregion |W | > zα/2.

    2 Pθ0(|W | > zα/2)→ α.3 The data has to be normally distributed to apply it.

    4 An alternative version of the Wald test is W = (θ̂ − θ0)/se0.

  • Introduction Wald test p-values t-test Quiz

    Question 5

    What is a p value?

    Answers:

    1 The probability of the null hypothesis.

    2 It measures the strength of evidence against H0.

    3 It is equivalent to the first type of error.

    4 It provides a strong evidence in favour of H0.

  • Introduction Wald test p-values t-test Quiz

    Question 5

    What is a p-value?

    Answers:

    1 The probability of the null hypothesis.

    2 It measures the strength of evidence against H0.

    3 It is equivalent to the first type of error.

    4 It provides a strong evidence in favour of H0.

  • Introduction Wald test p-values t-test Quiz

    Question 6

    When does one use a t-test instead of a Wald test?

    Answers:

    1 For modorate sample size.

    2 For small sample size

    3 If the data is not normally distributed.

    4 If the variance of the normal distribution is known.

  • Introduction Wald test p-values t-test Quiz

    Question 6

    When does one use a t-test instead of a Wald test?

    Answers:

    1 For modorate sample size.

    2 For small sample size.

    3 If the data is not normally distributed.

    4 If the variance of the normal distribution is known.

    IntroductionWald testp-valuest-testQuiz