
WS 2007/08 Prof. Dr. J. Schütze, FB GW KI 1

Fachhochschule Jena, University of Applied Sciences Jena

Hypothesis testing

Statistical Tests

Sometimes you have to make a decision about a characteristic of a population. For example, you claim that a new drug treats a disease better than the current one. You observe the change of a symptom of the disease under the new drug and under the standard drug, and you hope there is a difference between the drugs which is not only due to chance.

The opposite of your claim is the null hypothesis, which states that the observed difference is only due to unexplained 'chance' (no effect).

If you can reject the null hypothesis, you accept the alternative hypothesis: there is a non-chance difference between the drugs (effect).

Accepting the alternative hypothesis (your claim) requires strong evidence against the null hypothesis.



Statistical tests for unknown parameters

Example 1

For some branches of industry, it is important to check whether the mean body height of adults has changed.

If you want to be sure, you must measure all German adults and compute the mean of their body heights.

A more practicable method is to choose a representative sample of n adults and compute the mean of their body heights.

This sample mean is an estimate of the unknown mean (expectation) of the body height of all German adults. Because it does not carry the whole information of the population, there is a risk in your decision.

Thesis: The mean body height of German adults is 173 cm (from former investigations). You have to decide whether this is still valid or not.



Null hypothesis: $\mu = 173$ against alternative hypothesis: $\mu \neq 173$

$\mu$: unknown expectation of the whole population; $\mu_0 = 173$: reference value

Sample of $n = 100$; the sample mean estimates the unknown expectation: $\bar{x} = 175$

Difference between sample mean and reference value:

$$d = \bar{x} - \mu_0 = 175 - 173 = 2$$

Is this sample mean consistent with the null hypothesis? Or is it so unlikely that the null hypothesis should be rejected? For this decision we accept an error probability of 0.05.

Up to which value k can this difference be attributed to chance, and from which value on is the sample mean inconsistent with the null hypothesis (difference too large)?



In order to infer from the sample mean to the expectation of the population, we use the distribution of the point estimator

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Suppose the random variable X (body height) follows a normal distribution with known standard deviation $\sigma = 10$ and unknown $\mu$.

Then we know the distribution of $\bar{X}$:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Using, for example, a sample size of n = 100,

$$\bar{X} \sim N\left(\mu, \frac{10^2}{100}\right) = N(\mu, 1)$$

If the null hypothesis is true, $\mu = 173$, and consequently

$$\bar{X} \sim N(173, 1) \quad \text{and hence} \quad D = \bar{X} - \mu_0 = \bar{X} - 173 \sim N(0, 1)$$

Using this distribution, we determine a region C for rejecting $H_0$ by

$$P(D \in C) = 0.05$$

We find this critical region as

$$C = \{ |D| > z_{0.975} \}$$

Because D is very unlikely to take values in this region if $H_0$ is true, we reject $H_0$ in these cases.

This decision has an error probability of 0.05, since D takes values in the critical region C with probability 0.05 even though the null hypothesis is true.



Decision in example 1

From the sample, we get d = 2.

Accepting an error probability of 0.05, the critical region is

$$C = \{ |D| > z_{0.975} \} = \{ |D| > 1.96 \}$$

Because d = 2 belongs to C (it exceeds the critical value k = 1.96), we reject the null hypothesis.

The mean body height of German adults is no longer 173 cm. The confidence level is 0.95 (or: the risk is 0.05).




In practice you do not compute the difference d; you make your decision using the following statistic T:

$$T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

Under the null hypothesis, $EX = \mu_0$ and also $E\bar{X} = \mu_0$, so that

$$T \sim N(0, 1).$$

Thus, with probability $1 - \alpha$,

$$z_{\alpha/2} \le T \le z_{1-\alpha/2}.$$

If the sample value of T is out of this range, the null hypothesis is rejected.

For $\alpha = 0.05$ the error probability is then 0.05, because if $H_0$ is true, T falls outside this range with probability 0.05.



In example 1, $\sigma = 10$, $n = 100$, $\mu_0 = 173$, $\bar{x} = 175$:

$$T = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{175 - 173}{10 / \sqrt{100}} = 2$$

Risk $\alpha = 0.05$

Under the null hypothesis, T ~ N(0, 1); consequently, with probability 0.95, T lies in the interval

$$(z_{\alpha/2},\ z_{1-\alpha/2}) = (-z_{1-\alpha/2},\ z_{1-\alpha/2}) = (-1.96,\ 1.96)$$

The null hypothesis is rejected if T < -1.96 or T > 1.96, which is equivalent to |T| > 1.96 (critical range).

Because the sample value T = 2 lies in the critical range, the null hypothesis is rejected.
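The decision in example 1 can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the function name `gauss_test` and its parameters are illustrative choices, not part of the lecture material.

```python
from math import sqrt
from statistics import NormalDist


def gauss_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided Gauss test for the mean with known sigma.

    Returns the statistic T, the critical value z_{1-alpha/2},
    and whether H0: mu = mu0 is rejected.
    """
    t_stat = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # quantile of N(0, 1)
    return t_stat, z_crit, abs(t_stat) > z_crit


# Values from example 1: sigma = 10, n = 100, mu0 = 173, sample mean 175.
t_stat, z_crit, reject = gauss_test(x_bar=175, mu0=173, sigma=10, n=100)
print(f"T = {t_stat:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

With the numbers of example 1 the statistic is T = 2, slightly above the critical value of about 1.96, so the null hypothesis is rejected.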




Testing scheme

Comparison of the unknown mean $\mu$ under normal distribution with a reference value $\mu_0$ ($\sigma$ is assumed to be known)

Null hypothesis $H_0$: $\mu = \mu_0$

Alternative hypothesis $H_1$: $\mu \neq \mu_0$

Risk $\alpha$: error of the first kind, the probability of falsely rejecting a correct $H_0$

Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$; under $H_0$, T follows a standard normal distribution

Critical range of $H_0$ at risk $\alpha$: $|T| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of N(0, 1)




[Figure: density of the statistic T under the null hypothesis. Under the null hypothesis, T lies in the central interval with probability $1 - \alpha$; the two tails outside this interval form the critical range for risk $\alpha$ (rejection).]




Kinds of error

Interpretation in case of H0: there is no difference, no effect

Error of the first kind: rejecting a true H0 (false alarm); you detect a difference that does not exist

Error of the second kind: accepting an incorrect H0 (failing to sound the alarm); you overlook an existing difference

              H0 rejected                 H0 accepted
H0 correct    error of the first kind     correct decision
H0 false      correct decision            error of the second kind
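The two kinds of error can be made tangible with a small simulation. The sketch below repeatedly draws samples and applies the two-sided Gauss test with the setting of example 1 (reference value 173, sigma = 10, n = 100). The true mean of 175 used for the second run, the number of trials and the seed are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mu, mu0=173.0, sigma=10.0, n=100, alpha=0.05,
                   trials=5000, seed=0):
    """Estimate how often the two-sided Gauss test rejects H0: mu = mu0
    when the data really come from N(true_mu, sigma^2)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        t_stat = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(t_stat) > z_crit:
            rejections += 1
    return rejections / trials


# H0 true: every rejection is an error of the first kind (rate close to alpha).
type_1_rate = rejection_rate(true_mu=173.0)
# H0 false (hypothetical true mean 175): every non-rejection is an error
# of the second kind.
type_2_rate = 1 - rejection_rate(true_mu=175.0)
print(f"error of the first kind:  {type_1_rate:.3f}")
print(f"error of the second kind: {type_2_rate:.3f}")
```

The first rate should come out near the chosen risk of 0.05, while the second depends on how far the true mean lies from the reference value.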




One-sided and two-sided tests

Two-sided (two-tailed) test

Null hypothesis: $\mu = \mu_0$, alternative hypothesis: $\mu \neq \mu_0$

One-sided (one-tailed) tests

Null hypothesis: $\mu \le \mu_0$, alternative hypothesis: $\mu > \mu_0$, or

Null hypothesis: $\mu \ge \mu_0$, alternative hypothesis: $\mu < \mu_0$




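Errors of the first and second kind

Hypotheses: $H_0: \mu = \mu_0$ against the one-sided alternative $H_1: \mu > \mu_0$

Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$ under $H_0$

Critical range for the one-sided test: $T > t_{1-\alpha}$ (here $t_{1-\alpha}$ denotes the $(1-\alpha)$-quantile of N(0, 1))

Error of the first kind: $\alpha$, the probability that T falls into the critical range although $H_0$ is true

Error of the second kind: $\beta$, the probability that T stays outside the critical range although an alternative $\mu = \mu_1 > \mu_0$ is true

[Figure: density of T under the null hypothesis ($\mu = \mu_0$) and density of T for the one-sided alternative ($\mu = \mu_1 > \mu_0$). The area under the null density to the right of $t_{1-\alpha}$ is $\alpha$; the area under the alternative density to the left of $t_{1-\alpha}$ is $\beta$.]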




Interpretation

The smaller $\alpha$, the bigger $\beta$ will be.

The probability $1 - \beta$ of rejecting a false null hypothesis can be calculated with respect to any alternative reference value $\mu_1$ when the sample size n is given.

Only increasing n can reduce $\beta$ for a given $\alpha$!

The smaller the difference $\mu_0 - \mu_1$, the bigger $\beta$ will be (overlapping).
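These statements can be checked numerically for the one-sided Gauss test. Under an alternative $\mu_1$ the statistic T is normal with mean $(\mu_1 - \mu_0)/(\sigma/\sqrt{n})$ and variance 1, so $\beta = \Phi\left(z_{1-\alpha} - (\mu_1 - \mu_0)/(\sigma/\sqrt{n})\right)$. The following sketch assumes this setting; the function name and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_one_sided(mu0, mu1, sigma, n, alpha=0.05):
    """Error of the second kind for the one-sided Gauss test
    H0: mu = mu0 against H1: mu > mu0, evaluated at the alternative mu1."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / sqrt(n))  # mean of T when mu = mu1
    return STD_NORMAL.cdf(z_crit - shift)


base = beta_one_sided(mu0=173, mu1=175, sigma=10, n=100)
smaller_alpha = beta_one_sided(mu0=173, mu1=175, sigma=10, n=100, alpha=0.01)
larger_n = beta_one_sided(mu0=173, mu1=175, sigma=10, n=400)
smaller_gap = beta_one_sided(mu0=173, mu1=174, sigma=10, n=100)

print(f"base case:          beta = {base:.3f}")
print(f"smaller alpha:      beta = {smaller_alpha:.3f}")  # larger than base
print(f"larger sample size: beta = {larger_n:.3f}")       # smaller than base
print(f"smaller difference: beta = {smaller_gap:.3f}")    # larger than base
```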




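Minimum sample size for guaranteeing maximal errors $\alpha$, $\beta$ ($\sigma^2$ known)

L denotes the practically relevant difference in the mean

Comparison of $\mu$ with $\mu_0$ (one sample)

Two-sided test:

$$n \ge n_0 = \frac{\sigma^2 \, (z_{1-\alpha/2} + z_{1-\beta})^2}{L^2}$$

One-sided test:

$$n \ge n_0 = \frac{\sigma^2 \, (z_{1-\alpha} + z_{1-\beta})^2}{L^2}$$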
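As a rough sketch, the one-sample formula $n_0 = \sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / L^2$ (with $z_{1-\alpha}$ in the one-sided case) can be evaluated as follows. The function name and the input values $\sigma = 10$, $L = 2$, $\beta = 0.20$ are assumptions chosen for illustration and are not taken from the slides.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(sigma, L, alpha=0.05, beta=0.20, two_sided=True):
    """Smallest integer n with n >= n0 for the one-sample Gauss test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    n0 = sigma**2 * (z_alpha + z(1 - beta))**2 / L**2
    return ceil(n0)


# Hypothetical planning values: sigma = 10, relevant difference L = 2.
n_two_sided = min_sample_size(sigma=10, L=2, two_sided=True)
n_one_sided = min_sample_size(sigma=10, L=2, two_sided=False)
print(n_two_sided, n_one_sided)
```

The result is rounded up because the sample size must be an integer that is at least $n_0$.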




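Testing hypotheses on the mean of a normal distribution (1)

One-sample tests

Comparison of $\mu$ to $\mu_0$; $\sigma^2$ known (Gauß-Test)

Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$ under $H_0$

Null hypothesis          Alternative hypothesis       Critical range
$H_0: \mu = \mu_0$       $H_1: \mu \neq \mu_0$        $|T| > z_{1-\alpha/2}$
$H_0: \mu \le \mu_0$     $H_1: \mu > \mu_0$           $T > z_{1-\alpha}$
$H_0: \mu \ge \mu_0$     $H_1: \mu < \mu_0$           $T < -z_{1-\alpha}$

One-sample tests

Comparison of $\mu$ and $\mu_0$; $\sigma^2$ unknown (T-Test)

Statistic: $T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$ under $H_0$

Null hypothesis          Alternative hypothesis       Critical range
$H_0: \mu = \mu_0$       $H_1: \mu \neq \mu_0$        $|T| > t_{n-1,\,1-\alpha/2}$
$H_0: \mu \le \mu_0$     $H_1: \mu > \mu_0$           $T > t_{n-1,\,1-\alpha}$
$H_0: \mu \ge \mu_0$     $H_1: \mu < \mu_0$           $T < -t_{n-1,\,1-\alpha}$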





Note

Testing with a software system results in a p-value (often called significance), which reports the probability of observing the given sample or a more extreme one, assuming the null hypothesis is true.

If the p-value is less than the risk $\alpha$, you reject the null hypothesis, provided you test against a two-sided alternative hypothesis.

In case of one-sided testing, compare one half of the p-value with $\alpha$ (this presumes that the software reports the two-sided p-value and that the sample deviates in the direction of the alternative).
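For the Gauss test the p-value can be computed directly from the standard normal distribution. The following minimal sketch uses the statistic T = 2 from example 1; the function name is illustrative, and the one-sided value refers to the alternative $\mu > \mu_0$.

```python
from statistics import NormalDist


def gauss_p_values(t_stat):
    """p-values of the Gauss test for an observed statistic t_stat."""
    phi = NormalDist().cdf
    p_two_sided = 2 * (1 - phi(abs(t_stat)))  # H1: mu != mu0
    p_one_sided = 1 - phi(t_stat)             # H1: mu > mu0
    return p_two_sided, p_one_sided


p_two, p_one = gauss_p_values(2.0)
print(f"two-sided p-value: {p_two:.4f}")
print(f"one-sided p-value: {p_one:.4f}")
```

Both values are below 0.05, and the one-sided p-value is exactly half of the two-sided one because the observed statistic points in the direction of the alternative.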




One-sample test: The expectation $\mu$ of a population is compared with a given reference value $\mu_0$.

Two-sample test: The expectations $\mu_1$ and $\mu_2$ of two populations are compared.

Two-sample tests can have a paired design (dependent samples) or an unpaired design (independent samples).

unpaired: the samples are obtained in unrelated (disjoint) groups (for example healthy and ill, or female and male)

paired: each data point in one sample is matched to a unique data point in the second sample (for example a pre-test/post-test design observing the same subjects or objects twice)




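Notations

Sample sizes: $n_x$, $n_y$, $n$

Sample means:

$$\bar{X} = \frac{1}{n_x} \sum_{i=1}^{n_x} X_i, \qquad \bar{Y} = \frac{1}{n_y} \sum_{i=1}^{n_y} Y_i$$

Sample variances:

$$s_x^2 = \frac{1}{n_x - 1} \sum_{i=1}^{n_x} (X_i - \bar{X})^2, \qquad s_y^2 = \frac{1}{n_y - 1} \sum_{i=1}^{n_y} (Y_i - \bar{Y})^2$$

Pooled variance:

$$s_g^2 = \frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{n_x + n_y - 2}$$

Two-sample tests

Comparison of $\mu_x$ to $\mu_y$; $\sigma_D^2$ unknown, paired; $D = X - Y$, $\mu_D = \mu_x - \mu_y$

$d_i = x_i - y_i$; $\bar{d}$: sample mean of the differences; $s_D$: sample standard deviation of the $d_i$

Statistic: $T = \frac{\bar{d}}{s_D / \sqrt{n}} \sim t_{n-1}$ under $H_0$

Null hypothesis          Alternative hypothesis     Critical range
$H_0: \mu_D = 0$         $H_1: \mu_D \neq 0$        $|T| > t_{n-1,\,1-\alpha/2}$
$H_0: \mu_D \le 0$       $H_1: \mu_D > 0$           $T > t_{n-1,\,1-\alpha}$
$H_0: \mu_D \ge 0$       $H_1: \mu_D < 0$           $T < -t_{n-1,\,1-\alpha}$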
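A paired test reduces to a one-sample test on the differences. The sketch below computes the paired statistic $T = \bar{d} / (s_D / \sqrt{n})$ from raw data. The ten before/after measurements are invented purely for illustration, and the critical value 2.262 is the tabulated 0.975-quantile of the t-distribution with 9 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(x, y):
    """Statistic of the paired t-test and its degrees of freedom."""
    d = [xi - yi for xi, yi in zip(x, y)]  # differences d_i = x_i - y_i
    n = len(d)
    t_stat = fmean(d) / (stdev(d) / sqrt(n))
    return t_stat, n - 1


# Hypothetical pre-test/post-test data for the same ten subjects.
before = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
after = [11, 13, 8, 12, 12, 14, 9, 13, 13, 10]

t_stat, df = paired_t_statistic(before, after)
T_CRIT = 2.262  # t_{9, 0.975}, read from a t table
print(f"T = {t_stat:.3f} with {df} degrees of freedom")
print("reject H0 (two-sided, alpha = 0.05):", abs(t_stat) > T_CRIT)
```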




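Testing hypotheses about the mean of a normal distribution (4)

Two-sample tests

Comparison of $\mu_x$ to $\mu_y$; $\sigma_x^2$, $\sigma_y^2$ unknown, but equal; unpaired (unpaired T-Test)

Statistic: $T = \frac{\bar{X} - \bar{Y}}{s_g} \sqrt{\frac{n_x n_y}{n_x + n_y}} \sim t_{n_x + n_y - 2}$ under $H_0$

Null hypothesis          Alternative hypothesis     Critical range
$H_0: \mu_x = \mu_y$     $H_1: \mu_x \neq \mu_y$    $|T| > t_{n_x+n_y-2,\,1-\alpha/2}$
$H_0: \mu_x \le \mu_y$   $H_1: \mu_x > \mu_y$       $T > t_{n_x+n_y-2,\,1-\alpha}$
$H_0: \mu_x \ge \mu_y$   $H_1: \mu_x < \mu_y$       $T < -t_{n_x+n_y-2,\,1-\alpha}$

$t_{m,q}$: q-quantile of the t-distribution with m degrees of freedom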




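Testing hypotheses about the mean of a normal distribution (5)

Two-sample tests

Comparison of $\mu_x$ to $\mu_y$; $\sigma_x^2$, $\sigma_y^2$ unknown, unequal; not paired (Welch Test)

Statistic: $T = \frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2 / n_x + s_y^2 / n_y}} \sim t_f$ under $H_0$

Null hypothesis          Alternative hypothesis     Critical range
$H_0: \mu_x = \mu_y$     $H_1: \mu_x \neq \mu_y$    $|T| > t_{f,\,1-\alpha/2}$
$H_0: \mu_x \le \mu_y$   $H_1: \mu_x > \mu_y$       $T > t_{f,\,1-\alpha}$
$H_0: \mu_x \ge \mu_y$   $H_1: \mu_x < \mu_y$       $T < -t_{f,\,1-\alpha}$

$t_{f,q}$: q-quantile of the t-distribution with f degrees of freedom, where

$$f = \frac{(s_x^2 / n_x + s_y^2 / n_y)^2}{(s_x^2 / n_x)^2 / (n_x - 1) + (s_y^2 / n_y)^2 / (n_y - 1)}$$

(always round down!)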
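The degrees of freedom of the Welch test are tedious to compute by hand. As an illustrative sketch, the function below evaluates the Welch statistic and the rounded-down degrees of freedom from summary statistics. The inputs are the summary values of the hemoglobin example in this document (n = 9, mean 18.9, s = 5.9 and n = 13, mean 11.9, s = 6.3); that example itself assumes equal variances, so using the Welch test on it is only a demonstration of the arithmetic.

```python
from math import floor, sqrt


def welch_statistic(mean_x, s_x, n_x, mean_y, s_y, n_y):
    """Welch statistic T and degrees of freedom f (rounded down)."""
    vx = s_x**2 / n_x  # estimated variance of the mean of X
    vy = s_y**2 / n_y  # estimated variance of the mean of Y
    t_stat = (mean_x - mean_y) / sqrt(vx + vy)
    f = (vx + vy)**2 / (vx**2 / (n_x - 1) + vy**2 / (n_y - 1))
    return t_stat, floor(f)


t_stat, f = welch_statistic(18.9, 5.9, 9, 11.9, 6.3, 13)
print(f"T = {t_stat:.3f}, f = {f}")
```

The resulting f is then used to look up the quantile $t_{f,\,1-\alpha/2}$ or $t_{f,\,1-\alpha}$ in a t table.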




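Minimum sample size for guaranteeing maximal errors $\alpha$, $\beta$ ($\sigma^2$ known)

Comparison of $\mu_X$ to $\mu_Y$ (two-sample test)

L denotes the practically relevant difference between the means

Two-sided test:

$$n \ge n_0 = \frac{2 \sigma^2 \, (z_{1-\alpha/2} + z_{1-\beta})^2}{L^2}$$

One-sided test:

$$n \ge n_0 = \frac{2 \sigma^2 \, (z_{1-\alpha} + z_{1-\beta})^2}{L^2}$$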




Example: Two-sample test for unpaired samples

Is there a difference in hemoglobin values between healthy children and those suffering from a certain illness? Normal distribution with equal variances is assumed; error probability 0.05.

Data for healthy children: $n_1 = 9$, $\bar{x} = 18.9$, $s_1 = 5.9$

Data for ill children: $n_2 = 13$, $\bar{y} = 11.9$, $s_2 = 6.3$

Null hypothesis: $\mu_x = \mu_y$

Risk: $\alpha = 0.05$



Example (continued)

Pooled variance:

$$s_g^2 = \frac{(9 - 1) \cdot 5.9^2 + (13 - 1) \cdot 6.3^2}{9 + 13 - 2} = 37.74$$

Statistic:

$$T = \frac{\bar{X} - \bar{Y}}{s_g} \sqrt{\frac{n_x n_y}{n_x + n_y}} = \frac{18.9 - 11.9}{\sqrt{37.74}} \sqrt{\frac{9 \cdot 13}{9 + 13}} = 2.63$$

Criterion of rejection: $|T| > t_{n_x+n_y-2,\,1-\alpha/2}$ with $t_{9+13-2,\,0.975} = 2.086$

Because 2.63 > 2.086, the null hypothesis is rejected, which means the illness changes the mean hemoglobin level significantly. The result of the sample can be generalized to the whole population with a confidence of 95%.
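The arithmetic of this example can be retraced with a short script. It is a minimal sketch working from the summary statistics given above; the function name is illustrative, and the critical value 2.086 is taken from the example rather than computed.

```python
from math import sqrt


def pooled_t_statistic(mean_x, s_x, n_x, mean_y, s_y, n_y):
    """Pooled variance, statistic T and degrees of freedom of the unpaired t-test."""
    df = n_x + n_y - 2
    s_g_sq = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / df
    t_stat = (mean_x - mean_y) / sqrt(s_g_sq) * sqrt(n_x * n_y / (n_x + n_y))
    return s_g_sq, t_stat, df


# Healthy children: n = 9, mean 18.9, s = 5.9; ill children: n = 13, mean 11.9, s = 6.3.
s_g_sq, t_stat, df = pooled_t_statistic(18.9, 5.9, 9, 11.9, 6.3, 13)
T_CRIT = 2.086  # t_{20, 0.975} as stated in the example

print(f"pooled variance = {s_g_sq:.2f}, T = {t_stat:.2f}, df = {df}")
print("reject H0:", abs(t_stat) > T_CRIT)
```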




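Testing hypotheses about the variance of a normal distribution (1)

One-sample tests

Comparison of $\sigma^2$ to $\sigma_0^2$ (reference value)

Statistic: $T = \frac{(n - 1)\, s^2}{\sigma_0^2} \sim \chi^2_{n-1}$ under $H_0$

Null hypothesis                  Alternative hypothesis             Critical range
$H_0: \sigma^2 = \sigma_0^2$     $H_1: \sigma^2 \neq \sigma_0^2$    $T > \chi^2_{n-1,\,1-\alpha/2}$ or $T < \chi^2_{n-1,\,\alpha/2}$
$H_0: \sigma^2 \le \sigma_0^2$   $H_1: \sigma^2 > \sigma_0^2$       $T > \chi^2_{n-1,\,1-\alpha}$
$H_0: \sigma^2 \ge \sigma_0^2$   $H_1: \sigma^2 < \sigma_0^2$       $T < \chi^2_{n-1,\,\alpha}$

$\chi^2_{m,q}$: q-quantile of the $\chi^2$-distribution with m degrees of freedom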



Testing hypotheses about the variance of a normal distribution (1)

Two-sample tests

Comparison of $\sigma_x^2$ to $\sigma_y^2$

Statistic: $T = s_x^2 / s_y^2 \sim F_{n_x-1,\,n_y-1}$ under $H_0$

Null hypothesis                    Alternative hypothesis               Critical range
$H_0: \sigma_x^2 = \sigma_y^2$     $H_1: \sigma_x^2 \neq \sigma_y^2$    $T > F_{n_x-1,\,n_y-1,\,1-\alpha/2}$ or $T < F_{n_x-1,\,n_y-1,\,\alpha/2}$
$H_0: \sigma_x^2 \le \sigma_y^2$   $H_1: \sigma_x^2 > \sigma_y^2$       $T > F_{n_x-1,\,n_y-1,\,1-\alpha}$
$H_0: \sigma_x^2 \ge \sigma_y^2$   $H_1: \sigma_x^2 < \sigma_y^2$       $T < F_{n_x-1,\,n_y-1,\,\alpha}$

$F_{n,m,q}$: q-quantile of the F-distribution with n, m degrees of freedom

This test is used to decide between the unpaired T-test and the Welch test in order to compare means.
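To illustrate how the variance comparison supports the choice between the unpaired T-test and the Welch test, the sketch below applies the two-sided test to the standard deviations of the hemoglobin example (s = 5.9 with n = 9 and s = 6.3 with n = 13). The function name is illustrative. The critical values are approximate F-table entries for 8 and 12 degrees of freedom and should be treated as assumptions to be checked against a table or statistical software; the lower quantile uses the relation $F_{n,m,\alpha/2} = 1 / F_{m,n,1-\alpha/2}$.

```python
def variance_ratio_test(s_x, s_y, f_lower, f_upper):
    """Two-sided test of H0: equal variances, using T = s_x^2 / s_y^2."""
    t_stat = s_x**2 / s_y**2
    reject = t_stat < f_lower or t_stat > f_upper
    return t_stat, reject


# Approximate table values for alpha = 0.05 (assumed, not from the slides).
F_UPPER = 3.51      # F_{8, 12, 0.975}
F_LOWER = 1 / 4.20  # F_{8, 12, 0.025} = 1 / F_{12, 8, 0.975}

t_stat, reject = variance_ratio_test(5.9, 6.3, F_LOWER, F_UPPER)
print(f"T = {t_stat:.3f}, reject equal variances: {reject}")
```

Here the hypothesis of equal variances is not rejected, which is in line with the example's use of the unpaired T-test with pooled variance.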

