
Upload: vernon-lee

Post on 19-Jan-2016


WS 2007/08 Prof. Dr. J. Schütze, FB GW KI 1

Fachhochschule Jena
University of Applied Sciences Jena

Hypothesis testing

Statistical Tests

Sometimes you have to make a decision about a characteristic of a population. For example, you claim a new drug is better at treating a disease than the current one. You will observe the change of a symptom of the disease under the new drug and under the standard drug. And you hope there is a difference between the drugs which is not only due to chance.

The opposite of your claim will be the null hypothesis, which means the observed difference is only due to unexplained 'chance' (no effect).

If you can reject the null hypothesis, you will accept the alternative hypothesis: there is a non-chance difference between the drugs (effect).

Accepting the alternative hypothesis (your claim) means that there must be strong evidence against the null hypothesis.


Statistical tests for unknown parameters

Example 1

For some branches of industry, it is important to check whether the mean body height of adults has changed.

Thesis: The mean body height of German adults is 173 cm (from former investigations). You have to decide whether this is still valid or not.

If you want to be sure, you must measure all German adults and compute the mean of their body heights.

A more practicable method is to choose a representative sample of n adults and compute the mean of their body heights.

This sample mean is an estimate of the unknown mean (expectation) of the body height of all German adults. Because it does not reflect the whole information of the population, there is a risk in your decision.


$\mu$: unknown expectation of the whole population
$\mu_0 = 173$: reference value

Null hypothesis: $\mu = 173$ against alternative hypothesis: $\mu \ne 173$

Sample of $n = 100$; the sample mean estimates the unknown expectation: $\bar{x} = 175$

Difference between sample mean and reference value:

$d = \bar{x} - \mu_0 = 175 - 173 = 2$

Is this sample mean consistent with the null hypothesis? Or is it so unlikely that the null hypothesis should be rejected? For this decision we would accept an error probability of 0.05.

Up to which value k can this difference be attributed to chance, and when is the sample mean inconsistent with the null hypothesis (difference too large)?


Suppose the random variable X (body height) follows a normal distribution with known standard deviation $\sigma = 10$ and unknown $\mu$.

In order to infer from the sample mean to the expectation of the population, we use the distribution of the point estimator

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Then we know the distribution of $\bar{X}$:

$\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$

Using for example a sample size of n = 100,

$\bar{X} \sim N(\mu, \frac{10^2}{100}) = N(\mu, 1)$


If the null hypothesis is true, $\mu = 173$, and consequently

$\bar{X} \sim N(173, 1)$ and hence $D = \bar{X} - \mu_0 = \bar{X} - 173 \sim N(0, 1)$.

Using this distribution, we determine a region C for rejecting $H_0$ by

$P(D \in C) = 0.05$

We find this critical region as

$C = \{ |D| > z_{0.975} \}$

Because it is very unlikely for D to take values in this region if $H_0$ is true, we reject $H_0$ in these cases.

This decision has an error probability of 0.05, since D can still take values in the critical region C with probability 0.05 even though the null hypothesis is true.


Decision in example 1

From the sample, we get d = 2.

Accepting an error probability of 0.05, the critical region is

$C = \{ |D| > z_{0.975} \} = \{ |D| > 1.96 \}$

Because d = 2 belongs to C (it exceeds the critical value of k = 1.96), we reject the null hypothesis.

The mean of the body height of German adults is no longer 173 cm. The confidence level is 0.95 (or: the risk is 0.05).


In practice you do not compute the difference d; you make your decision using the following statistic T:

$T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$

Under the null hypothesis, $EX = \mu_0$, hence also $E\bar{X} = \mu_0$, and

$T \sim N(0, 1)$.

Thus with probability $1 - \alpha$

$z_{\alpha/2} \le T \le z_{1-\alpha/2}$.

If the sample value of T is out of this range, the null hypothesis is rejected.

The error probability is then $\alpha$ (0.05 in example 1), because if $H_0$ is true, T is out of this range with probability $\alpha$ too.


$T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$

Risk $\alpha = 0.05$

Under the null hypothesis, T ~ N(0, 1), consequently with p = 0.95 T lies in the interval

$(z_{\alpha/2}, z_{1-\alpha/2}) = (-z_{1-\alpha/2}, z_{1-\alpha/2}) = (-1.96, 1.96)$

The null hypothesis is rejected if T < -1.96 or T > 1.96, which is equivalent to |T| > 1.96 (critical range).

In example 1, $\sigma = 10$, $n = 100$, $\mu_0 = 173$, $\bar{x} = 175$:

$T = \frac{175 - 173}{10 / \sqrt{100}} = 2$

Because the sample value T = 2 lies in the critical range, the null hypothesis is rejected.
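The decision in example 1 can be retraced in a few lines of code. The following is only a minimal sketch using Python's standard library; the function name `gauss_test` and its return layout are illustrative choices, not part of the slides. The numbers are those of example 1.

```python
from math import sqrt
from statistics import NormalDist


def gauss_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 with known sigma (hypothetical helper)."""
    # Statistic T = (x_bar - mu0) / (sigma / sqrt(n)), N(0, 1) under H0
    t = (x_bar - mu0) / (sigma / sqrt(n))
    # Critical value z_{1 - alpha/2} of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 if |T| exceeds the critical value
    return t, z_crit, abs(t) > z_crit


t, z_crit, reject = gauss_test(x_bar=175, mu0=173, sigma=10, n=100)
print(f"T = {t:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

With the values of example 1 the statistic is 2 and the critical value is about 1.96, so the sketch reaches the same decision as the slides.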


Testing scheme

Comparison of the unknown mean $\mu$ under normal distribution with respect to a reference value $\mu_0$ ($\sigma$ is assumed to be known)

Null hypothesis $H_0: \mu = \mu_0$
Alternative hypothesis $H_1: \mu \ne \mu_0$
Risk $\alpha$: error of the 1st kind, the probability of falsely rejecting a correct $H_0$

Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$; under $H_0$, T follows a standardized normal distribution

Critical range of $H_0$ by risk $\alpha$: $|T| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is a quantile of N(0, 1)


Critical range for risk $\alpha$

[Figure: density of the statistic T under the null hypothesis. Under the null hypothesis, T lies in the central interval with probability $1 - \alpha$; the two tails outside this interval are the rejection regions.]


Kinds of error

Interpretation in case of H0: there is no difference, no effect

Error of the 1st kind: rejection of a true H0 (false alarm); you detect a difference that does not exist

Error of the 2nd kind: accepting an incorrect H0 (failing to sound the alarm); you overlook an existing difference

               H0 rejected             H0 accepted
H0 correct     Error of the 1st kind   correct decision
H0 false       correct decision        Error of the 2nd kind


One-sided and two-sided tests

Two-sided (two-tailed) test

Null hypothesis: $\mu = \mu_0$   Alternative hypothesis: $\mu \ne \mu_0$

One-sided (one-tailed) tests

Null hypothesis: $\mu \le \mu_0$   Alternative hypothesis: $\mu > \mu_0$
or
Null hypothesis: $\mu \ge \mu_0$   Alternative hypothesis: $\mu < \mu_0$


Errors of the 1st and 2nd kind

Hypotheses: $H_0: \mu = \mu_0$ against the one-sided alternative $H_1: \mu > \mu_0$

Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$ under $H_0$

Critical range for the one-sided test: $T > t_{1-\alpha}$, where $t_{1-\alpha}$ is the $(1-\alpha)$ quantile of the distribution of T under $H_0$

[Figure: density of T under the null hypothesis $H_0: \mu = \mu_0$ and density of T for the one-sided alternative $\mu_1 > \mu_0$, with the critical value $t_{1-\alpha}$ marked on the axis.]

Error of the 1st kind: $\alpha$, the area under the null-hypothesis density inside the critical range

Error of the 2nd kind: $\beta$, the area under the alternative density outside the critical range


Interpretation

The smaller $\alpha$, the bigger $\beta$ will be.

The probability of rejecting a false null hypothesis can be calculated with respect to any alternative reference value $\mu_1$ when the sample size n is given.

Only increasing n can reduce $\beta$ for a given $\alpha$!

The smaller the difference $\mu_0 - \mu_1$, the bigger $\beta$ will be (overlapping).
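These three statements can be checked numerically. The sketch below assumes the one-sided Gauss test with known standard deviation; under an alternative mean the statistic T is normally distributed with variance 1 and mean equal to the difference of the means divided by the standard error. The function name `beta_one_sided` and the alternative value 175 are illustrative assumptions; 173, 10 and 100 are taken from example 1.

```python
from math import sqrt
from statistics import NormalDist


def beta_one_sided(mu0, mu1, sigma, n, alpha):
    """Error of the 2nd kind for H1: mu > mu0 when the true mean is mu1 (sketch)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # critical value of the test
    shift = (mu1 - mu0) / (sigma / sqrt(n))     # mean of T under the alternative
    # beta = P(T <= z_crit) when T ~ N(shift, 1)
    return std_normal.cdf(z_crit - shift)


base = beta_one_sided(173, 175, 10, 100, alpha=0.05)
smaller_alpha = beta_one_sided(173, 175, 10, 100, alpha=0.01)
larger_n = beta_one_sided(173, 175, 10, 400, alpha=0.05)
closer_mu1 = beta_one_sided(173, 174, 10, 100, alpha=0.05)

print(f"base: {base:.3f}")
print(f"smaller alpha: {smaller_alpha:.3f}")
print(f"larger n: {larger_n:.3f}")
print(f"mu1 closer to mu0: {closer_mu1:.3f}")
```

The probability of rejecting the false null hypothesis is one minus the returned value.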


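Minimum sample size for guaranteeing maximal errors $\alpha$, $\beta$ ($\sigma^2$ known)

Comparison with $\mu_0$ (one sample)

L denotes the practically relevant difference in mean.

Two-sided test: $n \ge n_0 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, \sigma^2}{L^2}$

One-sided test: $n \ge n_0 = \frac{(z_{1-\alpha} + z_{1-\beta})^2 \, \sigma^2}{L^2}$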
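The sample size bound can be evaluated directly. This is a minimal sketch with Python's standard library; the function `min_sample_size`, its parameter names, and the chosen values (risk 0.05, second-kind error 0.2, relevant difference L = 2) are assumptions for illustration, while the standard deviation 10 is the one from example 1. The `two_sample` switch applies the factor 2 that the two-sample version of the formula in these slides uses.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(alpha, beta, sigma, L, two_sided=True, two_sample=False):
    """Smallest integer n satisfying n >= n0 for known sigma (hypothetical helper)."""
    z = NormalDist().inv_cdf
    # z_{1 - alpha/2} for the two-sided test, z_{1 - alpha} for the one-sided test
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    n0 = (z_alpha + z(1 - beta)) ** 2 * sigma ** 2 / L ** 2
    if two_sample:
        n0 *= 2  # factor 2 of the two-sample formula
    return ceil(n0)


n_two_sided = min_sample_size(0.05, 0.2, sigma=10, L=2)
n_one_sided = min_sample_size(0.05, 0.2, sigma=10, L=2, two_sided=False)
n_two_sample = min_sample_size(0.05, 0.2, sigma=10, L=2, two_sample=True)
print(n_two_sided, n_one_sided, n_two_sample)
```

Rounding up with `ceil` is a design choice so that the returned integer never falls below the bound.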


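Testing hypothesis on the mean of a normal distribution (1)

One-sample tests

Comparison of $\mu$ to $\mu_0$; $\sigma^2$ known (Gauß-Test)

Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$ under $H_0$

Null hypothesis          Alternative hypothesis     Critical range
$H_0: \mu = \mu_0$       $H_1: \mu \ne \mu_0$       $|T| > z_{1-\alpha/2}$
$H_0: \mu \le \mu_0$     $H_1: \mu > \mu_0$         $T > z_{1-\alpha}$
$H_0: \mu \ge \mu_0$     $H_1: \mu < \mu_0$         $T < -z_{1-\alpha}$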


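Testing hypothesis on the mean of a normal distribution (2)

One-sample tests

Comparison of $\mu$ and $\mu_0$; $\sigma^2$ unknown (T-Test)

Statistic: $T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$ under $H_0$

Null hypothesis          Alternative hypothesis     Critical range
$H_0: \mu = \mu_0$       $H_1: \mu \ne \mu_0$       $|T| > t_{n-1, 1-\alpha/2}$
$H_0: \mu \le \mu_0$     $H_1: \mu > \mu_0$         $T > t_{n-1, 1-\alpha}$
$H_0: \mu \ge \mu_0$     $H_1: \mu < \mu_0$         $T < -t_{n-1, 1-\alpha}$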


Note

Testing with a software system results in a p-value (often called significance), which reports the probability of observing the given sample or a more extreme one, assuming the null hypothesis is true.

If the p-value is less than the risk $\alpha$, you reject the null hypothesis when testing against a two-sided alternative hypothesis.

In case of one-sided testing, take one half of the p-value to compare with $\alpha$ (this presumes the software reports the two-sided p-value and the sample deviates in the direction of the alternative).
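To make the relation between two-sided and one-sided p-values concrete, here is a small sketch for the Gauss test, where T is standard normal under the null hypothesis. The function name `p_values` is an illustrative assumption; the value T = 2 is the statistic from example 1.

```python
from statistics import NormalDist


def p_values(t):
    """Two-sided and one-sided p-values for a statistic T ~ N(0, 1) under H0."""
    phi = NormalDist().cdf
    p_two_sided = 2 * (1 - phi(abs(t)))  # alternative: mu != mu0
    p_upper = 1 - phi(t)                 # alternative: mu > mu0
    p_lower = phi(t)                     # alternative: mu < mu0
    return p_two_sided, p_upper, p_lower


p_two_sided, p_upper, p_lower = p_values(2.0)
print(f"two-sided: {p_two_sided:.4f}")
print(f"one-sided, mu > mu0: {p_upper:.4f}")
print(f"one-sided, mu < mu0: {p_lower:.4f}")
```

For a positive statistic, the p-value for the alternative "greater" is exactly half the two-sided value, while the p-value for the alternative "less" is close to 1. Halving is therefore only appropriate when the sample deviates in the direction of the alternative.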


One-sample test: The expectation $\mu$ of a population is compared with a given reference value $\mu_0$.

Two-sample test: The expectations $\mu_1$ and $\mu_2$ of two populations are compared.

Two-sample tests can have a paired design (dependent samples) or an unpaired design (independent samples).

unpaired: the samples are obtained in unrelated (disjoint) groups (for example healthy and ill, or female and male)

paired: each data point in one sample is matched to a unique data point in the second sample (for example a pre-test/post-test design observing the same subjects or objects twice)


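Notations

Sample size: $n_x$, $n_y$, $n$

Sample means: $\bar{X} = \frac{1}{n_x} \sum_{i=1}^{n_x} X_i$, $\bar{Y} = \frac{1}{n_y} \sum_{i=1}^{n_y} Y_i$

Sample variances: $s_x^2 = \frac{1}{n_x - 1} \sum_{i=1}^{n_x} (X_i - \bar{X})^2$, $s_y^2 = \frac{1}{n_y - 1} \sum_{i=1}^{n_y} (Y_i - \bar{Y})^2$

Pooled variance: $s_g^2 = \frac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2}$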


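Testing hypothesis on the mean of a normal distribution (3)

Two-sample tests

Comparison of $\mu_D$ to 0; $\sigma_D^2$ unknown, paired, $D = X - Y$, $\mu_D = \mu_x - \mu_y$

$d_i = x_i - y_i$
$\bar{d}$: sample mean of differences
$s_D$: sample deviation of $d_i$

Statistic: $T = \frac{\bar{d}}{s_D / \sqrt{n}} \sim t_{n-1}$ under $H_0$

Null hypothesis        Alternative hypothesis   Critical range
$H_0: \mu_D = 0$       $H_1: \mu_D \ne 0$       $|T| > t_{n-1, 1-\alpha/2}$
$H_0: \mu_D \le 0$     $H_1: \mu_D > 0$         $T > t_{n-1, 1-\alpha}$
$H_0: \mu_D \ge 0$     $H_1: \mu_D < 0$         $T < -t_{n-1, 1-\alpha}$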
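The paired test reduces to a one-sample T-test on the differences, which the following sketch illustrates. The data are invented purely for illustration (six hypothetical subjects measured before and after a treatment), and the function name `paired_t_statistic` is an assumption. The critical value 2.571 is the 0.975 quantile of the t-distribution with 5 degrees of freedom, taken from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (mean difference, sample deviation, T) for paired samples (sketch)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    d = [xi - yi for xi, yi in zip(x, y)]  # differences d_i = x_i - y_i
    d_bar = mean(d)                        # sample mean of differences
    s_d = stdev(d)                         # sample deviation (divisor n - 1)
    t = d_bar / (s_d / sqrt(len(d)))       # statistic T, t_{n-1} under H0
    return d_bar, s_d, t


# Hypothetical measurements, not taken from the slides
before = [140, 152, 138, 145, 150, 147]
after = [135, 150, 136, 140, 149, 141]

d_bar, s_d, t = paired_t_statistic(before, after)
T_CRIT = 2.571  # t_{5, 0.975} from a t table
reject = abs(t) > T_CRIT
print(f"mean difference = {d_bar}, T = {t:.3f}, reject H0: {reject}")
```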


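Testing hypothesis about the mean of a normal distribution (4)

Two-sample tests

Comparison of $\mu_x$ to $\mu_y$; $\sigma_x^2$, $\sigma_y^2$ unknown, but equal; unpaired (unpaired T-Test)

Statistic: $T = \frac{\bar{X} - \bar{Y}}{s_g} \sqrt{\frac{n_x n_y}{n_x + n_y}} \sim t_{n_x + n_y - 2}$ under $H_0$

Null hypothesis          Alternative hypothesis     Critical range
$H_0: \mu_x = \mu_y$     $H_1: \mu_x \ne \mu_y$     $|T| > t_{n_x + n_y - 2, 1-\alpha/2}$
$H_0: \mu_x \le \mu_y$   $H_1: \mu_x > \mu_y$       $T > t_{n_x + n_y - 2, 1-\alpha}$
$H_0: \mu_x \ge \mu_y$   $H_1: \mu_x < \mu_y$       $T < -t_{n_x + n_y - 2, 1-\alpha}$

$t_{m, q}$: q-quantile of the t-distribution with m degrees of freedom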


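Testing hypothesis about the mean of a normal distribution (5)

Two-sample tests

Comparison of $\mu_x$ to $\mu_y$; $\sigma_x^2$, $\sigma_y^2$ unknown, unequal; not paired (Welch Test)

Statistic: $T = \frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2 / n_x + s_y^2 / n_y}} \sim t_f$ under $H_0$

Null hypothesis          Alternative hypothesis     Critical range
$H_0: \mu_x = \mu_y$     $H_1: \mu_x \ne \mu_y$     $|T| > t_{f, 1-\alpha/2}$
$H_0: \mu_x \le \mu_y$   $H_1: \mu_x > \mu_y$       $T > t_{f, 1-\alpha}$
$H_0: \mu_x \ge \mu_y$   $H_1: \mu_x < \mu_y$       $T < -t_{f, 1-\alpha}$

$t_{f, q}$: q-quantile of the t-distribution with f degrees of freedom,

$f = \frac{(s_x^2 / n_x + s_y^2 / n_y)^2}{(s_x^2 / n_x)^2 / (n_x - 1) + (s_y^2 / n_y)^2 / (n_y - 1)}$ (always round down!)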
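The degrees of freedom of the Welch test are tedious to compute by hand, so a short sketch may help. The function name `welch_test_summary` is an assumption; the summary values are those of the hemoglobin example in these slides and are used here only to have concrete numbers (that example itself assumes equal variances and uses the unpaired T-test).

```python
from math import floor, sqrt


def welch_test_summary(n_x, mean_x, s_x, n_y, mean_y, s_y):
    """Welch statistic T and degrees of freedom f from summary data (sketch)."""
    vx = s_x ** 2 / n_x  # s_x^2 / n_x
    vy = s_y ** 2 / n_y  # s_y^2 / n_y
    t = (mean_x - mean_y) / sqrt(vx + vy)
    f = (vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1))
    return t, floor(f)   # always round the degrees of freedom down


t, f = welch_test_summary(9, 18.9, 5.9, 13, 11.9, 6.3)
print(f"T = {t:.2f}, f = {f}")
```

The resulting f is then used to look up the quantile of the t-distribution in a table or a statistics package.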


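Minimum sample size for guaranteeing maximal errors $\alpha$, $\beta$ ($\sigma^2$ known)

Comparison of $\mu_X$ to $\mu_Y$ (two-sample test)

L denotes the practically relevant difference between the means.

Two-sided test: $n \ge n_0 = \frac{2 (z_{1-\alpha/2} + z_{1-\beta})^2 \, \sigma^2}{L^2}$

One-sided test: $n \ge n_0 = \frac{2 (z_{1-\alpha} + z_{1-\beta})^2 \, \sigma^2}{L^2}$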


Example: Two-sample test for unpaired samples

Is there a difference in hemoglobin values between healthy children and those suffering from a certain illness? A normal distribution with equal variances is assumed; the error probability is 0.05.

Data for healthy children: $n_1 = 9$, $\bar{x} = 18.9$, $s_1 = 5.9$
Data for ill children: $n_2 = 13$, $\bar{y} = 11.9$, $s_2 = 6.3$

Null hypothesis: $\mu_x = \mu_y$

Risk: $\alpha = 0.05$


Example (continued)

Statistic: $T = \frac{\bar{X} - \bar{Y}}{s_g} \sqrt{\frac{n_x n_y}{n_x + n_y}}$

$s_g^2 = \frac{(9 - 1) \cdot 5.9^2 + (13 - 1) \cdot 6.3^2}{9 + 13 - 2} = 37.74$

$T = \frac{18.9 - 11.9}{\sqrt{37.74}} \sqrt{\frac{9 \cdot 13}{9 + 13}} = 2.63$

Criterion of rejection: $|T| > t_{n_x + n_y - 2, 1-\alpha/2}$, with $t_{9 + 13 - 2, 0.975} = 2.086$

Because 2.63 > 2.086, the null hypothesis is rejected, which means the illness changes the mean hemoglobin level significantly. The result of the sample can be generalized to the whole population with a confidence of 95%.
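The arithmetic of this example can be reproduced with a short script. This is only a sketch: the function name `pooled_t_statistic` is an illustrative assumption, and the critical value 2.086 is copied from the slide rather than computed.

```python
from math import sqrt


def pooled_t_statistic(n_x, mean_x, s_x, n_y, mean_y, s_y):
    """Pooled variance and statistic T of the unpaired T-test (sketch)."""
    pooled_var = ((n_x - 1) * s_x ** 2 + (n_y - 1) * s_y ** 2) / (n_x + n_y - 2)
    t = (mean_x - mean_y) / sqrt(pooled_var) * sqrt(n_x * n_y / (n_x + n_y))
    return pooled_var, t


pooled_var, t = pooled_t_statistic(9, 18.9, 5.9, 13, 11.9, 6.3)
T_CRIT = 2.086  # t_{20, 0.975}, value given on the slide
reject = abs(t) > T_CRIT
print(f"pooled variance = {pooled_var:.2f}, T = {t:.2f}, reject H0: {reject}")
```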


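Testing hypothesis about the variance of a normal distribution (1)

One-sample tests

Comparison of $\sigma^2$ to $\sigma_0^2$ (reference value)

Statistic: $T = \frac{(n - 1) s^2}{\sigma_0^2} \sim \chi^2_{n-1}$ under $H_0$

Null hypothesis                  Alternative hypothesis             Critical range
$H_0: \sigma^2 = \sigma_0^2$     $H_1: \sigma^2 \ne \sigma_0^2$     $T > \chi^2_{n-1, 1-\alpha/2}$ or $T < \chi^2_{n-1, \alpha/2}$
$H_0: \sigma^2 \le \sigma_0^2$   $H_1: \sigma^2 > \sigma_0^2$       $T > \chi^2_{n-1, 1-\alpha}$
$H_0: \sigma^2 \ge \sigma_0^2$   $H_1: \sigma^2 < \sigma_0^2$       $T < \chi^2_{n-1, \alpha}$

$\chi^2_{m, q}$: q-quantile of the $\chi^2$-distribution with m degrees of freedom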


Testing hypothesis about the variance of a normal distribution (2)

Two-sample tests

Comparison of $\sigma_x^2$ to $\sigma_y^2$

Statistic: $T = s_x^2 / s_y^2 \sim F_{n_x - 1, n_y - 1}$ under $H_0$

Null hypothesis                    Alternative hypothesis               Critical range
$H_0: \sigma_x^2 = \sigma_y^2$     $H_1: \sigma_x^2 \ne \sigma_y^2$     $T > F_{n_x - 1, n_y - 1, 1-\alpha/2}$ or $T < F_{n_x - 1, n_y - 1, \alpha/2}$
$H_0: \sigma_x^2 \ge \sigma_y^2$   $H_1: \sigma_x^2 < \sigma_y^2$       $T < F_{n_x - 1, n_y - 1, \alpha}$
$H_0: \sigma_x^2 \le \sigma_y^2$   $H_1: \sigma_x^2 > \sigma_y^2$       $T > F_{n_x - 1, n_y - 1, 1-\alpha}$

$F_{n, m, q}$: q-quantile of the F-distribution with n, m degrees of freedom

This test is used to decide between the unpaired T-test and the Welch test in order to compare means.