WS 2007/08 Prof. Dr. J. Schütze, FB GW KI 1
Fachhochschule Jena, University of Applied Sciences Jena
Hypothesis testing
Statistical Tests
Sometimes you have to make a decision about a characteristic of a population. For example, you claim that a new drug treats a disease better than the current one. You observe the change of a symptom of the disease under the new drug and under the standard drug, and you hope to find a difference between the drugs that is not only due to chance.
The opposite of your claim is the null hypothesis: the observed difference is due only to unexplained 'chance' (no effect).
If you can reject the null hypothesis, you accept the alternative hypothesis: there is a non-chance difference between the drugs (effect).
To accept the alternative hypothesis (your claim), there must be strong evidence against the null hypothesis.
Example 1
For some branches of industry, it is important to check whether the mean body height of adults has changed.
To be certain, you would have to measure all German adults and compute the mean of their body heights.
A more practicable method is to choose a representative sample of n adults and compute the mean of their body heights.
Thesis: the mean body height of German adults is 173 cm (from former investigations). You have to decide whether this is still valid.
Statistical tests for unknown parameters
The sample mean is an estimate of the unknown mean (expectation) of the body height of all German adults. Because it does not carry the whole information about the population, your decision involves a risk of error.
Null hypothesis: $\mu = 173$ against alternative hypothesis: $\mu \neq 173$
($\mu$: unknown expectation of the whole population; $\mu_0 = 173$: reference value)
Sample of $n = 100$ adults; the sample mean $\bar{x} = 175$ estimates the unknown expectation $\mu$.
Difference between sample mean and reference value: $d = \bar{x} - \mu_0 = 175 - 173 = 2$
Is this sample mean consistent with the null hypothesis? Or is it so unlikely that the null hypothesis should be rejected? For this decision we accept an error probability of 0.05.
Up to which value k can this difference be attributed to chance, and from which value on is the sample mean inconsistent with the null hypothesis (difference too large)?
In order to infer from the sample mean to the expectation of the population, we use the distribution of the point estimator
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i .$$
Suppose the random variable X (body height) follows a normal distribution with known standard deviation $\sigma = 10$ and unknown $\mu$. Then we know the distribution of $\bar{X}$:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
Using, for example, a sample size of n = 100:
$$\bar{X} \sim N\left(\mu, \frac{10^2}{100}\right) = N(\mu, 1).$$
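The statement $\bar{X} \sim N(\mu, \sigma^2/n)$ can be made tangible by simulation. The following Python sketch draws many samples of size n = 100 from a normal distribution with $\sigma = 10$; the true mean $\mu = 173$ is an assumption chosen only for the simulation, and the function name is hypothetical. The spread of the simulated sample means should come out close to $\sigma/\sqrt{n} = 1$.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma^2); return their sample means."""
    rng = random.Random(seed)  # fixed seed for reproducible output
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]

# Assumed population for the simulation: mu = 173 (hypothetical), sigma = 10; n = 100
means = simulate_sample_means(mu=173, sigma=10, n=100, repetitions=2000)
print(round(statistics.fmean(means), 2))  # close to mu = 173
print(round(statistics.stdev(means), 2))  # close to sigma / sqrt(n) = 1
```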
If the null hypothesis is true, $\mu = 173$, and consequently $\bar{X} \sim N(173, 1)$ and hence
$$D = \bar{X} - \mu_0 = \bar{X} - 173 \sim N(0, 1).$$
Using this distribution, we determine a region C for rejecting $H_0$ by
$$P(D \in C) = 0.05 .$$
We find this critical region as
$$C = \{\, |D| > z_{0.975} \,\},$$
where $z_{0.975}$ is the 0.975-quantile of $N(0, 1)$.
Because it is very unlikely that D takes values in this region if $H_0$ is true, we reject $H_0$ in this case.
This decision has an error probability of 0.05, since with probability 0.05 D can take values in the critical region C even though the null hypothesis is true.
Decision in example 1
From the sample, we get d = 2.
Accepting an error probability of 0.05, the critical region is
$$C = \{\, |D| > z_{0.975} \,\} = \{\, |D| > 1.96 \,\}$$
Because d = 2 belongs to C (it exceeds the critical value k = 1.96), we reject the null hypothesis.
The mean body height of German adults is no longer 173 cm. The confidence level is 0.95 (or: the risk is 0.05).
In practice, you do not compute the difference d; you make your decision using the statistic
$$T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} .$$
Under the null hypothesis, $E\bar{X} = \mu_0$, hence $ET = 0$, and also
$$T \sim N(0, 1).$$
Thus, with probability $1 - \alpha$,
$$z_{\alpha/2} \le T \le z_{1-\alpha/2} .$$
If the sample value of T is outside this range, the null hypothesis is rejected.
The error probability is then $\alpha$ (here 0.05), because if $H_0$ is true, T falls outside this range with probability $\alpha$.
$$T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
In example 1, $\sigma = 10$, $n = 100$, $\mu_0 = 173$, $\bar{x} = 175$, risk $\alpha = 0.05$:
$$T = \frac{175 - 173}{10 / \sqrt{100}} = 2 .$$
Under the null hypothesis, $T \sim N(0, 1)$; consequently, with probability 0.95, T lies in the interval
$$(z_{\alpha/2},\, z_{1-\alpha/2}) = (-z_{1-\alpha/2},\, z_{1-\alpha/2}) = (-1.96,\, 1.96).$$
The null hypothesis is rejected if $T < -1.96$ or $T > 1.96$; equivalently, if $|T| > 1.96$ (critical range).
Because the sample value T = 2 lies in the critical range, the null hypothesis is rejected.
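The interval and the decision for example 1 can be checked numerically. This Python sketch uses the standard library's `statistics.NormalDist` for the quantiles $z_{\alpha/2}$ and $z_{1-\alpha/2}$, confirms that T lies between them with probability $1 - \alpha$ under $H_0$, and evaluates T for the values of example 1. Variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()                    # N(0, 1): distribution of T under H0
alpha = 0.05
lower = std_normal.inv_cdf(alpha / 2)        # z_{alpha/2}, about -1.96
upper = std_normal.inv_cdf(1 - alpha / 2)    # z_{1-alpha/2}, about 1.96
coverage = std_normal.cdf(upper) - std_normal.cdf(lower)  # equals 1 - alpha

sigma, n, mu0, xbar = 10, 100, 173, 175      # values of example 1
t = (xbar - mu0) / (sigma / n ** 0.5)        # T = 2.0
reject = t < lower or t > upper              # outside (z_{alpha/2}, z_{1-alpha/2})
print(round(lower, 2), round(upper, 2), round(coverage, 3), t, reject)
```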
Testing scheme
Comparison of the unknown mean $\mu$ of a normal distribution with a reference value $\mu_0$ ($\sigma$ is assumed to be known)
Null hypothesis: $H_0: \mu = \mu_0$
Alternative hypothesis: $H_1: \mu \neq \mu_0$
Risk: $\alpha$, the error of the 1st kind (probability of falsely rejecting a correct $H_0$)
Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$; under $H_0$, T follows a standardized normal distribution
Critical range of T for risk $\alpha$: $|T| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of $N(0, 1)$
[Figure: Critical range for risk α. Density of the statistic T under the null hypothesis; T lies in the central interval with probability 1 − α, and the two tails form the rejection regions.]
Kinds of errors
Interpretation in the case H0: there is no difference, no effect
Error of the 1st kind: rejecting a true H0 (false alarm); you detect a difference that does not exist.
Error of the 2nd kind: accepting a false H0 (missed alarm); you overlook an existing difference.

| | H0 rejected | H0 accepted |
|---|---|---|
| H0 correct | error of the 1st kind | correct decision |
| H0 false | correct decision | error of the 2nd kind |
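Both kinds of errors can be estimated by simulation. The Python sketch below repeatedly applies the two-sided test of example 1 to simulated samples. When the true mean equals $\mu_0 = 173$, every rejection is an error of the 1st kind, so the rejection rate should be close to α = 0.05. The alternative true mean of 175 is an assumption for illustration; there, every non-rejection is an error of the 2nd kind, and its rate should be roughly one half. The function name and its parameters are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def rejection_rate(true_mu, mu0=173.0, sigma=10.0, n=100, alpha=0.05,
                   repetitions=2000, seed=1):
    """Share of simulated samples for which the two-sided Gauss test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}, about 1.96
    rejections = 0
    for _ in range(repetitions):
        xbar = statistics.fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        t = (xbar - mu0) / (sigma / n ** 0.5)
        if abs(t) > z_crit:
            rejections += 1
    return rejections / repetitions

# H0 true (mu = 173): every rejection is an error of the 1st kind
alpha_hat = rejection_rate(true_mu=173.0)
# H0 false (assumed true mean 175): every non-rejection is an error of the 2nd kind
beta_hat = 1 - rejection_rate(true_mu=175.0)
print(alpha_hat, beta_hat)
```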
One-sided and two-sided tests
Two-sided (two-tailed) test
Null hypothesis: $\mu = \mu_0$; alternative hypothesis: $\mu \neq \mu_0$
One-sided (one-tailed) tests
Null hypothesis: $\mu \le \mu_0$; alternative hypothesis: $\mu > \mu_0$, or
Null hypothesis: $\mu \ge \mu_0$; alternative hypothesis: $\mu < \mu_0$
Errors of the 1st and 2nd kind
One-sided test of $H_0: \mu \le \mu_0$ against $H_1: \mu > \mu_0$ with the statistic
$$T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1) \quad \text{under } H_0$$
and the critical range $T > z_{1-\alpha}$.
[Figure, built up over three slides: density of T under the null hypothesis, with the error of the 1st kind α as the area to the right of $z_{1-\alpha}$; density of T for a one-sided alternative $\mu_1 > \mu_0$; error of the 2nd kind β as the area under the alternative density to the left of $z_{1-\alpha}$.]
Interpretation
The smaller α, the bigger β will be.
The probability 1 − β of rejecting a false null hypothesis can be calculated for any alternative value $\mu_1$ when the sample size n is given.
For a given α, β can be reduced only by increasing n.
The smaller the difference $|\mu_1 - \mu_0|$, the bigger β will be (the densities overlap more).
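For the one-sided Gauss test, T is normally distributed with expectation $(\mu_1 - \mu_0)\sqrt{n}/\sigma$ and variance 1 when the true mean is $\mu_1$, so $\beta = \Phi\big(z_{1-\alpha} - (\mu_1 - \mu_0)\sqrt{n}/\sigma\big)$. The Python sketch below evaluates this formula and lets you check each statement above: β grows when α shrinks, shrinks when n grows, and grows when $\mu_1$ moves closer to $\mu_0$. The function name and the alternative $\mu_1 = 175$ are illustrative assumptions.

```python
from statistics import NormalDist

def beta_one_sided(mu0, mu1, sigma, n, alpha):
    """Error of the 2nd kind for H0: mu <= mu0 vs H1: mu > mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # critical value z_{1-alpha}
    shift = (mu1 - mu0) * n ** 0.5 / sigma      # expectation of T when mu = mu1
    return std_normal.cdf(z_crit - shift)       # P(T <= z_{1-alpha} | mu = mu1)

# Illustrative values: mu0 = 173, sigma = 10, assumed alternative mu1 = 175
for alpha in (0.05, 0.01):
    for n in (25, 100):
        print(alpha, n, round(beta_one_sided(173, 175, 10, n, alpha), 3))
```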
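Minimum sample size guaranteeing maximal errors α, β ($\sigma^2$ known)
$\Delta_L$ denotes the practically relevant difference in means
Comparison with $\mu_0$ (one sample)
Two-sided test:
$$n \ge n_0 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, \sigma^2}{\Delta_L^2}$$
One-sided test:
$$n \ge n_0 = \frac{(z_{1-\alpha} + z_{1-\beta})^2 \, \sigma^2}{\Delta_L^2}$$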
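The sample size formulas can be evaluated directly. In this Python sketch the planning values (σ = 10, a relevant difference $\Delta_L$ = 2 cm, α = 0.05, β = 0.2) are assumptions chosen for illustration, and the function name is hypothetical; the result is rounded up because n must be an integer with $n \ge n_0$.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, delta_l, alpha, beta, two_sided=True):
    """Smallest integer n with n >= (z_a + z_{1-beta})^2 * sigma^2 / delta_l^2 (one sample)."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2) if two_sided else z(1 - alpha)  # z_{1-alpha/2} or z_{1-alpha}
    n0 = (z_a + z(1 - beta)) ** 2 * sigma ** 2 / delta_l ** 2
    return math.ceil(n0)

# Assumed planning values: sigma = 10, relevant difference 2 cm, alpha = 0.05, beta = 0.2
print(min_sample_size(10, 2, 0.05, 0.2))                   # two-sided
print(min_sample_size(10, 2, 0.05, 0.2, two_sided=False))  # one-sided
```

With these values $n_0 \approx 196.2$ for the two-sided test, so at least 197 adults would be needed; the one-sided test gets by with fewer.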
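Testing hypotheses on the mean of a normal distribution (1)
One-sample tests
Comparison of $\mu$ to $\mu_0$; $\sigma^2$ known (Gauß test)
Statistic: $T = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$ under $H_0$

| Null hypothesis | Alternative hypothesis | Critical range |
|---|---|---|
| $H_0: \mu = \mu_0$ | $H_1: \mu \neq \mu_0$ | $\lvert T \rvert > z_{1-\alpha/2}$ |
| $H_0: \mu \le \mu_0$ | $H_1: \mu > \mu_0$ | $T > z_{1-\alpha}$ |
| $H_0: \mu \ge \mu_0$ | $H_1: \mu < \mu_0$ | $T < -z_{1-\alpha}$ |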
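The three rows of the table differ only in the critical range. A minimal Python sketch of the Gauß test, assuming summary values ($\bar{x}$, $\mu_0$, σ, n) as input; the function name and the `alternative` labels are hypothetical choices, not taken from any particular library.

```python
from statistics import NormalDist

def gauss_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample Gauss test (sigma known). Returns (T, reject_H0)."""
    t = (xbar - mu0) / (sigma / n ** 0.5)
    z = NormalDist().inv_cdf                 # quantile function of N(0, 1)
    if alternative == "two-sided":           # H0: mu = mu0   vs  H1: mu != mu0
        reject = abs(t) > z(1 - alpha / 2)
    elif alternative == "greater":           # H0: mu <= mu0  vs  H1: mu > mu0
        reject = t > z(1 - alpha)
    elif alternative == "less":              # H0: mu >= mu0  vs  H1: mu < mu0
        reject = t < -z(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t, reject

print(gauss_test(175, 173, 10, 100))                      # example 1: T = 2, rejected
print(gauss_test(175, 173, 10, 100, alternative="less"))  # wrong direction: not rejected
```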
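Testing hypotheses on the mean of a normal distribution (2)
One-sample tests
Comparison of $\mu$ and $\mu_0$; $\sigma^2$ unknown (t-test)
Statistic: $T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$ under $H_0$

| Null hypothesis | Alternative hypothesis | Critical range |
|---|---|---|
| $H_0: \mu = \mu_0$ | $H_1: \mu \neq \mu_0$ | $\lvert T \rvert > t_{n-1,\,1-\alpha/2}$ |
| $H_0: \mu \le \mu_0$ | $H_1: \mu > \mu_0$ | $T > t_{n-1,\,1-\alpha}$ |
| $H_0: \mu \ge \mu_0$ | $H_1: \mu < \mu_0$ | $T < -t_{n-1,\,1-\alpha}$ |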
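Compared with the Gauß test, only σ is replaced by the sample standard deviation s and the normal quantile by a t-quantile. The Python standard library has no t-distribution, so this sketch computes T and its degrees of freedom and takes the critical value from a t table ($t_{4,\,0.975} \approx 2.776$). The heights are hypothetical data for illustration, and the function name is an assumption.

```python
import statistics

def one_sample_t(sample, mu0):
    """Statistic T = (mean - mu0) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    n = len(sample)
    s = statistics.stdev(sample)             # sample standard deviation (divisor n - 1)
    t = (statistics.fmean(sample) - mu0) / (s / n ** 0.5)
    return t, n - 1

# Hypothetical body heights in cm, for illustration only
heights = [172, 176, 174, 178, 175]
t, df = one_sample_t(heights, mu0=173)
t_crit = 2.776                               # t_{4, 0.975}, looked up in a t table
print(round(t, 3), df, abs(t) > t_crit)      # H0 is not rejected here
```

Even though T = 2 would exceed $z_{0.975} = 1.96$, it stays below the larger t-quantile for only 4 degrees of freedom.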
Note
Testing with a software system yields a p-value (often called significance), which reports the probability of observing the given sample or a more extreme one, assuming the null hypothesis is true.
If you test against a two-sided alternative hypothesis, you reject the null hypothesis when the p-value is less than the risk α.
In the case of one-sided testing, take one half of the (two-sided) p-value to compare with α, provided the sample deviates from $\mu_0$ in the direction of the alternative hypothesis.
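For example 1, the p-value of the Gauß test can be computed with the standard library; this Python sketch is an illustration of the rule above, not the output of any particular statistics package.

```python
from statistics import NormalDist

std_normal = NormalDist()
t = (175 - 173) / (10 / 100 ** 0.5)              # example 1: T = 2
p_two_sided = 2 * (1 - std_normal.cdf(abs(t)))   # P(|T| >= |t|) under H0
p_one_sided = p_two_sided / 2                    # H1: mu > mu0 (t lies on that side)
alpha = 0.05
print(round(p_two_sided, 4), p_two_sided < alpha)
print(round(p_one_sided, 4), p_one_sided < alpha)
```

The two-sided p-value of about 0.0455 is below α = 0.05, which matches the rejection found with the critical range.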
One-sample test: the expectation $\mu$ of a population is compared with a given reference value $\mu_0$.
Two-sample test: the expectations $\mu_x$ and $\mu_y$ of two populations are compared.
Two-sample tests can have a paired design (dependent samples) or an unpaired design (independent samples).
unpaired: the samples are obtained from unrelated (disjoint) groups (for example healthy and ill, or female and male)
paired: each data point in one sample is matched to a unique data point in the second sample (for example a pre-test/post-test design observing the same subjects or objects twice)
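Notations
Sample sizes: $n$, $n_x$, $n_y$
Sample means:
$$\bar{X} = \frac{1}{n_x}\sum_{i=1}^{n_x} X_i, \qquad \bar{Y} = \frac{1}{n_y}\sum_{i=1}^{n_y} Y_i$$
Sample variances:
$$s_x^2 = \frac{1}{n_x - 1}\sum_{i=1}^{n_x} (X_i - \bar{X})^2, \qquad s_y^2 = \frac{1}{n_y - 1}\sum_{i=1}^{n_y} (Y_i - \bar{Y})^2$$
Pooled variance:
$$s_g^2 = \frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{n_x + n_y - 2}$$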
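Testing hypotheses on the mean of a normal distribution (3)
Two-sample tests
Comparison of $\mu_x$ to $\mu_y$; $\sigma^2$ unknown; paired samples: $D = X - Y$, $\mu_D = \mu_x - \mu_y$
$d_i = x_i - y_i$: differences of the paired observations
$\bar{d}$: sample mean of the differences
$s_D$: sample standard deviation of the $d_i$
Statistic: $T = \frac{\bar{d}}{s_D / \sqrt{n}} \sim t_{n-1}$ under $H_0$

| Null hypothesis | Alternative hypothesis | Critical range |
|---|---|---|
| $H_0: \mu_D = 0$ | $H_1: \mu_D \neq 0$ | $\lvert T \rvert > t_{n-1,\,1-\alpha/2}$ |
| $H_0: \mu_D \le 0$ | $H_1: \mu_D > 0$ | $T > t_{n-1,\,1-\alpha}$ |
| $H_0: \mu_D \ge 0$ | $H_1: \mu_D < 0$ | $T < -t_{n-1,\,1-\alpha}$ |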
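The paired test is a one-sample t-test applied to the differences $d_i$. The Python sketch below computes T and its degrees of freedom; the pre-test/post-test values are hypothetical data, and the function name is an assumption. The critical value still has to be taken from a t table.

```python
import statistics

def paired_t(x, y):
    """Paired t statistic T = mean(d) / (s_D / sqrt(n)) with d_i = x_i - y_i, and df = n - 1."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]    # differences of matched observations
    n = len(d)
    t = statistics.fmean(d) / (statistics.stdev(d) / n ** 0.5)
    return t, n - 1

# Hypothetical pre-test / post-test values of the same five subjects
pre = [5, 7, 6, 9, 8]
post = [4, 5, 6, 6, 7]
t, df = paired_t(pre, post)
print(round(t, 3), df)    # compare |T| with t_{df, 1 - alpha/2} from a t table
```

Here T is about 2.75 with 4 degrees of freedom, just below $t_{4,\,0.975} \approx 2.776$, so $H_0: \mu_D = 0$ would not be rejected at α = 0.05.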
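Testing hypotheses about the mean of a normal distribution (4)
Two-sample tests
Comparison of $\mu_x$ to $\mu_y$; $\sigma_x^2$, $\sigma_y^2$ unknown but equal; unpaired (unpaired t-test)
Statistic: $T = \frac{\bar{X} - \bar{Y}}{s_g}\sqrt{\frac{n_x n_y}{n_x + n_y}} \sim t_{n_x+n_y-2}$ under $H_0$

| Null hypothesis | Alternative hypothesis | Critical range |
|---|---|---|
| $H_0: \mu_x = \mu_y$ | $H_1: \mu_x \neq \mu_y$ | $\lvert T \rvert > t_{n_x+n_y-2,\,1-\alpha/2}$ |
| $H_0: \mu_x \le \mu_y$ | $H_1: \mu_x > \mu_y$ | $T > t_{n_x+n_y-2,\,1-\alpha}$ |
| $H_0: \mu_x \ge \mu_y$ | $H_1: \mu_x < \mu_y$ | $T < -t_{n_x+n_y-2,\,1-\alpha}$ |

$t_{m,q}$: q-quantile of the t-distribution with m degrees of freedom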
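Testing hypotheses about the mean of a normal distribution (5)
Two-sample tests
Comparison of $\mu_x$ to $\mu_y$; $\sigma_x^2$, $\sigma_y^2$ unknown and unequal; unpaired (Welch test)
Statistic: $T = \frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_f$ (approximately) under $H_0$

| Null hypothesis | Alternative hypothesis | Critical range |
|---|---|---|
| $H_0: \mu_x = \mu_y$ | $H_1: \mu_x \neq \mu_y$ | $\lvert T \rvert > t_{f,\,1-\alpha/2}$ |
| $H_0: \mu_x \le \mu_y$ | $H_1: \mu_x > \mu_y$ | $T > t_{f,\,1-\alpha}$ |
| $H_0: \mu_x \ge \mu_y$ | $H_1: \mu_x < \mu_y$ | $T < -t_{f,\,1-\alpha}$ |

$$f = \frac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}} \quad \text{(always round down!)}$$
$t_{f,q}$: q-quantile of the t-distribution with f degrees of freedom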
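The Welch statistic and its degrees of freedom f can be computed from summary statistics alone. In this Python sketch the function name is hypothetical, and the summary figures of the hemoglobin example in this section are reused only to illustrate the formulas (that example itself assumes equal variances).

```python
import math

def welch_statistic(mean_x, s_x, n_x, mean_y, s_y, n_y):
    """Welch statistic T and degrees of freedom f (rounded down) from summary statistics."""
    vx = s_x ** 2 / n_x                      # s_x^2 / n_x
    vy = s_y ** 2 / n_y                      # s_y^2 / n_y
    t = (mean_x - mean_y) / math.sqrt(vx + vy)
    f = (vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1))
    return t, math.floor(f)                  # always round f down

t, f = welch_statistic(18.9, 5.9, 9, 11.9, 6.3, 13)
print(round(t, 2), f)                        # compare |T| with t_{f, 1 - alpha/2}
```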
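Minimum sample size guaranteeing maximal errors α, β ($\sigma^2$ known)
$\Delta_L$ denotes the practically relevant difference between the means
Comparison of X to Y (two-sample test; n is the size of each sample)
Two-sided test:
$$n \ge n_0 = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2 \, \sigma^2}{\Delta_L^2}$$
One-sided test:
$$n \ge n_0 = \frac{2\,(z_{1-\alpha} + z_{1-\beta})^2 \, \sigma^2}{\Delta_L^2}$$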
Example: Two-sample test for unpaired samples
Is there a difference in hemoglobin values between healthy children and children suffering from a certain illness? Normal distributions with equal variances are assumed; error probability 0.05.
Data for healthy children: $n_x = 9$, $\bar{x} = 18.9$, $s_x = 5.9$
Data for ill children: $n_y = 13$, $\bar{y} = 11.9$, $s_y = 6.3$
Null hypothesis: $\mu_x = \mu_y$
Risk: $\alpha = 0.05$
Example (continued)
Statistic:
$$T = \frac{\bar{X} - \bar{Y}}{s_g}\sqrt{\frac{n_x n_y}{n_x + n_y}}$$
$$s_g^2 = \frac{(9 - 1)\cdot 5.9^2 + (13 - 1)\cdot 6.3^2}{9 + 13 - 2} = 37.74$$
$$T = \frac{18.9 - 11.9}{\sqrt{37.74}}\sqrt{\frac{9 \cdot 13}{9 + 13}} = 2.63$$
Criterion of rejection: $|T| > t_{n_x+n_y-2,\,1-\alpha/2}$, with $t_{9+13-2,\,0.975} = t_{20,\,0.975} = 2.086$
Because 2.63 > 2.086, the null hypothesis is rejected, which means that the illness changes the mean hemoglobin level significantly. The result of the sample can be generalized to the whole population with a confidence of 95%.
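The arithmetic of the example can be reproduced in a few lines. This Python sketch (function name hypothetical) computes the pooled variance $s_g^2$, the statistic T and the degrees of freedom from the summary data; the critical value 2.086 is the table value used above.

```python
import math

def pooled_t_statistic(mean_x, s_x, n_x, mean_y, s_y, n_y):
    """Unpaired t statistic with pooled variance; returns (T, s_g^2, degrees of freedom)."""
    df = n_x + n_y - 2
    s_g2 = ((n_x - 1) * s_x ** 2 + (n_y - 1) * s_y ** 2) / df   # pooled variance
    t = (mean_x - mean_y) / math.sqrt(s_g2) * math.sqrt(n_x * n_y / (n_x + n_y))
    return t, s_g2, df

t, s_g2, df = pooled_t_statistic(18.9, 5.9, 9, 11.9, 6.3, 13)
t_crit = 2.086                                   # t_{20, 0.975}, table value from the example
print(round(s_g2, 2), round(t, 2), df, abs(t) > t_crit)   # 37.74 2.63 20 True
```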
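Testing hypotheses about the variance of a normal distribution (1)
One-sample tests
Comparison of $\sigma^2$ to $\sigma_0^2$ (reference value)
Statistic: $T = \frac{(n - 1)\, s^2}{\sigma_0^2} \sim \chi^2_{n-1}$ under $H_0$

| Null hypothesis | Alternative hypothesis | Critical range |
|---|---|---|
| $H_0: \sigma^2 = \sigma_0^2$ | $H_1: \sigma^2 \neq \sigma_0^2$ | $T < \chi^2_{n-1,\,\alpha/2}$ or $T > \chi^2_{n-1,\,1-\alpha/2}$ |
| $H_0: \sigma^2 \le \sigma_0^2$ | $H_1: \sigma^2 > \sigma_0^2$ | $T > \chi^2_{n-1,\,1-\alpha}$ |
| $H_0: \sigma^2 \ge \sigma_0^2$ | $H_1: \sigma^2 < \sigma_0^2$ | $T < \chi^2_{n-1,\,\alpha}$ |

$\chi^2_{m,q}$: q-quantile of the $\chi^2$-distribution with m degrees of freedom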
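Testing hypotheses about the variance of a normal distribution (2)
Two-sample tests
Comparison of $\sigma_x^2$ to $\sigma_y^2$
Statistic: $T = s_x^2 / s_y^2 \sim F_{n_x-1,\,n_y-1}$ under $H_0$

| Null hypothesis | Alternative hypothesis | Critical range |
|---|---|---|
| $H_0: \sigma_x^2 = \sigma_y^2$ | $H_1: \sigma_x^2 \neq \sigma_y^2$ | $T < F_{n_x-1,\,n_y-1,\,\alpha/2}$ or $T > F_{n_x-1,\,n_y-1,\,1-\alpha/2}$ |
| $H_0: \sigma_x^2 \le \sigma_y^2$ | $H_1: \sigma_x^2 > \sigma_y^2$ | $T > F_{n_x-1,\,n_y-1,\,1-\alpha}$ |
| $H_0: \sigma_x^2 \ge \sigma_y^2$ | $H_1: \sigma_x^2 < \sigma_y^2$ | $T < F_{n_x-1,\,n_y-1,\,\alpha}$ |

$F_{n,m,q}$: q-quantile of the F-distribution with n, m degrees of freedom
This test is used to decide between the unpaired t-test and the Welch test when comparing means.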
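As a sketch of how the variance-ratio test precedes the choice between the unpaired t-test and the Welch test, the Python code below computes $T = s_x^2/s_y^2$ for the standard deviations of the hemoglobin example and implements the two-sided decision rule. The function names are hypothetical. The F-quantiles are not part of the Python standard library, so the caller must supply them, for example from an F table or from SciPy's `scipy.stats.f.ppf(q, dfn, dfd)`.

```python
def f_statistic(s_x, s_y):
    """T = s_x^2 / s_y^2; under H0 it follows F with (n_x - 1, n_y - 1) degrees of freedom."""
    return s_x ** 2 / s_y ** 2

def reject_equal_variances(t, f_lower, f_upper):
    """Two-sided rule: reject H0 if T < F_{alpha/2} or T > F_{1-alpha/2} (quantiles supplied by caller)."""
    return t < f_lower or t > f_upper

# Standard deviations from the hemoglobin example: s_x = 5.9 (n_x = 9), s_y = 6.3 (n_y = 13)
t = f_statistic(5.9, 6.3)
print(round(t, 3))    # compare with the quantiles of F_{8, 12} for the chosen alpha
```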