
Why nonparametric methods What test to use ? Rank Tests

Parametric and non-parametric statistical methods for the life sciences - Session I

Liesbeth Bruckers Geert Molenberghs

Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-Biostat)

Universiteit Hasselt

June 7, 2011

June 6, 2011 Doctoral School Medicine


Table of contents

1 Why nonparametric methods
    Introductory example
    Nonparametric test of hypotheses

2 What test to use ?
    Two independent samples
    More than two independent samples
    Two dependent samples
    More than two dependent samples
    Ordered hypotheses

3 Rank Tests
    Wilcoxon Rank Sum Test
    Kruskal-Wallis Test
    Friedman Statistic
    Sign Test
    Jonckheere-Terpstra Test


Why nonparametric methods ?



Introductory Example

The paper Hypertension in Terminal Renal Failure, Observations Pre and Post Bilateral Nephrectomy (J. Chronic Diseases (1973): 471-501) gave blood pressure readings for five terminal renal patients before and 2 months after surgery (removal of kidney).

Patient           1    2    3    4    5
Before surgery  107  102   95  106  112
After surgery    87   97  101  113   80

Question: Does the mean blood pressure before surgery exceed the mean blood pressure two months after surgery ?



Classical Approach

Paired t-test:

Patient           1    2    3    4    5
Before surgery  107  102   95  106  112
After surgery    87   97  101  113   80
Difference Di    20    5   -6   -7   32

Hypotheses: H0 : µd = 0 versus H1 : µd > 0

µd : mean difference in blood pressure

Test statistic:

$t = \dfrac{\bar{D}}{\sqrt{\dfrac{1}{n(n-1)} \sum_{i=1}^{n} (D_i - \bar{D})^2}}$

follows a t distribution with n − 1 d.f.
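As an illustration, the statistic above can be evaluated directly on the five blood pressure differences. This is a minimal sketch using only the Python standard library; the variable names are chosen here for illustration and do not come from the slides.

```python
from math import sqrt

# Blood pressure readings from the introductory example
before = [107, 102, 95, 106, 112]
after = [87, 97, 101, 113, 80]

# Paired differences D_i = before - after
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n

# t = D-bar / sqrt( sum (D_i - D-bar)^2 / (n (n - 1)) )
sum_sq = sum((x - d_bar) ** 2 for x in d)
t = d_bar / sqrt(sum_sq / (n * (n - 1)))

print("differences:", d)
print("mean difference:", round(d_bar, 2))
print("t statistic:", round(t, 3), "on", n - 1, "d.f.")
```

The p-value would then be read from a t distribution with 4 d.f., which is only trustworthy if the normality assumption discussed next is reasonable.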



Assumptions

The statistic follows a t-distribution if the differences are normally distributed ⇒ t-test = parametric method

Observations are made independently: selection of a patient does not influence the chance of any other patient being included

(Two-sample t-test): populations must have the same variance

Variables must be measured on an interval scale for the results to be interpretable

These assumptions are often not tested, but accepted.



Normal probability plot

Normality is questionable !



Nonparametric Test of Hypotheses

Follow same general procedure as parametric tests:

State null and alternative hypothesis

Calculate the value of the appropriate test statistic (choice based on the design of the study)

Decision rule: reject or do not reject H0, depending on the magnitude of the statistic

PH0(T ≥ c) = ??
    Exact distribution
    Approximation for the exact distribution



When to use what test



What test to use ?

Choice of appropriate test statistic depends on the design of the study:

number of groups ?

independent or dependent samples ?

ordered alternative hypothesis ?



Two Independent Samples

Permeability constants of the human chorioamnion (a placental membrane) for at term (x) and between 12 to 26 weeks gestational age (y) pregnancies are given in the table below. Investigate the alternative of interest that the permeability of the human chorioamnion for a term pregnancy is greater than for a 12 to 26 weeks of gestational age pregnancy.

X (at term)      0.83  1.89  1.04  1.45  1.38  1.91  1.64  1.46
Y (12-26 weeks)  1.15  0.88  0.90  0.74  1.21

Statistical Methods:

t-test

Wilcoxon Rank Sum Test



More Than Two Independent Samples

Protoporphyrin levels were determined for three groups of people: a control group of normal workers, a group of alcoholics with sideroblasts in their bone marrow, and a group of alcoholics without sideroblasts. The data are shown below. Do the data suggest that normal workers and alcoholics with and without sideroblasts differ with respect to protoporphyrin level ?

Group                            Protoporphyrin level (mg)
Normal                           22   27   47   30   38   78   28   58   72   56
Alcoholics with sideroblasts     78  172  286   82  453  513  174  915   84  153
Alcoholics without sideroblasts  37   28   38   45   47   29   34   20   68   12


ANOVA

Kruskal-Wallis Test



Two Dependent Samples

Twelve adult males were put on a liquid diet in a weight-reducing plan. Weights were recorded before and after the diet. The data are shown in the table below.

Subject    1    2    3    4    5    6    7    8    9   10   11   12
Before   186  171  177  168  191  172  177  191  170  171  188  187
After    188  177  176  169  196  172  165  190  165  180  181  172


Paired t-test

Sign test; Signed-rank test



Randomized Blocked Design

Effect of Hypnosis:

Emotions of fear, happiness, depression and calmness were requested (in random order) from 8 subjects during hypnosis

Response: skin potential (in millivolts)

Subject        1     2     3     4     5     6     7     8
Fear        23.1  57.6  10.5  23.6  11.9  54.6  21.0  20.3
Happiness   22.7  53.2   9.7  19.6  13.8  47.1  13.6  23.6
Depression  22.5  53.7  10.8  21.1  13.7  39.2  13.7  16.3
Calmness    22.6  53.1   8.3  21.6  13.3  37.0  14.8  14.8


Mixed Models

Friedman test



Ordered Treatments

Patients were treated with a drug at four dose levels (100mg, 200mg, 300mg and 400mg) and then monitored for toxicity.

                 Drug Toxicity
Dose    Mild  Moderate  Severe  Drug Death
100mg    100         1       0           0
200mg     18         1       1           0
300mg     50         1       1           0
400mg     50         1       1           1


Regression

Jonckheere-Terpstra Test



Wilcoxon Rank Sum Test



Wilcoxon Rank Sum Test

Detailed Example:

Data : GAF scores

Control    25  10  35
Treatment  36  26  40

Does treatment improve the functioning ?



Parametric Approach: t-test

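$t = \dfrac{\bar{X}_1 - \bar{X}_0}{S_{\bar{X}_1 - \bar{X}_0}}$, where $S_{\bar{X}_1 - \bar{X}_0} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}$

t test: tests whether the means of two normally distributed populations are equal

H0 : µ1 = µ0

H1 : µ1 ≠ µ0 (one-sided test: H1 : µ1 > µ0)

equal sample sizes

two distributions have the same variance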

X̄1 = 34.00, X̄0 = 23.33, SX1 = 7.21, SX0 = 12.58

t = 1.27

PH0(t ≥ 1.27) = 0.1358
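The summary statistics and the t value on this slide can be checked with a few lines of code. This is a minimal sketch using the Python standard library; it reproduces the statistic only, since the tail probability needs a t distribution with 4 d.f., which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev, variance

# GAF scores from the detailed example
control = [25, 10, 35]
treatment = [36, 26, 40]

# Standard error of the difference in means: sqrt(s1^2/n1 + s0^2/n0)
se_diff = sqrt(variance(treatment) / len(treatment)
               + variance(control) / len(control))
t = (mean(treatment) - mean(control)) / se_diff

print("means:", mean(treatment), round(mean(control), 2))
print("standard deviations:", round(stdev(treatment), 2), round(stdev(control), 2))
print("t statistic:", round(t, 2))
```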



Wilcoxon Rank Sum Test

Detailed Example:

Control    25  10  35
Treatment  36  26  40

Order data: Position of patients on treatment as compared with position of patients in control arm ?

Ranks



Treatment is effective if treated patients rank sufficiently high in the combined ranking of all patients

Test statistic such that:

    treatment ranks are high ⇔ value of test statistic is high
    treatment ranks are low ⇔ value of test statistic is low

WS = S1 + S2 + . . . + Sn (n = 3, number of patients in treatment arm)

Ranks (observed values in parentheses)

Control    2 (25)  1 (10)  4 (35)
Treatment  5 (36)  3 (26)  6 (40)

WS = 5 + 3 + 6 = 14



Reject null hypothesis when WS is sufficiently large : WS ≥ c

PH0(WS ≥ c) = α (α = 0.05)

Distribution of WS under H0 ?

Suppose no treatment effect (H0)

    rank is solely determined by the patient's health status
    rank is independent of receiving treatment or placebo
    “rank is assigned to patient before randomisation”

Random selection of patients for treatment ⇒ random selection of 3 ranks out of 6

Randomisation divides ranks (1,2,...6) into two groups !

Number of possible combinations: $\binom{N}{n} = \dfrac{N!}{n!\,(N-n)!}$



All possibilities (each has a probability of 1/20 under H0):

treatment ranks  (4,5,6)  (3,5,6)  (3,4,6)  (3,4,5)  (2,5,6)
ws                  15       14       13       12       13

treatment ranks  (2,4,6)  (2,4,5)  (2,3,6)  (2,3,5)  (2,3,4)
ws                  12       11       11       10        9

treatment ranks  (1,5,6)  (1,4,6)  (1,4,5)  (1,3,6)  (1,3,5)
ws                  12       11       10       10        9

treatment ranks  (1,3,4)  (1,2,6)  (1,2,5)  (1,2,4)  (1,2,3)
ws                   8        9        8        7        6



Distribution of WS under the null hypothesis:

w             6     7     8     9     10    11    12    13    14    15
PH0(WS = w)  1/20  1/20  2/20  3/20  3/20  3/20  3/20  2/20  1/20  1/20



PH0(WS ≥ 14) = 2/20 = 0.1

Do not reject H0.

Conclusion: There is no evidence that treatment increases the GAF scores.

Power of this study ???
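The exact null distribution derived on the preceding slides can be reproduced by brute-force enumeration of the 20 possible sets of treatment ranks. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the ranking step assumes there are no ties, which holds for these six scores.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

control = [25, 10, 35]
treatment = [36, 26, 40]

# Rank all observations jointly (valid here because there are no ties)
pooled = sorted(control + treatment)
rank = {value: position + 1 for position, value in enumerate(pooled)}
w_obs = sum(rank[value] for value in treatment)

# Under H0 every choice of n ranks out of N is equally likely
N, n = len(pooled), len(treatment)
rank_sums = [sum(c) for c in combinations(range(1, N + 1), n)]
null_dist = Counter(rank_sums)

# One-sided exact p-value P(WS >= observed)
p_value = Fraction(sum(1 for w in rank_sums if w >= w_obs), len(rank_sums))

print("observed WS:", w_obs)
for w in sorted(null_dist):
    print(w, Fraction(null_dist[w], len(rank_sums)))
print("P(WS >= %d) = %s" % (w_obs, p_value))
```

With only 20 equally likely outcomes, the smallest attainable one-sided p-value is 1/20 = 0.05, which bears directly on the question about the power of this study.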



Large Sample Size Case

$\binom{N}{n}$ increases rapidly with N and n:

$\binom{20}{10} = 184756$, $\binom{12}{6} = 924$

Asymptotic Null Distribution: Central Limit Theorem

Sum T of a large number of independent random variables is approximately normally distributed.

$P\left( \dfrac{T - E(T)}{\sqrt{Var(T)}} \le a \right) \approx \Phi(a)$

where Φ(a) is the area to the left of a under a standard normal curve



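If both n and m are sufficiently large (m = N − n is the number of patients in the control arm):

$W_S \approx N\left(E(W_S); \sqrt{Var(W_S)}\right)$

$E(W_S) = \frac{1}{2} n (N + 1)$

$Var(W_S) = \frac{1}{12} n m (N + 1)$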
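To see how these formulas are applied, the sketch below plugs in the GAF example (n = m = 3, WS = 14). Three observations per arm is far too few for the approximation to be reliable, so this only illustrates the arithmetic. No continuity correction is applied, and the helper `normal_cdf` is defined here for illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, m = 3, 3          # treatment and control arm sizes
N = n + m
w_obs = 14           # observed rank sum of the treatment arm

e_w = n * (N + 1) / 2          # E(WS)
var_w = n * m * (N + 1) / 12   # Var(WS)
z = (w_obs - e_w) / sqrt(var_w)
p_approx = 1.0 - normal_cdf(z)

print("E(WS) =", e_w, " Var(WS) =", var_w)
print("z =", round(z, 3), " approximate P(WS >= 14) =", round(p_approx, 4))
```

The approximate tail probability differs noticeably from the exact value of 0.1, which is expected at this sample size.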



Kruskal-Wallis Test



Kruskal-Wallis test

Example: Kruskal-Wallis test:

The following data represent corn yields per acre from three different fields where different farming methods were used.

Method 1  Method 2  Method 3
      92        94       101
      91        90       100
      84        81        93
      89                 102

Question: do the yields differ between the 3 methods ?



Parametric Approach One-way ANOVA

Statistical test of whether or not the means of several groups are all equal

Assumptions:

    Independence of cases
    The distributions of the residuals are normal: εi ∼ N(0, σ2)
    Homoscedasticity

$F = \dfrac{\text{variance between groups}}{\text{variance within groups}} = \dfrac{MSTR}{MSE}$

Statistic follows an F distribution with s − 1, n − s d.f. (s groups, n observations in total)



[Figures: example of data giving a small F; example of data giving a large F]



One-Way ANOVA results

X̄1 = 89, X̄2 = 88.33, X̄3 = 99

σ1 = 3.56, σ2 = 6.65, σ3 = 4.08

MSTR= 135.03 , MSE = 22.08

F= 6.11

PH0(F ≥ 6.11) = 0.0245
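The ANOVA quantities above can be recomputed from the corn yield data. This is a minimal sketch using the Python standard library; it stops at the F statistic, because the tail probability requires an F distribution with 2 and 8 d.f.

```python
from statistics import mean

# Corn yields per acre for the three farming methods
groups = {
    "Method 1": [92, 91, 84, 89],
    "Method 2": [94, 90, 81],
    "Method 3": [101, 100, 93, 102],
}

all_values = [y for g in groups.values() for y in g]
n_total, s = len(all_values), len(groups)
grand_mean = mean(all_values)

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((y - mean(g)) ** 2 for g in groups.values() for y in g)

mstr = ss_between / (s - 1)
mse = ss_within / (n_total - s)
f_stat = mstr / mse

print("MSTR =", round(mstr, 2), " MSE =", round(mse, 2), " F =", round(f_stat, 2))
```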



Ranks:

Method 1  Method 2  Method 3
       6         8        10
       5         4         9
       1         2         7
       3                  11

Ri.:  3.75     4.667      9.25



Hypothesis:
    H0: No difference between the treatments
    H1: Any difference between the treatments

If treatments do not differ widely (H0):

    Ri. are close to each other
    Ri. close to R..

If treatments do differ (H1):

    Ri. differ substantially
    Ri. not close to R..



Evaluate the null hypothesis by investigating:

$K = \dfrac{12}{N(N+1)} \sum_{i=1}^{s} n_i (R_{i.} - R_{..})^2$

PH0(K ≥ c) = ?

Exact distribution of K under H0 :

    ranks are determined before assignment to treatment
    random assignment → all possibilities have the same chance of being observed

Number of possible combinations: multinomial coefficient:

$\binom{11}{4,3,4} = \binom{11}{4}\binom{7}{3}\binom{4}{4} = 11550$

$\binom{N}{n_1, n_2, \dots, n_s} = \binom{N}{n_1}\binom{N-n_1}{n_2} \cdots \binom{N-n_1-\dots-n_{s-1}}{n_s}$


A few possible configurations:

Method 1    Method 2   Method 3       K
(1,2,3,4)   (5,6,7)    (8,9,10,11)   8.91
(1,2,3,5)   (4,6,7)    (8,9,10,11)   8.32
(1,2,3,6)   (4,5,7)    (8,9,10,11)   7.84
(1,2,3,7)   (4,5,6)    (8,9,10,11)   7.48
. . .
(1,3,5,6)   (2,4,8)    (7,9,10,11)   6.16
. . .

Each configuration has a probability of 1/11550.



Exact Distribution of K :

PH0(K ≥ 6.16) = 0.0306

Conclusion: Reject H0: there is a difference between the farming methods

Large sample size approximation: χ2 distribution with s − 1 d.f.
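The exact distribution of K can be obtained by enumerating all 11550 ways of splitting the ranks 1 to 11 into groups of sizes 4, 3 and 4. The sketch below assumes no ties and uses exact fractions to avoid rounding problems when comparing K values. The function name `kw_statistic` is illustrative, and the observed configuration is the one listed on the slide with K = 6.16.

```python
from fractions import Fraction
from itertools import combinations

def kw_statistic(groups):
    """K = 12 / (N (N + 1)) * sum n_i (mean rank_i - overall mean rank)^2."""
    n_total = sum(len(g) for g in groups)
    overall = Fraction(n_total + 1, 2)   # mean of the ranks 1..N
    spread = sum(len(g) * (Fraction(sum(g), len(g)) - overall) ** 2
                 for g in groups)
    return Fraction(12, n_total * (n_total + 1)) * spread

# Rank configuration from the slide (Method 1, Method 2, Method 3)
observed = [(1, 3, 5, 6), (2, 4, 8), (7, 9, 10, 11)]
k_obs = kw_statistic(observed)

# Enumerate every split of ranks 1..11 into groups of sizes 4, 3, 4
all_ranks = set(range(1, 12))
n_configs = 0
n_extreme = 0
for g1 in combinations(sorted(all_ranks), 4):
    rest = sorted(all_ranks - set(g1))
    for g2 in combinations(rest, 3):
        g3 = tuple(sorted(set(rest) - set(g2)))
        n_configs += 1
        if kw_statistic([g1, g2, g3]) >= k_obs:
            n_extreme += 1

p_exact = Fraction(n_extreme, n_configs)
print("K observed =", float(k_obs))
print("configurations:", n_configs)
print("exact P(K >= K observed) =", float(p_exact))
```

Enumeration is feasible here because the multinomial coefficient is small; for larger samples the χ2 approximation replaces it.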



Friedman Test

Friedman Statistic

Setting 1: complete randomization:
    Kruskal-Wallis test p-value = 0.8611
    Treatment effect is blurred by the variability between subjects

Setting 2: randomisation within age groups:
    p-value 0.0411
    Conclusion: reject H0



Procedure

Divide subjects in homogeneous subgroups (BLOCKS)

Compare subjects within the blocks w.r.t. treatment effects

(Generalisation of the paired comparison design)



Example

Data
                   Age-group
treatment  20-30 y  30-40 y  40-50 y  50-60 y
A             19       21       43       46
B             17       20       37       44
C             23       22       39       42

Rank subjects within a block:
                   Age-group
treatment  20-30 y  30-40 y  40-50 y  50-60 y
A              2        2        3        3
B              1        1        1        2
C              3        3        2        1



Mean of ranks for:

treatment A = RA. = 10/4 = 2.5

treatment B = RB. = 5/4 = 1.25

treatment C = RC. = 9/4 = 2.25

If these mean ranks are different → reject H0

If these mean ranks are close → do not reject H0



Measure for closeness of the mean ranks:

    if the Ri. are all close to each other
    ↓
    then they are close to the overall mean R..
    and (Ri. − R..)² will be close to zero

Friedman Statistic

$Q = \dfrac{12N}{s(s+1)} \sum_{i=1}^{s} (R_{i.} - R_{..})^2$



PH0(Q ≥ c) = ?

Exact distribution of Q under H0:

A few possible configurations:

                   Age-group
Treatment  20-30 y  30-40 y  40-50 y  50-60 y    Q
A              1        1        1        1      8
B              2        2        2        2
C              3        3        3        3

A              3        3        3        3      8
B              2        2        2        2
C              1        1        1        1

A              1        3        1        3      0
B              2        2        2        2
C              3        1        3        1
. . .
A              2        2        3        3      3.5
B              1        1        1        2
C              3        3        2        1



Exact Distribution of Q:

  Q      Pr
-----------------
 0.0   0.069444
 0.5   0.277778
 1.5   0.222222
 2.0   0.157407
 3.5   0.148148
 4.5   0.055556
 6.0   0.027778
 6.5   0.037037
 8.0   0.004630



Number of possibilities for the rank combinations:

    age-group 20-30 year: 3! = 6
    age-groups are independent
    ↓
    total number of possible combinations: (3!)^4 = 1296

Under the null these are all equally likely: each has probability 1/1296

In general: (s!)^N, s = # of treatment groups, N = # of blocks

PH0(Q ≥ 3.5) = 0.2731

Do not reject H0
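The Friedman statistic and its exact null distribution can be checked by enumerating all (3!)^4 = 1296 within-block rankings. This is a minimal sketch using the Python standard library; it assumes no ties within a block, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

def within_block_ranks(values):
    """Rank the treatments within one block (assumes no ties)."""
    ordered = sorted(values)
    return tuple(ordered.index(v) + 1 for v in values)

def friedman_q(blocks):
    """Q = 12 N / (s (s + 1)) * sum (mean rank_i - overall mean rank)^2."""
    n_blocks, s = len(blocks), len(blocks[0])
    overall = Fraction(s + 1, 2)
    mean_ranks = [Fraction(sum(b[i] for b in blocks), n_blocks) for i in range(s)]
    return Fraction(12 * n_blocks, s * (s + 1)) * sum(
        (r - overall) ** 2 for r in mean_ranks)

# Responses per age group, in treatment order (A, B, C)
data = {
    "20-30 y": (19, 17, 23),
    "30-40 y": (21, 20, 22),
    "40-50 y": (43, 37, 39),
    "50-60 y": (46, 44, 42),
}
observed = [within_block_ranks(v) for v in data.values()]
q_obs = friedman_q(observed)

# Under H0 each block's ranking is an independent random permutation
configs = list(product(permutations((1, 2, 3)), repeat=len(data)))
p_exact = Fraction(sum(1 for c in configs if friedman_q(c) >= q_obs),
                   len(configs))

print("Q observed =", float(q_obs))
print("configurations:", len(configs))
print("exact P(Q >= Q observed) =", round(float(p_exact), 4))
```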



Sign Test



Sign Test

Special case of Friedman test: blocks of size 2

    subjects matched on e.g. age, gender, ...
    twins
    two eyes (hands) of a person
    subject serves as own control: e.g. blood pressure before and after treatment

Example: Pain scores for lower back pain, before and after having acupuncture

Patient  Pain score Before  Pain score After  Sign
   1             5                 6            -
   2             6                 7            -
   3             7                 6            +
   4             9                 4            +
   5             6                 7            -
   6             5                 4            +
   7             4                 8            -
   8             7                 6            +
   9             6                 5            +
  10             5                 7            -
  11             8                 6            +
  12             8                 4            +
  13             7                 3            +
  14             8                 5            +
  15             6                 7            -



9 pairs out of 15 where treatment comes out ahead (reduction in pain scores)

Sign Test: SN = 9

PH0(SN ≥ 9) = ???

Exact distribution of SN under H0 is binomial

    N trials, N = number of ‘pairs’
    Success probability: 1/2

$P_{H_0}(S_N = a) = \binom{N}{a} \dfrac{1}{2^N}$

$P_{H_0}(S_N \ge 9) = \left( \binom{15}{9} + \binom{15}{10} + \dots + \binom{15}{15} \right) \dfrac{1}{2^{15}} = \dfrac{9949}{32768} \approx 0.30$
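The binomial tail probability for the sign test is easy to evaluate exactly. This is a minimal sketch using the Python standard library; the pain scores are taken from the acupuncture example, which contains no tied pairs.

```python
from fractions import Fraction
from math import comb

# Pain scores before and after acupuncture, patients 1 to 15
before = [5, 6, 7, 9, 6, 5, 4, 7, 6, 5, 8, 8, 7, 8, 6]
after = [6, 7, 6, 4, 7, 4, 8, 6, 5, 7, 6, 4, 3, 5, 7]

# S_N = number of pairs with a reduction in pain score
n_pairs = len(before)
s_n = sum(1 for b, a in zip(before, after) if b > a)

# Under H0, S_N ~ Binomial(N, 1/2); one-sided tail P(S_N >= observed)
p_value = Fraction(sum(comb(n_pairs, k) for k in range(s_n, n_pairs + 1)),
                   2 ** n_pairs)

print("S_N =", s_n, "out of", n_pairs)
print("P(S_N >= %d) = %s = %.4f" % (s_n, p_value, float(p_value)))
```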




Jonckheere-Terpstra Test

To be used when H1 is ordered.

Ordinal data for the responses and an ordering in the treatments/groups.

Example:

Data:

    Three diets for rats
    Response: growth
    H1: Growth rate decreases from A to C : A ≥ B ≥ C

A  133  139  149  160  184
B  111  125  143  148  157
C   99  114  116  127  146



Parametric Approach : Regression

Models the relationship between a dependent and an independent variable

yi = β0 + β1xi + εi

Assumptions

    εi ∼ N(0, σ2), εi are independent
    homoscedasticity
    xi is measured without error

β0 = 169, p-value < 0.0001

β1 = −16, p-value = 0.0133

R-square = 0.3866
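The slides do not state how the diets were coded as the independent variable. The sketch below assumes the coding x = 1, 2, 3 for diets A, B, C, which is an assumption made here; under it, ordinary least squares gives estimates that round to the values reported above.

```python
# Growth of rats on three diets
growth = {
    "A": [133, 139, 149, 160, 184],
    "B": [111, 125, 143, 148, 157],
    "C": [99, 114, 116, 127, 146],
}
x_code = {"A": 1, "B": 2, "C": 3}   # assumed coding, not given in the slides

xs = [x_code[diet] for diet, values in growth.items() for _ in values]
ys = [y for values in growth.values() for y in values]

n = len(ys)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares estimates for y = b0 + b1 * x
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_yy = sum((y - y_bar) ** 2 for y in ys)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
r_square = s_xy ** 2 / (s_xx * s_yy)

print("b0 =", round(b0, 2), " b1 =", round(b1, 2), " R-square =", round(r_square, 4))
```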




Based on Mann-Whitney statistics for two treatments

Comparing the treatment groups two by two:

    if WBA is large: growth A > growth B (WBA = 18)
    if WCB is large: growth B > growth C (WCB = 18)
    if WCA is large: growth A > growth C (WCA = 23)

JT Statistic: $W = \sum_{i<j} W_{ij}$

Reject H0 when W is sufficiently large

W = 18 + 18 + 23 = 59

PH0(W ≥ 59) = 0.0120

Compare with the result of a Kruskal-Wallis Test: p-value = 0.072

The distribution of W is approximately normal for large samples
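The pairwise Mann-Whitney counts and their sum W can be verified directly from the rat growth data. This is a minimal sketch that assumes no ties between groups, which holds for these data; the helper `mw_count` and the ordering of the groups from lowest to highest expected growth are choices made here for illustration.

```python
from itertools import combinations

# Groups ordered from lowest to highest expected growth under H1
groups = [
    ("C", [99, 114, 116, 127, 146]),
    ("B", [111, 125, 143, 148, 157]),
    ("A", [133, 139, 149, 160, 184]),
]

def mw_count(low, high):
    """Number of pairs in which the 'high' group value exceeds the 'low' one."""
    return sum(1 for l in low for h in high if h > l)

# One Mann-Whitney count per ordered pair of groups
w_pairs = {
    (low_name, high_name): mw_count(low_values, high_values)
    for (low_name, low_values), (high_name, high_values) in combinations(groups, 2)
}
w_total = sum(w_pairs.values())

for pair, count in w_pairs.items():
    print("W%s%s = %d" % (pair[0], pair[1], count))
print("JT statistic W =", w_total)
```

The p-value of the test would come from the permutation distribution of W or from its normal approximation, neither of which is computed in this sketch.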



Parametric versus nonparametric tests

Parametric tests:

Assumptions about the distribution in the population

Conditions are often not tested

Test depends on the validity of the assumptions

Most powerful test if all assumptions are met

Nonparametric tests:

Fewer assumptions about the distribution in the population

In case of small sample sizes often the only alternative (unless the nature of the population distribution is known exactly)

Less sensitive to measurement error (uses ranks)

Can be used for data which are inherently in ranks, even for data measured in a nominal scale

Easier to learn

