non-parametric tests - main <...
TRANSCRIPT
Non-parametric tests
Non-parametric tests
B.H. Robbins Scholars Series
June 24, 2010
1 / 30
Non-parametric tests
Outline
One Sample Test: Wilcoxon Signed-Rank
Two Sample Test: Wilcoxon–Mann–Whitney
Confidence Intervals
Summary
2 / 30
Non-parametric tests
IntroductionI T-tests: tests for the means of continuous data
I One sample H0 : µ = µ0 versus HA : µ 6= µ0
I Two sample H0 : µ1 − µ2 = 0 versus HA : µ1 − µ2 6= 0
I Underlying these tests is the assumption that the data arisefrom a normal distribution
I T-tests do not actually require normally distributed data toperform reasonably well in most circumstances
I Parametric methods: assume the data arise from adistribution described by a few parameters (Normaldistribution with mean µ and variance σ2).
I Nonparametric methods: do not make parametric assumptions(most often based on ranks as opposed to raw values)
I We discuss non-parametric alternatives to the one and twosample t-tests.
3 / 30
Non-parametric tests
Examples of when the parametric t-test goes wrong
I Extreme outliersI Example: t-test comparing two sets of measurements
I Sample 1: 1 2 3 4 5 6 7 8 9 10I Sample 2: 7 8 9 10 11 12 13 14 15 16 17 18 19 20
I Sample averages: 5.5 and 13.5, T-test p-value p = 0.000019I Example: t-test comparing two sets of measurements
I Sample 1: 1 2 3 4 5 6 7 8 9 10I Sample 2: 7 8 9 10 11 12 13 14 15 16 17 18 19 20 200
I Sample averages: 5.5 and 25.9, T-test p-value p = 0.12
4 / 30
Non-parametric tests
Examples of when the parametric t-test goes wrong
I T-statistic
t =x1 − x2
s√
1n1
+ 1n2
I For two sample tests
s2 =(n1 − 1)s2
1 + (n2 − 1)s22
n1 + n2 − 2
I In the first datasetI s2
1 = 9.2, s22 = 17.5
I In the second datasetI s2
1 = 9.2, s22 = 2335
5 / 30
Non-parametric tests
Examples of when the parametric t-test goes wrong
I Upper detection limitsI Example: Fecal calprotectin was being evaluated as a possible
biomarker of Crohn’s disease severityI Median can be calculated (mean cannot)
6 / 30
Non-parametric tests
Feca
l Cal
prot
ectin
No or Mild Activity Moderate or Severe Activity
0
500
1000
1500
2000
2500
n = 8 n = 18
Above Detection Limit
7 / 30
Non-parametric tests
When to use non-parametric methods
I With correct assumptions (e.g., normal distribution),parametric methods will be more efficient / powerful thannon-parametric methods but often not as much as you mightthink1
I If the normality assumption grossly violated, nonparametrictests can be much more efficient and powerful than thecorresponding parametric test
I Non-parametric methods provide a well-foundationed way todeal with circumstance in which parametric methods performpoorly.
The large-sample efficiency of the Wilcoxon test compared to the t test is 3π
=0.9549.
8 / 30
Non-parametric tests
Non-parametric methods
I Many non-parametric methods convert raw values to ranksand then analyze ranks
I In case of ties, midranks are used, e.g., if the raw data were105 120 120 121 the ranks would be 1 2.5 2.5 4
Parametric Test Nonparametric Counterpart
1-sample t Wilcoxon signed-rank2-sample t Wilcoxon 2-sample rank-sumk-sample ANOVA Kruskal-WallisPearson r Spearman ρ
9 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
One sample tests
I Non-parametric analogue to the one sample t-test.
I Almost always used on paired data where the column ofvalues represents differences (e.g., D = Ypost − Ypre).
I Sign test: the simplest test for the median difference beingzero in the population
I Examine all values of D after discarding those in which D=0I Count the number of positive DsI Tests H0 :Prob[D > 0] = 1
2 versus HA :Prob[D > 0] 6= 12
I Under H0 it is equally likely in the population to have a valuebelow zero as it is to have a value above zero
I Note that it ignores magnitudes completely → it is inefficient(low power)
10 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
One sample tests: Wilcoxon signed rank
I In the pre-post analysisI D = pre - postI Retain the sign of D ( +/-)I Rank = rank of |D| (absolute value of D)I Signed rank, SR = Sign * RankI Base analyses on SR
I Observations with zero differences are ignored
I Example: A pre-post study
Post Pre D Sign Rank of |D| Signed Rank
3.5 4 0.5 + 1.5 1.54.5 4 -0.5 - 1.5 -1.54 5 1.0 + 4.0 4.0
3.9 4.6 0.7 + 3.0 3.0
11 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
One sample tests
I A good approximation to an exact P-value (not discussed)may be obtained by computing
z =
∑SRi√∑SR2
i
,
where the signed rank for observation i is SRi .
I We can then compare |z | to the normal distribution.
I Here, z = 7√29.5
= 1.29 and by surfstat the 2-tailed P-value
is 0.197I If all differences are positive or all are negative, the exact
2-tailed P-value is 12n−1
I This implies that n must exceed 5 for any possibility ofsignificance at the α = 0.05 level for a 2-tailed test
12 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
One sample tests
I Sleep DatasetI Compare the effects of two soporific drugs.I Each subject receives Drug 1 and Drug 2I Study question: Is Drug 1 or Drug 2 more effective at
increasing sleep?I Dependent variable: Difference in hours of sleep comparing
Drug 2 to Drug 1I H0 : For any given subject, the difference in hours of sleep is
equally likely to be positive or negative
13 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
Subject Drug 1 Drug 2 Diff (2-1) Sign Rank
1 1.9 0.7 −1.2 - 32 −1.6 0.8 2.4 + 83 −0.2 1.1 1.3 + 4.54 −1.2 0.1 1.3 + 4.55 −0.1 −0.1 0.0 NA NA6 3.4 4.4 1.0 + 27 3.7 5.5 1.8 + 78 0.8 1.6 0.8 + 19 0.0 4.6 4.6 + 910 2.0 3.4 1.4 + 6
Table: Hours of extra sleep on drugs 1 and 2, differences, signs and ranksof sleep study data
14 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
One sample / paired test exampleI Approximate p-value calculation
9∑i=1
SRi = 39,
√√√√ 9∑i=1
SR2i = 16.86
Z = 2.31, and the two sided test yields a p-value equal to2*(1-.989556)= 0.0209
I Wilcoxon signed rank test statistical program output
Wilcoxon signed rank test
data: sleep.data
V = 42, p-value = 0.02077
alternative hypothesis: true location is not equal to 0
I Thus, we reject H0 and conclude Drug 2 increases sleep bymore hours than Drug 1 (p = 0.02)
15 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
One sample / paired test example
I We could also perform sign test on sleep dataI If drugs are equally effective, we should have same number of
positives and negatives (e.g., Prob(D>0)=.5).I Analogous to coin flip example from last time.I In the observed data: 1 negative and 8 positives (we throw out
1 ‘no change’)I One sided p-value: probability of observing 0 or 1 negativesI Two sided p-value: probability of observing 0, 1, 8, or 9
negativesI p = 0.04, → reject H0 at α = 0.05
16 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
Wilcoxon signed rank test
I Assumes the distribution of differences is symmetric
I When the distribution is symmetric, the signed rank test testswhether the median difference is zero
I In general it tests that, for two randomly chosen observationsi and j with values (differences) xi and xj , that the probabilitythat xi + xj > 0 is 1
2
I The estimator that corresponds exactly to the test in allsituations is the pseudomedian, the median of all possiblepairwise averages of xi and xj , so one could say that thesigned rank test tests H0: pseudomedian=0
17 / 30
Non-parametric tests
One Sample Test: Wilcoxon Signed-Rank
I To test H0 : η = η0, where η is the population median (not adifference) and η0 is some constant, we create the n valuesxi − η0 and feed those to the signed rank test, assuming thedistribution is symmetric
I When all nonzero values are of the same sign, the test reducesto the sign test and the 2-tailed P-value is ( 1
2 )n−1 where n isthe number of nonzero values
18 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Two sample WMW test
I The Wilcoxon–Mann–Whitney (WMW) 2-sample rank sumtest is for testing for equality of central tendency of twodistributions (for unpaired data)
I Ranking is done by combining the two samples and ignoringwhich sample each observation came from
I Example:
Females 120 118 121 119Males 124 120 133
Ranks for Females 3.5 1 5 2Ranks for Males 6 3.5 7
19 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Two sample WMW test
I Doing a 2-sample t-test using these ranks as if they were rawdata and computing the P-value against 4+3-2=5 d.f. willwork quite well
I Loosely speaking the WMW test tests whether the populationmedians of the two groups are the same
I More accurately and more generally, it tests whetherobservations in one population tend to be larger thanobservations in the other
I Letting x1 and x2 respectively be randomly chosenobservations from populations one and two, WMW testsH0 : C = 1
2 , where C =Prob[x1 > x2]
20 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Two sample WMW test
I Wilcoxon rank sum test statistic
W = R − n1(n1 + 1)
2
where R is the sum of the ranks in group 1
I Under H0, µw = n1n22 and σw =
√n1n2(n1+n2+1)
12 , and
z =W − µwσw
follow a N(0,1) distribution.
21 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Two sample WMW test
I The C index (concordance probability) may be estimated bycomputing
C =R̄ − n1+1
2
n2,
where R̄ is the mean of the ranks in group 1
I For the above data R̄ = 2.875 and C = 2.875−2.53 = 0.125
I We estimate: probability that a randomly chosen female has avalue greater than a randomly chosen male is 0.125.
22 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Two sample WMW test: Example
I Fecal calprotectin being evaluated as a possible biomarker ofdisease severity
I Calprotectin measured in 26 subjects, 8 observed to haveno/mild activity by endoscopy
I Calprotectin has upper detection limit at 2500 unitsI A type of missing data, but need to keep in analysis
23 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Two sample WMW test: Example
I Study question: Are calprotectin levels different in subjectswith no or mild activity compared to subjects with moderateor severe activity?
I Statement of the null hypothesisI H0 : Populations with no/mild activity have the same
distribution of calprotectin as populations withmoderate/severe activity
I H0 : C = 12
24 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Feca
l Cal
prot
ectin
No or Mild Activity Moderate or Severe Activity
0
500
1000
1500
2000
2500 22.5
9
22.5
13
6
22.5
5
10
22.5
7
18
22.5
8
15
12
22.5
14
4
11
2
1617
22.522.5
31
Above Detection Limit
25 / 30
Non-parametric tests
Two Sample Test: Wilcoxon–Mann–Whitney
Two sample WMW test: Example
I Stat program output
Wilcoxon rank sum test
data: calpro by endo2
W = 23.5, p-value = 0.006257
alternative hypothesis: true location shift is not equal to 0
I W = 59.5− 8∗92 = 23.5
I A common (but loose) interpretation: People withmoderate/severe activity have higher median fecal calprotectinlevels than people with no/mild activity (p = 0.006).
26 / 30
Non-parametric tests
Confidence Intervals
Confidence Intervals for medians
I Confidence intervals for the (one sample) medianI Ranks of the observations are used to give approximate
confidence intervals for the median (See Altman book)I e.g., if n = 12, the 3rd and 10th largest values give a 96.1%
confidence intervalI For larger sample sizes, the lower ranked value (r) and upper
ranked value (s) to select for an approximate 95% confidenceinterval for the population median is
r =n
2− 1.96 ∗
√n
2and s = 1 +
n
2+ 1.96 ∗
√n
2
I e.g., if n = 100 then r = 40.2 and s = 60.8, so we would pickthe 40th and 61st largest values from the sample to specify a95% confidence interval for the population median
27 / 30
Non-parametric tests
Confidence Intervals
Confidence Intervals
I Confidence intervals for the difference in two medians (twosamples)
I Considers all possible differences between sample 1 and sample2
FemaleMale 120 118 121 119124 4 6 3 5120 0 2 -1 1133 13 15 12 14
I An estimate of the median difference (males - females) is themedian of these 12 differences, with the 3rd and 10th largestvalues giving an (approximate) 95% CI
I Median estimate = 4.5, 95% CI = [1, 13]
I Specific formulas found in Altman, pages 40-41
28 / 30
Non-parametric tests
Confidence Intervals
Confidence Intervals
I BootstrapI General method, not just for mediansI Non-parametric, does not assume symmetryI Iterative method that repeatedly samples from the original dataI Algorithm for creating a 95% CI for the difference in two
medians
1. Sample with replacement from sample 1 and sample 22. Calculate the difference in medians, save result3. Repeat Steps 1 and 2 1000 times
I A (naive) 95% CI is given by the 25th and 97.5th largest valuesof your 1000 median differences
I For the male/female data, median estimate = 4.5, 95% CI =[-0.5, 14.5], which agrees with the conclusion from a WMWrank sum test (p = 0.11).
29 / 30
Non-parametric tests
Summary
Summary: non-parametric tests
I Wilcoxon signed rank test: alternative to the one sample t-test
I Wilcoxon Mann Whitney or rank sum test: alternative to thetwo sample t-test
I Attractive when parametric assumptions are believed to beviolated
I Drawback: if based on ranks, tests do not provide insight intoeffect size
I Non-parametric tests are attractive if all we care about isgetting a P-value
30 / 30