Nonparametric Hypothesis Testing

Guy Lion

December 2005

Nonparametric tests handle variables that are not normally distributed.

                 Sample
                 Dependent (Paired)    Independent (Unpaired)
Parametric       Paired t Test         Unpaired t Test
Nonparametric    Sign Test             Mann-Whitney U Test


When to use these methods

a) With large samples (> 100), the sample mean is approximately normally distributed even if the variable itself is not [Central Limit Theorem]. Use a parametric test.

b) Nonparametric tests can be superior for samples with fewer than 100 observations.

c) Before proceeding with a nonparametric test, confirm that the variable does not have a normal distribution. A normal distribution has Skewness and Kurtosis close to zero (both can be calculated in Excel), so values far from zero indicate non-normality.

                Sample size
Distribution    < 100      > 100
Normal          Para       Para
Not Normal      Nonpara    Para
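The decision table above can be sketched as a small function. This is only an illustration: the function name and the `tolerance` cutoff for "close to zero" are assumptions (the slides give no numeric cutoff), and a sample of exactly 100, which the table does not cover, is treated here as small.

```python
def choose_test(n_obs, skewness, excess_kurtosis, tolerance=0.5):
    """Return 'parametric' or 'nonparametric' following the decision table.

    tolerance is an ASSUMED cutoff for "close to zero"; the slides give none.
    """
    if n_obs > 100:
        # Large sample: the Central Limit Theorem covers the sample mean.
        return "parametric"
    looks_normal = (abs(skewness) <= tolerance
                    and abs(excess_kurtosis) <= tolerance)
    return "parametric" if looks_normal else "nonparametric"


print(choose_test(17, -1.48, 2.23))   # small, skewed sample: nonparametric
print(choose_test(500, -1.48, 2.23))  # same shape, large sample: parametric
```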


Sign Test: Testing for Differences in Paired Data

• Count the number of pairs whose two values differ. This is the Modified Sample Size (MSS).

• Count how many outcome values have increased (or decreased).

• Use the binomial distribution (with p = 0.5 and MSS trials) to calculate how probable a split at least this lopsided would be if the two samples came from populations with identical distributions.
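The three steps above can be sketched in Python as follows. The function names are illustrative, and the scores are the Ad 1 and Ad 2 creativity scores from the ad campaign example in this deck. The probability returned is the cumulative binomial probability of the smaller count, which is one way to read the third step.

```python
from math import comb


def binomial_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test(first, second):
    """Sign test on paired scores; returns (mss, decreased, increased, prob)."""
    changes = [b - a for a, b in zip(first, second)]
    decreased = sum(1 for c in changes if c < 0)
    increased = sum(1 for c in changes if c > 0)
    mss = decreased + increased  # pairs with no change are dropped
    # Probability of a split at least this lopsided when p = 0.5.
    prob = binomial_cdf(min(decreased, increased), mss)
    return mss, decreased, increased, prob


# Ad 1 and Ad 2 scores from the ad campaign example in this deck.
ad1 = [4, 2, 4, 4, 4, 2, 3, 4, 3, 5, 3, 3, 4, 5, 4, 5, 2]
ad2 = [2, 4, 5, 4, 4, 5, 3, 5, 5, 4, 4, 5, 5, 5, 5, 4, 5]
mss, decreased, increased, prob = sign_test(ad1, ad2)
print(mss, decreased, increased, f"{prob:.1%}")  # 13 3 10 4.6%
```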


Sign Test: Binomial Distribution algorithm

Condition                     Probability formula
Cumulative function <= 0.5    Cumulative function
Cumulative function > 0.5     1 - (Cumulative function - Mass function)
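A minimal sketch of this two-branch rule, assuming p = 0.5 and taking n = 13 trials from the ad campaign example. The helper names are illustrative. The second branch, 1 - (Cumulative - Mass), is simply the upper-tail probability P(X >= k), so counts of 3 and 10 out of 13 give the same answer.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Mass function: P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p=0.5):
    """Cumulative function: P(X <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def sign_test_probability(k, n):
    """Apply the two-branch rule from the table to a count k out of n."""
    cum = binom_cdf(k, n)
    if cum <= 0.5:
        return cum                      # lower tail: P(X <= k)
    return 1 - (cum - binom_pmf(k, n))  # upper tail: P(X >= k)


for k in (3, 10):
    print(k, f"{sign_test_probability(k, 13):.1%}")  # both print 4.6%
```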


Sign Test testing the creativity of an Ad campaign

           Ad 1     Ad 2
           Score    Score    Change
           4        2        -2
           2        4        2
           4        5        1
           4        4
           4        4
           2        5        3
           3        3
           4        5        1
           3        5        2
           5        4        -1
           3        4        1
           3        5        2
           4        5        1
           5        5
           4        5        1
           5        4        -1
           2        5        3

Count      17       17
Average    3.59     4.35
Median     4.00     5.00
Skew       -0.27    -1.48
Kurt       -0.81    2.23

Mod. sample size    13

This is a classic situation that calls for a nonparametric test: the samples are small (17 observations) and the variables are not normally distributed (see the Skewness and Kurtosis rows).
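The Skewness and Kurtosis check can be sketched without Excel. The code below assumes the slide's figures come from Excel's SKEW and KURT functions and uses the sample skewness and excess kurtosis formulas documented for them. The function names are illustrative. Under that assumption, the printed values should match the Skew and Kurt rows of the table.

```python
from math import sqrt


def sample_skewness(values):
    """Sample skewness, using the formula documented for Excel's SKEW."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


def sample_excess_kurtosis(values):
    """Sample excess kurtosis, using the formula documented for Excel's KURT."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    total = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * total - correction


# Ad 1 and Ad 2 scores from the table in this example.
ad1 = [4, 2, 4, 4, 4, 2, 3, 4, 3, 5, 3, 3, 4, 5, 4, 5, 2]
ad2 = [2, 4, 5, 4, 4, 5, 3, 5, 5, 4, 4, 5, 5, 5, 5, 4, 5]
for name, scores in (("Ad 1", ad1), ("Ad 2", ad2)):
    print(name,
          round(sample_skewness(scores), 2),
          round(sample_excess_kurtosis(scores), 2))
```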


Sign Test testing the creativity of an Ad campaign (continued)

Binomial distribution
p    Mean prop.    0.5
n    Trials        13

                                Cum.      Mass        1 -
                                prob.     function    (Cum. - Mass)
# of values decreased     0     0.0%      0.0%        100.0%
                          1     0.2%      0.2%        100.0%
                          2     1.1%      1.0%        99.8%
                          3     4.6%      3.5%        98.9%
                          4     13.3%     8.7%        95.4%
                          5     29.1%     15.7%       86.7%
                          6     50.0%     20.9%       70.9%
# of values increased     7     70.9%     20.9%       50.0%
                          8     86.7%     15.7%       29.1%
                          9     95.4%     8.7%        13.3%
                          10    98.9%     3.5%        4.6%
                          11    99.8%     1.0%        1.1%
                          12    100.0%    0.2%        0.2%
                          13    100.0%    0.0%        0.0%


Different view of same data

# of values    # of values
decreased      increased      Prob.
0              13             0.0%
1              12             0.2%
2              11             1.1%
3              10             4.6%
4              9              13.3%
5              8              29.1%
6              7              50.0%


Mann-Whitney Test for Unpaired Testing: Steps

• Put both samples together. Rank values in ascending order. For repeated numbers (ties) across samples, use the average of their ranks so that identical numbers get identical ranks.

• Find the average rank for each sample.

• Calculate Difference in average rank.

• Find the Standard Error for the difference in average ranks: (n1 + n2) * SQRT((n1 + n2 + 1)/(12n1n2)).

• Divide the Difference in avg. rank (step 3) by the Standard Error (step 4) to find the test statistic (a Z value).

• Calculate the P Value from the Z value using NORMSDIST: 2 * (1 - NORMSDIST(Z)) for a two-tailed test.

Note: Steps 2 through 6 are similar to the unpaired t Test, except that they use Ranks instead of Values.
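The six steps can be sketched in Python as follows. The function names are illustrative, the incomes are those of the mortgage applicant example in this deck, and `NormalDist().cdf` from the standard library stands in for NORMSDIST. The two-tailed P Value is an assumption that is consistent with the 18.3% reported in that example.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_difference_test(sample1, sample2):
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))          # step 1
    avg1 = sum(ranks[:n1]) / n1                                    # step 2
    avg2 = sum(ranks[n1:]) / n2
    diff = abs(avg1 - avg2)                                        # step 3
    std_error = (n1 + n2) * sqrt((n1 + n2 + 1) / (12 * n1 * n2))   # step 4
    z = diff / std_error                                           # step 5
    p_value = 2 * (1 - NormalDist().cdf(z))                        # step 6
    return avg1, avg2, z, p_value


fixed = [34000, 25000, 41000, 57000, 79000, 22500, 30000, 17000,
         36500, 28000, 240000, 22000, 57000, 68000, 58000, 49500]
variable = [37500, 86500, 36500, 65500, 21500, 36500, 99500,
            36000, 91000, 59500, 31000, 88000, 35500, 72000]
avg1, avg2, z, p_value = rank_difference_test(fixed, variable)
print(f"{avg1:.2f} {avg2:.2f} {z:.2f} {p_value:.1%}")
```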


Example: Testing income of mortgage applicants

Fixed-Rate    Variable-Rate
$34,000       $37,500
$25,000       $86,500
$41,000       $36,500
$57,000       $65,500
$79,000       $21,500
$22,500       $36,500
$30,000       $99,500
$17,000       $36,000
$36,500       $91,000
$28,000       $59,500
$240,000      $31,000
$22,000       $88,000
$57,000       $35,500
$68,000       $72,000
$58,000
$49,500

Testing if applicants for fixed-rate mortgages have higher income than applicants for variable-rate mortgages.

The fixed-rate applicants have one high income value ($240,000). Kurtosis and Skewness of both samples confirm they are not normally distributed. The unpaired t test would not work well.


Sorting and Ranking

Ranked in ascending order. The three identical figures ($36,500, shown in yellow on the slide) originally ranked 12th, 13th, and 14th, so they all received the tied rank of 13, the average of the three.

Fixed Rate                      Variable Rate
Income      Sample    Rank      Income      Sample      Rank
$17,000     Fixed     1         $21,500     Variable    2
$22,000     Fixed     3         $31,000     Variable    8
$22,500     Fixed     4         $35,500     Variable    10
$25,000     Fixed     5         $36,000     Variable    11
$28,000     Fixed     6         $36,500     Variable    13
$30,000     Fixed     7         $36,500     Variable    13
$34,000     Fixed     9         $37,500     Variable    15
$36,500     Fixed     13        $59,500     Variable    21
$41,000     Fixed     16        $65,500     Variable    22
$49,500     Fixed     17        $72,000     Variable    24
$57,000     Fixed     18        $86,500     Variable    26
$57,000     Fixed     19        $88,000     Variable    27
$58,000     Fixed     20        $91,000     Variable    28
$68,000     Fixed     23        $99,500     Variable    29
$79,000     Fixed     25
$240,000    Fixed     30


P Value (probability difference is due to chance)

                            Fixed Rate    Variable Rate
Average rank                13.50         17.79
Difference in avg. rank     4.29
Sample size                 16            14

Standard Error: (n1 + n2) * SQRT((n1 + n2 + 1)/(12n1n2))
n1 + n2                     30
n1 + n2 + 1                 31
12n1n2                      2688
Standard Error              3.22

Test statistic: Difference in avg. rank / Standard Error
Diff. in avg. rank          4.29
Standard Error              3.22
Test statistic - Z value    1.33     (number of Standard Errors)
P Value                     18.3%    (using NORMSDIST)

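Based on ranks (not values), the P value is 18.3%: if the two samples came from the same population, a difference in average rank at least this large would occur by chance 18.3% of the time. The Variable Rate mortgage applicants have a higher average rank (17.79 vs 13.50), which points to higher income, but with a P value this large the difference could plausibly be due to chance.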


Mann-Whitney U: Things to watch for

• Averaging the ranks of tied values may not have much impact. Redoing the last example without averaging the ties gives P values of 17.0% or 19.8%, depending on how the tied $36,500 figures are ranked, not much different from the 18.3%.

• Important caveat: you need at least 10 observations in each of the two unpaired samples to obtain a valid Z value from which to calculate a P value.

• The Mann-Whitney U statistic is conventionally calculated differently from the method shown here, which follows Andrew Siegel's calculations and reaches the same result faster. See the Appendix on the next slide.
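The tie check in the first bullet can be reproduced from the rank sums alone. This sketch assumes the two-tailed P Value used in the example and relies on the fact that the fixed-rate rank sum is 216 when its $36,500 value takes the tied rank of 13. The constant and function names are illustrative. The three results should land close to the 17.0%, 18.3%, and 19.8% quoted above.

```python
from math import sqrt
from statistics import NormalDist

N1, N2 = 16, 14  # fixed-rate and variable-rate sample sizes
TOTAL_RANK = (N1 + N2) * (N1 + N2 + 1) // 2  # 1 + 2 + ... + 30 = 465


def p_value_from_fixed_rank_sum(fixed_rank_sum):
    """Two-tailed P value when the fixed-rate ranks add up to fixed_rank_sum."""
    variable_rank_sum = TOTAL_RANK - fixed_rank_sum
    diff = abs(fixed_rank_sum / N1 - variable_rank_sum / N2)
    std_error = (N1 + N2) * sqrt((N1 + N2 + 1) / (12 * N1 * N2))
    return 2 * (1 - NormalDist().cdf(diff / std_error))


# The fixed-rate $36,500 value takes rank 12, 13 (tied average), or 14,
# which moves the fixed-rate rank sum from 216 to 215 or 217.
for rank, rank_sum in ((12, 215), (13, 216), (14, 217)):
    print(rank, f"{p_value_from_fixed_rank_sum(rank_sum):.1%}")
```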


Appendix: The actual Mann-Whitney U

Calculation input                  Output
Sample size n1             14      Value of U        80
Sample size n2             16      Mean of U         112.0
R1 (sum of ranks of n1)    249     Std. dev. of U    24.1
                                   Z value           1.33
                                   P Value           18.3%    (using NORMSDIST)

Formula
Calculating value of U: U = n1n2 + n1(n1 + 1)/2 - R1
n1n2                       224
+ n1(n1 + 1)/2             105
- R1                       -249
U                          80

Mean of U = n1n2/2
Std. dev. of U = SQRT(n1n2(n1 + n2 + 1)/12)
Z value = (U - Mean of U)/Std. dev. of U (reported in absolute value)
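As a check, the U calculation can be sketched in Python and compared with the rank-difference route used in the main example. The function name is illustrative, the Z value is taken in absolute value, and the P Value is assumed to be two-tailed. The last lines recompute Z from the average ranks (rank sums 249 and 216) to show that both routes give the same value.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(n1, n2, r1):
    """U and its normal approximation from the rank sum r1 of sample 1."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = abs(u - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(z))
    return u, mean_u, sd_u, z, p_value


u, mean_u, sd_u, z, p_value = mann_whitney_u(14, 16, 249)
print(u, mean_u, round(sd_u, 1), round(z, 2), f"{p_value:.1%}")

# Same Z from the difference in average ranks (main example).
rank_diff = abs(249 / 14 - 216 / 16)
z_from_ranks = rank_diff / (30 * sqrt(31 / 2688))
print(round(z_from_ranks, 2))
```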
