Nonparametric Statistics STAT E-150 Statistical Methods

Upload: paige

Post on 23-Feb-2016


Page 1: Nonparametric Statistics

Nonparametric Statistics

STAT E-150 Statistical Methods

Page 2: Nonparametric Statistics


The tests we have discussed generally require that the data meet particular conditions. Nonparametric tests make fewer assumptions about the data; they generally do not require that the data follow any particular distribution, although they often require that the population(s) have continuous distributions.

In addition, many nonparametric tests are not based on the actual values of the data. They may use counts of values, or the rank of each observation.

Page 3: Nonparametric Statistics


Here is an example: A neurologist may collect data to investigate the depressant effects of certain recreational drugs. She tested 20 clubbers; 10 were given an ecstasy tablet to take on a Saturday night and 10 were allowed to drink only alcohol. Levels of depression were measured using the Beck Depression Inventory (BDI) the day after (Sunday) and at midweek (Wednesday).

Page 4: Nonparametric Statistics


Here is the data:

Page 5: Nonparametric Statistics


The Wilcoxon Rank Sum Test

Suppose there is no difference in the depression levels between ecstasy and alcohol users. Rank the data without regard to the group the subject belonged to, giving the lowest value a rank of 1, the next lowest a rank of 2, etc.

If there is no difference between the groups, we should find a similar number of low and high ranks in each group. If we added up the ranks, the sums for the two groups should be about the same.

Page 6: Nonparametric Statistics


What if there is a difference? Suppose the ecstasy group is more depressed than the alcohol group. Then there would be higher ranks in the ecstasy group than in the alcohol group, and the sum of the ranks for the ecstasy group would be higher than the sum for the alcohol group. When the groups are not the same size, the test statistic for the Wilcoxon Rank Sum Test, W, is the sum of the ranks for the smaller group. If the groups are the same size, the test statistic W is the value of the smaller summed rank.
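The rule for choosing W can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `rank_sums` and `wilcoxon_w` and the toy scores are hypothetical, and the sketch assumes no tied values (ties need averaged ranks).

```python
def rank_sums(group_a, group_b):
    """Pool both groups, rank from lowest (1) to highest, and sum the ranks per group.

    Assumes all pooled values are distinct; ties would need averaged ranks.
    """
    pooled = sorted(group_a + group_b)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    return (sum(rank_of[v] for v in group_a),
            sum(rank_of[v] for v in group_b))


def wilcoxon_w(group_a, group_b):
    """W as described on the slide: the rank sum of the smaller group,
    or the smaller of the two rank sums when the groups are the same size."""
    sum_a, sum_b = rank_sums(group_a, group_b)
    if len(group_a) != len(group_b):
        return sum_a if len(group_a) < len(group_b) else sum_b
    return min(sum_a, sum_b)


# Hypothetical toy scores, not the study data.
low_scores = [1, 2, 4]
high_scores = [3, 5, 6, 7]
print(rank_sums(low_scores, high_scores))   # rank sums for each group
print(wilcoxon_w(low_scores, high_scores))  # rank sum of the smaller group
```

When one group holds mostly low ranks, its rank sum is much smaller than the other group's, which is the pattern the test looks for.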

Page 7: Nonparametric Statistics


Here are the steps:

1. Draw a simple random sample (SRS) of size n1 from one population and draw an independent SRS of size n2 from a second population.

2. Rank all N = n1 + n2 observations.

The sum W of the ranks for the first sample is the Wilcoxon rank sum statistic.

Page 8: Nonparametric Statistics


If the two populations have the same continuous distribution, then W has mean and standard error

$\bar{W} = \dfrac{n_1 (n_1 + n_2 + 1)}{2}$ and $SE_W = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$

The Wilcoxon Rank Sum Test rejects the hypothesis that the two populations have identical distributions when the rank sum W is far from its mean. That is, we can use the test statistic

$z = \dfrac{W - \bar{W}}{SE_W}$
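As a quick check of the formulas for the mean and standard error of W, here is the arithmetic for the study described earlier, where each group has 10 subjects (n1 = n2 = 10). The notation $\bar{W}$ for the mean of W is an assumption about the slide's original symbols.

```latex
\bar{W} = \frac{n_1 (n_1 + n_2 + 1)}{2} = \frac{10 \cdot 21}{2} = 105,
\qquad
SE_W = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}
     = \sqrt{\frac{10 \cdot 10 \cdot 21}{12}}
     = \sqrt{175} \approx 13.23
```

So with two groups of 10, a rank sum W is compared against a mean of 105 in units of about 13.23.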

Page 9: Nonparametric Statistics


How to rank the data?

For the Wednesday data, arrange the values in ascending order, noting the group the subject belonged to. Then start at the lowest score, assigning a rank of 1, and continue ranking all values. When a value occurs more than once, average the ranks.

Scores  Potential Rank  Actual Rank  Group
3       1               1            A
5       2               2            A
6       3
6       4
        5 to 20
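The averaging rule for tied values can be sketched as follows. The helper name `average_ranks` is hypothetical; the example uses only the first four Wednesday scores shown in the table (3, 5, 6, 6), where the two 6s occupy potential ranks 3 and 4 and so share the rank 3.5.

```python
def average_ranks(values):
    """Return the rank of each value (1 = lowest), in the original order.

    Tied values all receive the average of the potential ranks they occupy.
    """
    # Indices of the values, ordered from the lowest value to the highest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` past every value tied with the one at `start`.
        end = start
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        # Potential ranks start+1 .. end are shared equally among the ties.
        shared_rank = (start + 1 + end) / 2
        for position in range(start, end):
            ranks[order[position]] = shared_rank
        start = end
    return ranks


print(average_ranks([3, 5, 6, 6]))  # the two 6s share the average of ranks 3 and 4
```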

Page 10: Nonparametric Statistics


Sum of ranks for alcohol =

Sum of ranks for ecstasy =

$\bar{W}_{\text{Wednesday}} =$

$SE_{W,\text{Wednesday}} =$

Page 11: Nonparametric Statistics


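The value of the test statistic for Wednesday is

$z_{\text{Wednesday}} = \dfrac{W_{\text{Wednesday}} - \bar{W}_{\text{Wednesday}}}{SE_{W,\text{Wednesday}}}$

If the absolute value of z is large (greater than 1.96), then the test is significant at α = .05 (two-sided). What can you conclude?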
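The 1.96 cutoff comes from the standard normal distribution. As a rough sketch (the function names `two_sided_p` and `is_significant` are hypothetical, and this relies on the normal approximation rather than an exact test), the two-sided p-value for a z statistic can be computed with the standard library:

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic.

    Uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(z, alpha=0.05):
    """True when the two-sided p-value falls below alpha."""
    return two_sided_p(z) < alpha


print(round(two_sided_p(1.96), 3))  # close to 0.05, which is why 1.96 is the cutoff
```

A z of exactly 1.96 sits almost exactly on the α = .05 boundary, and larger absolute values give smaller p-values.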

Page 12: Nonparametric Statistics


Here are the results for the Sunday data:

Scores  Potential Rank  Actual Rank  Group
13      1               1            A
15      2               2            E
14      3               3            A
15      4               5            A
15      5               5            A
15      6               5            E
16      7               8.5          A
16      8               8.5          A
16      9               8.5          E
16      10              8.5          E
17      11              11           E
18      12              13           E
18      13              13           A
18      14              13           A
19      15              15.5         E
19      16              15.5         A
20      17              17.5         E
20      18              17.5         A
27      19              19           E
35      20              20           E

Sum of ranks for alcohol =

Sum of ranks for ecstasy =

Page 13: Nonparametric Statistics


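$\bar{W}_{\text{Sunday}} =$

$SE_{W,\text{Sunday}} =$

$z_{\text{Sunday}} = \dfrac{X - \bar{X}}{s} = \dfrac{W_{\text{Sunday}} - \bar{W}_{\text{Sunday}}}{SE_{W,\text{Sunday}}}$

What can you conclude?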
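One way to check the Sunday calculation is sketched below. It takes the Actual Rank and Group columns exactly as they appear in the Sunday table (A = alcohol, E = ecstasy) and applies the normal approximation without any correction for ties, so statistical software may report slightly different values. The variable names are hypothetical.

```python
import math

# (actual rank, group) pairs as transcribed from the Sunday table.
sunday_ranks = [
    (1, "A"), (2, "E"), (3, "A"), (5, "A"), (5, "A"), (5, "E"),
    (8.5, "A"), (8.5, "A"), (8.5, "E"), (8.5, "E"), (11, "E"),
    (13, "E"), (13, "A"), (13, "A"), (15.5, "E"), (15.5, "A"),
    (17.5, "E"), (17.5, "A"), (19, "E"), (20, "E"),
]

sum_alcohol = sum(rank for rank, group in sunday_ranks if group == "A")
sum_ecstasy = sum(rank for rank, group in sunday_ranks if group == "E")

n1 = n2 = 10                    # ten subjects per group
w = min(sum_alcohol, sum_ecstasy)   # equal group sizes: take the smaller rank sum

mean_w = n1 * (n1 + n2 + 1) / 2
se_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z_sunday = (w - mean_w) / se_w

print(sum_alcohol, sum_ecstasy)
print(round(z_sunday, 3))
print(abs(z_sunday) > 1.96)     # is the difference significant at alpha = .05?
```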

Page 14: Nonparametric Statistics


Here are the SPSS results:


Page 15: Nonparametric Statistics


The results for Sunday do not show a significant difference between the two groups (p = .28), but the results for Wednesday indicate that there is a difference in the depression scores for the two groups (p close to 0).

That is, these data indicate that one day after taking it, ecstasy is no more of a depressant than alcohol. But for the midweek measures, the difference is significant: the ecstasy group had significantly higher levels of depression midweek than did the alcohol group. Note also that the mean rank for Wednesday scores is higher for the ecstasy users (15.10) than for the alcohol users (5.90).

Page 16: Nonparametric Statistics


Using SPSS

First create a new coding variable for the nominal data: Transform > Recode into Different Variables

The input variable is Drug; create a new variable, DrugCode.

Page 17: Nonparametric Statistics


Click on Old and New Values  

Page 18: Nonparametric Statistics


Code Ecstasy as "1" and click on Add

Code Alcohol as "2" and click on Add  

Page 19: Nonparametric Statistics


Then click on Continue and then click on Change and OK

You should see the new column in Data View. 

Page 20: Nonparametric Statistics


Click on Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples

Choose BDISunday and BDIWednesday as the Test Variables.

Choose DrugCode as the Grouping Variable.

Select Mann-Whitney U as the Test Type and click on OK.
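SPSS labels this procedure Mann-Whitney U, while the slides work with the rank sum W. The two statistics carry the same information: for a group of size n with rank sum W, the standard relation is U = W - n(n + 1)/2. The sketch below illustrates that relation; the function name `u_from_rank_sum` is hypothetical, and the rank sums 90 and 120 are what the Actual Rank column of the Sunday table adds up to for alcohol and ecstasy as transcribed.

```python
def u_from_rank_sum(rank_sum, n_group):
    """Mann-Whitney U for one group, from that group's rank sum and size."""
    return rank_sum - n_group * (n_group + 1) / 2


n_alcohol = n_ecstasy = 10
u_alcohol = u_from_rank_sum(90, n_alcohol)
u_ecstasy = u_from_rank_sum(120, n_ecstasy)

# The two U values always add up to n1 * n2; the smaller one is usually reported.
print(u_alcohol, u_ecstasy, u_alcohol + u_ecstasy)
print(min(u_alcohol, u_ecstasy))
```

Because U is just W shifted by a constant, both statistics lead to the same z value and the same conclusion.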

Page 21: Nonparametric Statistics


Here are the results:

Page 22: Nonparametric Statistics


Outliers and Influential Points

In linear regression, an outlier is an observation that lies outside the overall pattern of the data. Points that are outliers in the y-direction have large residuals, but other outliers may not.

An observation is influential if removing it would markedly change the regression line. Points that are outliers in the x-direction are often influential. Influential points draw the regression line toward themselves, so they often cannot be identified by looking for large residuals. Note that not all outliers are influential.

Page 23: Nonparametric Statistics


Does the age at which a child begins to talk predict a later score on a test of mental ability? These data show the age in months at which each child spoke his/her first word, and each child’s Gesell Adaptive Score, the result of an ability test taken much later.

Child  Age  Score
1      15   95
2      26   71
3      10   83
4      9    91
5      15   102
6      20   87
7      18   93
8      11   100
9      8    104
10     20   94
11     7    113
12     9    96
13     10   83
14     11   84
15     11   102
16     10   100
17     12   105
18     42   57
19     17   121
20     11   86
21     10   100

Page 24: Nonparametric Statistics


The graph of the data shows a negative linear relationship. Child 18 is close to the line but is an outlier in the x-direction.

Because of its extreme position on the x-scale, this point has a strong influence on the regression line. It is an influential point.

Child 19 is an outlier in the y-direction; the point lies far from the regression line and has a large residual.  
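The contrast between the two points can be checked numerically. The sketch below fits an ordinary least-squares line to the ages and scores from the table, then refits with Child 18 or Child 19 left out. It is an illustration only; the helper names `least_squares`, `without_child`, and `residual` are hypothetical.

```python
# (age in months at first word, Gesell Adaptive Score) for children 1 to 21.
gesell = [
    (15, 95), (26, 71), (10, 83), (9, 91), (15, 102), (20, 87), (18, 93),
    (11, 100), (8, 104), (20, 94), (7, 113), (9, 96), (10, 83), (11, 84),
    (11, 102), (10, 100), (12, 105), (42, 57), (17, 121), (11, 86), (10, 100),
]


def least_squares(points):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def without_child(points, child):
    """Copy of the data with one child (numbered from 1) removed."""
    return [p for number, p in enumerate(points, start=1) if number != child]


intercept_all, slope_all = least_squares(gesell)
_, slope_no18 = least_squares(without_child(gesell, 18))
_, slope_no19 = least_squares(without_child(gesell, 19))


def residual(child):
    """Observed score minus the score predicted by the all-points line."""
    age, score = gesell[child - 1]
    return score - (intercept_all + slope_all * age)


print(round(slope_all, 3), round(slope_no18, 3), round(slope_no19, 3))
print(round(residual(18), 2), round(residual(19), 2))
```

Removing Child 18 changes the slope far more than removing Child 19 does, even though Child 19 has by far the larger residual.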

Page 25: Nonparametric Statistics


Here are the results

with all points:

without Child 18:

without Child 19:

What differences do you see when Child 18 is removed?

What differences do you see when Child 19 is removed?

Which point is influential?