Is the distribution normal or skewed?

Post on 10-Jul-2015

Difference between Normal and Skewed Distributions

This presentation will help you determine if the data set from the problem you are asked to solve has a normal or skewed distribution

Normal

Skewed

Knowing whether your data’s distribution is skewed or normal is the second way of knowing whether you will use what is called a parametric or a non-parametric test

The first way (as you may recall from the last decision point) is to determine if the data is scaled, ordinal, or nominal

But first,

What is a distribution?

We will illustrate what a distribution is with a data set that describes the number of hours students study

Here is the data set:

Student    Hours of Study
Bart       1
Basheba    2
Bella      2
Bob        3
Boston     3
Bunter     3
Buxby      4
Bybee      4
Bwinda     5

From this data set we will create a distribution:

The X axis will be the number of hours of study

The Y axis indicates the number of times the same number occurs

[Figure: histogram of the data set. X axis: Hours of Study (1 to 5). Y axis: Number of Occurrences (1 to 3).]

This is a distribution
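The tally described above can be sketched in a few lines of Python. This is only an illustration and not part of the original presentation; the names `hours_of_study` and `frequency` are made up for the example.

```python
from collections import Counter

# Hours of study for the nine students in the data set.
hours_of_study = {
    "Bart": 1, "Basheba": 2, "Bella": 2,
    "Bob": 3, "Boston": 3, "Bunter": 3,
    "Buxby": 4, "Bybee": 4, "Bwinda": 5,
}

# Y axis: how many times each number of hours occurs.
frequency = Counter(hours_of_study.values())

# Text histogram: one "#" per occurrence, ordered along the X axis.
for hours in sorted(frequency):
    print(hours, "#" * frequency[hours])
```

The printed rows grow toward the middle value and shrink again toward both ends, which is the shape the presentation goes on to call normal.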

One way to represent a distribution like this:

Is like this:

Normal distributions have the majority of the data in the middle

With decreasing but equal amounts toward the tails

The mean or average works really well with normal distributions

Another way to say it is that the mean describes the center point of a normal distribution well

A Normal Distribution

The Mean

Here is how you calculate the mean:

Let’s put the data into the distribution

[Figure: the data placed in the distribution: 1, 2, 2, 3, 3, 3, 4, 4, 5]

Mean = 1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5

Divided by the number of total values

Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3
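The same arithmetic can be written as a short Python sketch. The variable names are chosen only for this example.

```python
# The nine hours-of-study values from the data set.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

total = sum(values)   # add up all the values
count = len(values)   # the number of total values
mean = total / count  # divide the sum by the count

print(total, count, mean)
```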

The mean is a good estimate of the center of a distribution when the distribution is normal

But, the mean is not a good estimate of the center when the distribution is not normal

This is because of what we call OUTLIERS

What is an outlier?

An outlier is a data point that falls outside the overall pattern of the distribution

As an example, here is the overall pattern

[Figure: distribution of the values 1, 2, 2, 3, 3, 3, 4, 4, 5]

But what if we changed the 5 to a 50

[Figure: the same distribution with the 5 replaced by 50: 1, 2, 2, 3, 3, 3, 4, 4, 50]

To illustrate what happens to the mean when an outlier is present, let’s go back to this distribution:

[Figure: distribution of the values 1, 2, 2, 3, 3, 3, 4, 4, 5]

Let’s say one student, instead of studying 5 hours, studies 23 hours a day!

Watch what happens to the mean:

Before

Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5) / 9 = 27 / 9 = 3

After

Mean = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 23) / 9 = 45 / 9 = 5

Just by changing one value from “5” to “23”, the mean changed by 2 (from “3” to “5”)

Thus, the mean is very sensitive to outliers
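A minimal Python sketch of this before-and-after comparison, using the standard library’s `statistics.mean`. The list names `before` and `after` are just labels for this example.

```python
from statistics import mean

# Original data set, then the same data with the 5 replaced by 23.
before = [1, 2, 2, 3, 3, 3, 4, 4, 5]
after = before[:-1] + [23]

mean_before = mean(before)
mean_after = mean(after)

# A single changed value is enough to move the mean.
print(mean_before, mean_after, mean_after - mean_before)
```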

Therefore, the mean is not a good estimate of the center of a distribution when the distribution is NOT NORMAL

Here is a guiding principle

1 If your data set is normally distributed like this, then you will use a parametric test

[Figure: normally distributed histogram. X axis: Hours of Study (1 to 5). Y axis: Number of Occurrences (1 to 3).]

2 If your data set is skewed either to the right or to the left, then you will use a non-parametric test

[Figures: histograms skewed to the right and to the left. X axis: Hours of Study (1 to 5). Y axis: Number of Occurrences (1 to 3).]

In summary,

A parametric test is used when the problem’s data set is normally distributed

A non-parametric test is used when the problem’s data set is very skewed to the right or the left

Or very non-normal

So, how do you know if your data is normally distributed?

Go to the Learning Module entitled: Assessing Skew. You will find it next to the link for this presentation.

After you have viewed that learning module use SPSS to assess the skew of your data.

Is your data normally distributed or skewed?

If your data was skewed with a critical ratio greater than 2.0 or less than -2.0 then select

Skewed

Otherwise select

Normal
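The presentation leaves the details of the critical ratio to the Assessing Skew module and to SPSS. As a rough sketch only, and assuming the critical ratio means the skewness statistic divided by its standard error, the decision rule could look like this in Python. The function names `skew_critical_ratio` and `choose_distribution` are hypothetical, and the formulas are the usual adjusted sample skewness and its standard error, which this presentation does not state.

```python
import math

def skew_critical_ratio(values):
    """Return (skewness, standard error, critical ratio) for a sample.

    Assumption: critical ratio = skewness / standard error of skewness.
    Needs at least 3 values that are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    # Adjusted sample skewness.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)
    # The standard error of skewness depends only on the sample size.
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se

def choose_distribution(values):
    """Apply the rule above: a ratio beyond +/-2.0 means Skewed, otherwise Normal."""
    ratio = skew_critical_ratio(values)[2]
    return "Skewed" if ratio > 2.0 or ratio < -2.0 else "Normal"

print(choose_distribution([1, 2, 2, 3, 3, 3, 4, 4, 5]))
print(choose_distribution([1, 2, 2, 3, 3, 3, 4, 4, 23]))
```

With a sample this small, the result is only a rough guide, and the figures SPSS reports should take precedence over this sketch.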

It is important to note that if you choose Skewed, your data will be analyzed using what are called non-parametric tests

Skewed

Non-parametric tests differ from parametric tests in one simple way:

Parametric tests use the mean in their calculations

Non-parametric tests use the median

What is the median?

The median is simply the middle score of a data set where

• 50% of the scores fall below it and

• 50% of the scores are above it

To illustrate let’s go back to this distribution:

[Figure: distribution of the values 1, 2, 2, 3, 3, 3, 4, 4, 5]

With the median we simply determine the midpoint of the values listed in order (middle value in brackets):

1, 2, 2, 3, [3], 3, 4, 4, 5

There are 4 values below the middle value and 4 values above it

Median = 3

Notice that the Median is unaffected by outliers

To illustrate this, we’ll change the value “5” to a “10”:

[Figure: the distribution with the 5 replaced by 10: 1, 2, 2, 3, 3, 3, 4, 4, 10]

Watch what happens to the median:

1, 2, 2, 3, [3], 3, 4, 4, 10 (middle value in brackets)

There are still 4 values below the middle value and 4 values above it

Hmm, it’s still 3

But what if we change the value 10 to 1,000?

Watch again what happens to the median:

1, 2, 2, 3, [3], 3, 4, 4, 1000 (middle value in brackets)

Again there are 4 values below the middle value and 4 values above it

What do you know, it’s still 3
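The same check can be sketched in Python with the standard library’s `statistics.median`. The dictionary name `medians` is just a label for this example.

```python
from statistics import median

# Replace the largest value with 5, then 10, then 1000,
# and record the median each time.
medians = {}
for largest in (5, 10, 1000):
    values = [1, 2, 2, 3, 3, 3, 4, 4, largest]
    medians[largest] = median(values)

# The middle value of the ordered list does not move.
print(medians)
```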

Here is the key takeaway:

The mean is affected by outliers

The median is not affected by outliers

Therefore the mean is used with more or less NORMAL DISTRIBUTIONS

And the median is used with SKEWED OR NON-NORMAL DISTRIBUTIONS

So, why doesn’t everyone use non-parametric methods since they are unaffected by outliers?

Because parametric methods provide more meaningful information about the population than do non-parametric methods

So, if your data is skewed it’s better to get what information you can from a non-parametric test, even though a parametric test would have provided more information (if your data had been normally distributed)

So, based on your analysis, which distribution best reflects your data set:

Normal

Skewed
