Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Upload: charleen-joseph

Post on 17-Jan-2016


Page 1:

Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission

Page 2:

Announcements

• First math problem set will be handed out in Lab on Monday…

• Due September 20

Today’s Class:

• The Mean (and relevant mathematical notation)

• Measures of Dispersion

Page 3:

Review: Variables / Notation

• Each column of a dataset is considered a variable

• We’ll refer to a column generically as “Y”

Person #   Guns owned (the variable “Y”)
1          0
2          3
3          0
4          1
5          1

Note: The total number of cases in the dataset is referred to as “N”. Here, N=5.

Page 4:

Equation of Mean: Notation

• Each case can be identified by a subscript

• Yi represents the “ith” case of variable Y

• i goes from 1 to N

• Y1 = value of Y for first case in spreadsheet

• Y2 = value for second case, etc.

• YN = value for last case

Person #   Guns owned (Y)
1          Y1 = 0
2          Y2 = 3
3          Y3 = 0
4          Y4 = 1
5          Y5 = 1

Page 5:

Calculating the Mean

• Equation:

$$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$$

• 1. Mean of variable Y represented by Y with a line on top – called “Y-bar”

• 2. Equals sign means equals: “is calculated by the following…”

• 3. N refers to the total number of cases for which there is data

• Summation (Σ) – will be explained next…

Page 6:

Equation of Mean: Summation

• Sigma (Σ): Summation
– Indicates that you should add up a series of numbers

$$\sum_{i=1}^{N} Y_i$$

• The thing on the right (Y-sub-i) is the item to be added repeatedly

• The things on top and bottom tell you how many times to add up Y-sub-i… AND what numbers to substitute for i.
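As a minimal sketch, the summation can be read as a loop. The list `Y` holds the guns-owned values from the earlier example table; the variable names are only illustrative.

```python
# Guns owned by each of the N = 5 people in the example dataset.
Y = [0, 3, 0, 1, 1]
N = len(Y)

total = 0
# i runs from 1 (the number below the sigma) to N (the number above it).
for i in range(1, N + 1):
    # Python lists are indexed from 0, so Y-sub-i is stored at Y[i - 1].
    total += Y[i - 1]

print(total)  # Y1 + Y2 + Y3 + Y4 + Y5
```

Python's built-in `sum(Y)` performs the same addition in one call.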

Page 7:

Equation of Mean: Summation

$$\sum_{i=1}^{N} Y_i = Y_1 + Y_2 + Y_3 + Y_4 + Y_5$$

• 1. Start with bottom: i = 1.
– The first number to add is Y-sub-1

• 2. Then, allow i to increase by 1
– The second number to add is for i = 2, then i = 3

• 3. Keep adding numbers until i = N
– In this case N=5, so stop at 5

Page 8:

Equation of the Mean: Example 2

• Can you calculate mean for gun ownership?

Person #   Guns owned (Y)
1          Y1 = 0
2          Y2 = 3
3          Y3 = 0
4          Y4 = 1
5          Y5 = 1

$$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$$

• Answer:

$$\bar{Y} = \frac{1}{5}(0 + 3 + 0 + 1 + 1) = \frac{1}{5}(5) = 1$$
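The same calculation can be sketched in a few lines of Python. The function name `mean` and the variable names are illustrative choices, not part of the lecture.

```python
def mean(values):
    """Return Y-bar: the sum of all cases divided by the number of cases N."""
    n = len(values)
    return sum(values) / n

# Guns owned by the five people in the example table.
guns_owned = [0, 3, 0, 1, 1]
y_bar = mean(guns_owned)
print(y_bar)
```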

Page 9:

Properties of the Mean

• The mean takes into account the value of every case to determine what is “typical”
– In contrast to the mode & median
– Probably the most commonly used measure of “central tendency”
  • But, it is often good to look at median & mode also!

• Disadvantages
– Every case influences outcome… even unusual ones
– Extreme cases affect results a lot
– The mean doesn’t give you any information on the shape of the distribution
  • Cases could be very spread out, or very tightly clustered

Page 10:

The Mean and Extreme Values

• Extreme values affect the mean a lot:

Case   Num CD’s   Num CD’s 2
1      20         20
2      40         40
3      0          0
4      70         1000
Mean   32.5       265

Changing this one case (case 4) affects the mean a lot
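A short sketch makes the effect concrete, using the two CD columns from the table. It relies on Python's standard `statistics` module; the comparison with the median is an added illustration, not something computed on the slide.

```python
from statistics import mean, median

# The two columns from the table: only case 4 differs (70 vs. 1000).
cds_original = [20, 40, 0, 70]
cds_changed = [20, 40, 0, 1000]

for label, data in (("original", cds_original), ("changed", cds_changed)):
    # The mean moves with the extreme case; the median does not.
    print(label, "mean =", mean(data), "median =", median(data))
```

In both columns the median is the average of the two middle values (20 and 40), so it stays put while the mean jumps.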

Page 11:

Example 1

• And, very different groups can have the same mean:

[Histogram: Number of CDs (Group 1), horizontal axis from 0 to 200. Std. Dev = 21.72, Mean = 101, N = 23]

Page 12:

Example 2

[Histogram: Number of CDs (Group 2), horizontal axis from 0 to 200. Std. Dev = 67.62, Mean = 100.0, N = 23]

Page 13:

Example 3

[Histogram: Number of CDs (Group 3), horizontal axis from 0 to 200. Std. Dev = 102.15, Mean = 104, N = 23]

Page 14:

Interpreting Dispersion

• Question: What are possible social interpretations of the different distributions (all with nearly the same mean, about 100)?

• Example 1: Individuals cluster around 100

• Example 2: Individuals distributed sporadically over range 0-200

• Example 3: Individuals in two groups – near zero and near 200
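To see how three such patterns can share a mean, here is a small sketch. The three lists are invented for illustration only; they are not the data behind the histograms. `statistics.stdev` is Python's standard sample standard deviation.

```python
from statistics import mean, stdev

# Hypothetical data (NOT the lecture's 23-case groups), chosen so that
# every group has a mean of exactly 100.
clustered = [95, 100, 105, 100, 100, 100]   # everyone near 100
spread_out = [0, 40, 80, 120, 160, 200]     # scattered over 0-200
two_groups = [0, 0, 0, 200, 200, 200]       # near zero or near 200

for name, group in (("clustered", clustered),
                    ("spread out", spread_out),
                    ("two groups", two_groups)):
    print(name, "mean =", mean(group), "std. dev. =", round(stdev(group), 2))
```

The means are identical, yet the standard deviations differ widely, which is the pattern the three examples illustrate.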

Page 15:

Measures of Dispersion

• Remember: Goal is to understand your variable…

• Center of the distribution is only part of the story

• Important issue:

• How “spread out” are the cases around the mean?
– How “dispersed”, “varied” are your cases?
– Are most cases like the “typical” case? Or not?

Page 16:

Measures of Dispersion

• Some measures of dispersion:

• 1. Range
– Also related: Minimum and Maximum

• 2. Average Absolute deviation

• 3. Variance

• 4. Standard deviation

Page 17:

Minimum and Maximum

• Minimum: the lowest value of a variable represented in your data

• Maximum: the highest value of a variable represented in your data

• Example: In previous histograms about number of CDs owned, the minimum was 0, the maximum was 200.

Page 18:

The Range

• The Range is calculated as the maximum minus the minimum
– In case of CD ownership, 200 - 0 = 200

• Advantage:
– Easy

• Disadvantage:
– 1. Easily influenced by extreme values… may not be representative
– 2. Doesn’t tell you anything about the middle cases
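A minimal sketch of the range, reusing the four-case CD data from the earlier extreme-values table (an assumption made for illustration; the 0 to 200 figure above refers to the histograms).

```python
def value_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)

cds_original = [20, 40, 0, 70]
cds_changed = [20, 40, 0, 1000]   # same data with one extreme case

print(value_range(cds_original))  # 70 - 0
print(value_range(cds_changed))   # 1000 - 0
```

Only the two most extreme cases enter the calculation, which is why a single unusual value can change the range so much.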

Page 19:

The Idea of Deviation

• Deviation: How much a particular case differs from the mean of all cases

• Deviation of zero indicates the case has the same value as the mean of all cases
– Positive deviation: case has higher value than mean
– Negative deviation: case has lower value than mean

• Extreme positive or negative deviations indicate cases farther from the mean.

Page 20:

Deviation of a Case

• Formula:

$$d_i = Y_i - \bar{Y}$$

• Literally, it is the distance from the mean (Y-bar)

Page 21:

Deviation Example

Case   Num CD’s   Deviation from mean (32.5)
1      20         -12.5
2      40         7.5
3      0          -32.5
4      70         37.5
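The deviations in the table can be reproduced with a short sketch; the variable names are illustrative.

```python
# Number of CDs owned by the four cases in the table.
cds = [20, 40, 0, 70]

# Y-bar: the mean of all cases.
y_bar = sum(cds) / len(cds)

# d_i = Y_i - Y-bar, computed for every case.
deviations = [y - y_bar for y in cds]

print(y_bar)
print(deviations)
```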

Page 22:

Turning the Deviation into a Useful Measure of Dispersion

• Idea #1: Add it all up
– The sum of deviation for all cases:

$$\sum_{i=1}^{N} d_i$$

• What is sum of the following? -12.5, 7.5, -32.5, 37.5

• Problem: Sum of deviation is always zero
– Because mean is the exact center of all cases
– Cases equally deviate positively and negatively
– Conclusion: You can’t measure dispersion this way
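Why the sum is always zero can be shown in one line from the definitions of the deviation and the mean. This short derivation is an added illustration using the lecture's notation:

```latex
\sum_{i=1}^{N} d_i
  = \sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)
  = \sum_{i=1}^{N} Y_i - N\bar{Y}
  = N\bar{Y} - N\bar{Y}
  = 0,
\qquad \text{since } \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i .
```

For the four CD cases: -12.5 + 7.5 - 32.5 + 37.5 = 0.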

Page 23:

Turning the Deviation into a Useful Measure of Dispersion

• Idea #2: Sum up “absolute value” of deviation
– Absolute value makes negative values positive
– Designated by vertical bars:

$$\sum_{i=1}^{N} |d_i|$$

• What is the sum of the absolute values of -12.5, 7.5, -32.5, 37.5?

• Answer: 12.5 + 7.5 + 32.5 + 37.5 = 90
– These 4 cases deviate by a total of 90 CDs from the mean

• Problem: Sum of Absolute Deviation grows larger if you have more cases…
– Doesn’t allow comparison across samples

Page 24:

Turning the Deviation into a Useful Measure of Dispersion

• Idea #3: The Average Absolute Deviation
– Calculate the sum, divide by total N of cases
– Gives the deviation of the average case

• Formula:

$$AAD = \frac{\sum_{i=1}^{N} |d_i|}{N} = \frac{\sum_{i=1}^{N} |Y_i - \bar{Y}|}{N}$$

Page 25:

Turning the Deviation into a Useful Measure of Dispersion

• Digression: Here we have used the mean to determine the “typical” size of case deviations
– Originally, I introduced the mean as a way to analyze actual case values (e.g. # of CDs owned)
– Now: Instead of looking at typical case values, we want to know what sort of deviation is typical
  • In other words, a statistic, the mean, is being used to analyze another statistic – a deviation
– This is a general principle that we will use often: statistics can help us understand our raw data and also further summarize our statistical calculations!

Page 26:

Average Absolute Deviation

• Example: Total Deviation = 90, N=4
– What is Average absolute deviation?
– Answer: 90 / 4 = 22.5

• Advantages
– Very intuitive interpretation:
  • Tells you how much cases differ from the mean, on average

• Disadvantages
– Has non-ideal properties, according to statisticians
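A minimal sketch of the average absolute deviation for the four CD cases. The function name is a hypothetical choice for this example.

```python
def average_absolute_deviation(values):
    """Sum of |Y_i - Y-bar| over all cases, divided by N."""
    n = len(values)
    y_bar = sum(values) / n
    return sum(abs(y - y_bar) for y in values) / n

cds = [20, 40, 0, 70]
aad = average_absolute_deviation(cds)
print(aad)  # total absolute deviation of 90, divided by N = 4
```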

Page 27:

Turning the Deviation into a Useful Measure of Dispersion

• Idea #4: Square the deviation to avoid problem of negative values
– Sum of “squared” deviation
– Divide by “N-1” (instead of N) to get the average

• Result: The “variance”:

$$s_Y^2 = \frac{\sum_{i=1}^{N} d_i^2}{N-1} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}$$

Page 28:

Calculating the Variance 1

Case   Num CD’s (Y)
1      20
2      40
3      0
4      70

Page 29:

Calculating the Variance 2

Case   Num CD’s (Y)   Mean (Y-bar)
1      20             32.5
2      40             32.5
3      0              32.5
4      70             32.5

Page 30:

Calculating the Variance 3

Case   Num CD’s (Y)   Mean (Y-bar)   Deviation (d)
1      20             32.5           -12.5
2      40             32.5           7.5
3      0              32.5           -32.5
4      70             32.5           37.5

Page 31:

Calculating the Variance 4

Case   Num CD’s (Y)   Mean (Y-bar)   Deviation (d)   Squared Deviation ($d^2$)
1      20             32.5           -12.5           156.25
2      40             32.5           7.5             56.25
3      0              32.5           -32.5           1056.25
4      70             32.5           37.5            1406.25

Page 32:

Calculating the Variance 5

• Variance = Average of “squared deviation”
– Average = mean = sum up, divide by N
– In this case, use N-1

• Sum of 156.25 + 56.25 + 1056.25 + 1406.25 = 2675

• Divide by N-1
– N-1 = 4-1 = 3

• Compute variance:

• 2675 / 3 = 891.67 = variance = $s^2$
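The steps above can be checked with a short sketch. The function `sample_variance` is an illustrative name; `statistics.variance` is Python's standard sample variance, which also divides by N-1. Note that (-12.5)² is 156.25, so the squared deviations for these four cases sum to 2675.

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    y_bar = sum(values) / n
    squared_deviations = [(y - y_bar) ** 2 for y in values]
    return sum(squared_deviations) / (n - 1)

cds = [20, 40, 0, 70]
s2 = sample_variance(cds)

print(round(s2, 2))
print(round(variance(cds), 2))  # standard-library result for comparison
```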

Page 33:

The Variance

• Properties of the variance
– Zero if all points cluster exactly on the mean
– Increases the further points lie from the mean
– Comparable across samples of different size

• Advantages
– 1. Provides a good measure of dispersion
– 2. Better mathematical characteristics than the AAD

• Disadvantages:
– 1. Not as easy to interpret as AAD
– 2. Values get large, due to “squaring”

Page 34:

Turning the Deviation into a Useful Measure of Dispersion

• Idea #5: Take square root of Variance to shrink it back down

• Result: Standard Deviation
– Denoted by lower-case s
– Most commonly used measure of dispersion

• Formula:

$$s_Y = \sqrt{s_Y^2} = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}}$$

Page 35:

Calculating the Standard Deviation

• Simply take the square root of the variance

• Example (the four CD cases: 20, 40, 0, 70):
– Variance = 2675 / 3 = 891.67
– Square root of 891.67 = 29.86

• Properties:
– Similar to Variance
– Zero for perfectly concentrated distribution
– Grows larger if cases are spread further from the mean
– Comparable across different sample sizes
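A minimal sketch of the standard deviation for the four CD cases (20, 40, 0, 70), computed from the definition and compared with the standard library's `statistics.stdev`. Variable names are illustrative.

```python
import math
from statistics import stdev

cds = [20, 40, 0, 70]

# Variance: sum of squared deviations divided by N - 1.
n = len(cds)
y_bar = sum(cds) / n
s2 = sum((y - y_bar) ** 2 for y in cds) / (n - 1)

# Standard deviation: square root of the variance.
s = math.sqrt(s2)

print(round(s, 2))
print(round(stdev(cds), 2))  # standard-library result for comparison
```

Unlike the variance, the result is in the original units (number of CDs), which makes it easier to interpret.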

Page 36:

Example 1: s = 21.72

[Histogram: Number of CDs (Group 1), horizontal axis from 0 to 200. Std. Dev = 21.72, Mean = 101, N = 23]

Page 37:

Example 2: s = 67.62

[Histogram: Number of CDs (Group 2), horizontal axis from 0 to 200. Std. Dev = 67.62, Mean = 100.0, N = 23]

Page 38:

Example 3: s = 102.15

[Histogram: Number of CDs (Group 3), horizontal axis from 0 to 200. Std. Dev = 102.15, Mean = 104, N = 23]

Page 39:

Thinking About Dispersion

• Suppose we observe that the standard deviation of wealth is greater in the U.S. than in Sweden…
– What can we conclude about the two countries?

• Guess which group has a higher standard deviation for income: Men or Women? Why?

• The standard deviation of a stock’s price is sometimes considered a measure of “risk”. Why?

• Suppose we polled people on two political issues and the S.D. was much higher for one issue
– What are some possible interpretations?

• What are some other examples where the deviation would provide useful information?