describing data: 2. numerical summaries of data using measures of central tendency and dispersion

Post on 20-Dec-2015


DESCRIBING DATA: 2

Numerical summaries of data using measures of central tendency and dispersion

Central tendency--Mode

Major               F
Anthropology        97
Economics           104
Geography           57
Political Science   110
Sociology           82

Table 1. Undergraduate Majors

Bimodal Distributions

Major               F
Anthropology        97
Economics           110
Geography           57
Political Science   110
Sociology           82

Table 1. Undergraduate Majors
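As a rough illustration, the mode of a frequency table can be found by collecting every category whose count equals the largest count, which also handles the bimodal case. The helper name `modes` and the dictionary names are illustrative choices, not part of the slides.

```python
def modes(freq):
    """Return every category whose frequency equals the highest frequency."""
    top = max(freq.values())
    return [category for category, f in freq.items() if f == top]

# Frequencies from the two versions of Table 1
unimodal = {"Anthropology": 97, "Economics": 104, "Geography": 57,
            "Political Science": 110, "Sociology": 82}
bimodal = dict(unimodal, Economics=110)  # same table with Economics raised to 110

print(modes(unimodal))  # ['Political Science']
print(modes(bimodal))   # ['Economics', 'Political Science']
```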

Mode for Grouped Frequency Distributions based on Interval Data

Mean daily temp.   Place A (f)   Place B (f)
10-19.9 degrees    5             0
20-29.9            5             5
30-39.9            20            10
40-49.9            30            15
50-59.9            20            30
60-69.9            20            40

Mode = midpoint of the modal class interval
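A minimal sketch of the grouped mode for the two places in the table. It assumes each class such as 10-19.9 runs from 10 up to (but not including) 20, so its midpoint is 15, which matches the midpoints used for the grouped mean later in the slides. The function name is illustrative, and if two classes tie it simply returns the first.

```python
# Class intervals as (lower limit, upper limit); frequencies per place
intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
place_a = [5, 5, 20, 30, 20, 20]
place_b = [0, 5, 10, 15, 30, 40]

def grouped_mode(intervals, freqs):
    """Midpoint of the class interval with the largest frequency."""
    i = freqs.index(max(freqs))   # first class with the highest frequency
    lower, upper = intervals[i]
    return (lower + upper) / 2

print(grouped_mode(intervals, place_a))  # 45.0
print(grouped_mode(intervals, place_b))  # 65.0
```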

Median

• The point in the distribution above which and below which exactly half the observations lie (50th percentile)

• Calculation depends on whether the no. of observations is odd or even.

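Distribution 1 (n=5)   Distribution 2 (n=6)
198                    197
179                    193
172                    189
167                    187
154                    183
                       179

Median = 172           Median = 188

• Odd n: the median is the middle value (172).

• Even n: the median is the mean of the two middle values, (189 + 187)/2 = 188.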
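The odd/even rule can be sketched in a few lines. Distribution 2 is taken here to be 197, 193, 189, 187, 183, 179, an assumption that reproduces the stated median of 188. The hand-written `median` is for illustration only; the standard library's `statistics.median` gives the same results.

```python
import statistics

dist1 = [198, 179, 172, 167, 154]        # n = 5 (odd)
dist2 = [197, 193, 189, 187, 183, 179]   # n = 6 (even), assumed values

def median(values):
    """Middle value for odd n, mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(dist1))  # 172
print(median(dist2))  # 188.0

# Cross-check against the standard library
print(statistics.median(dist1) == median(dist1))  # True
print(statistics.median(dist2) == median(dist2))  # True
```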

MEDIAN for grouped frequency distributions based on interval data

Mean daily temp.   (f)   Cumulative (f)
10-19.9 degrees    5     5
20-29.9            5     10
30-39.9            20    30
40-49.9            30    60
50-59.9            20    80
60-69.9            20    100

Median = 40 + ((20/30) * 10) = 40 + 6.67 = 46.67
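The calculation above appears to follow the usual interpolation rule, Median = L + ((n/2 - cf) / f) * w, where L = 40 is the lower limit of the class containing the 50th observation, cf = 30 is the cumulative frequency below that class, f = 30 is its frequency, and w = 10 is the class width. The sketch below assumes equal class widths; the function and variable names are illustrative.

```python
def grouped_median(lower_limits, freqs, width):
    """Interpolated median for a grouped frequency distribution."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table has no observations")
    half = n / 2
    cumulative = 0   # observations below the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            # The median falls in this class: interpolate within it
            return lower + (half - cumulative) / f * width
        cumulative += f

lower_limits = [10, 20, 30, 40, 50, 60]
freqs = [5, 5, 20, 30, 20, 20]

print(round(grouped_median(lower_limits, freqs, 10), 2))  # 46.67
```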

ARITHMETIC MEAN

$\bar{Y} = \frac{\sum Y_i}{n} = \frac{1 + 1 + 3 + 3 + 6 + 7 + 7}{7} = \frac{28}{7} = 4$

Mean for Grouped Data

Mean daily temp.   (f)   Midpoint of interval   f times midpoint
10-19.9 degrees    5     15                     75
20-29.9            5     25                     125
30-39.9            20    35                     700
40-49.9            30    45                     1350
50-59.9            20    55                     1100
60-69.9            20    65                     1300
Totals             100                          4650

Mean = sum of weighted midpoints / n = 4650/100=46.5
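The same weighted-midpoint calculation as a short sketch, using the frequencies and midpoints from the table (variable names are illustrative):

```python
midpoints = [15, 25, 35, 45, 55, 65]
freqs = [5, 5, 20, 30, 20, 20]

# Weight each class midpoint by its frequency
weighted = [f * m for f, m in zip(freqs, midpoints)]
n = sum(freqs)
grouped_mean = sum(weighted) / n

print(weighted)       # [75, 125, 700, 1350, 1100, 1300]
print(sum(weighted))  # 4650
print(grouped_mean)   # 46.5
```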

Mean is the balancing point of the distribution

[Figure: dot plot of the scores stacked along a number line from 0 to 9, with the mean marked as the balancing point]

Key Properties of the Mean

• Sum of the differences between the individual scores and the mean equals 0:

$\sum (Y_i - \bar{Y}) = 0$

• Sum of the squared differences between the individual scores and the mean equals a minimum value:

$\sum (Y_i - \bar{Y})^2 = \text{the minimum value}$
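Both properties can be checked numerically with the seven scores from the arithmetic mean example (1, 1, 3, 3, 6, 7, 7). The trial centers 3, 3.5, 4.5 and 5 are arbitrary values chosen for illustration; any center other than the mean gives a larger sum of squares.

```python
scores = [1, 1, 3, 3, 6, 7, 7]
mean = sum(scores) / len(scores)   # 28 / 7 = 4.0

def sum_of_squares(values, center):
    """Sum of squared differences between each value and a chosen center."""
    return sum((y - center) ** 2 for y in values)

# Property 1: deviations from the mean sum to zero
deviations = [y - mean for y in scores]
print(sum(deviations))   # 0.0

# Property 2: the sum of squares is smallest when the center is the mean
for center in (3, 3.5, 4, 4.5, 5):
    print(center, sum_of_squares(scores, center))
# 3 49
# 3.5 43.75
# 4 42
# 4.5 43.75
# 5 49
```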

Weaknesses of each measure of central tendency

• MODE: ignores all other info. about values except the most frequent one

• MEDIAN: ignores the LOCATION of scores above or below the midpoint

• MEAN: is the most sensitive to extreme values

Impacts of skewed distributions

[Figure: a skewed distribution with the positions of the mode, median, and mean marked]

Measures of Dispersion

Suburb A          Suburb B
24                28
23                25
22                22
21                19
20                16
Mean=22           Mean=22
Less dispersion   More dispersion

Poverty Households (%) in 2 suburbs by tract

Range

• Highest value minus the lowest value

• Problem: it ignores all the values between the two extremes

Interquartile range

• Based on the quartiles (25th percentile and 75th percentile of a distribution)

• Interquartile range = Q3-Q1

• Semi-interquartile range = (Q3-Q1)/2

• Eliminates the effect of extreme scores by excluding them
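A small sketch comparing the range and interquartile range for the two suburbs from the poverty table. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's `statistics.quantiles`, which may not be the convention the slides had in mind. The function name `spread` is illustrative.

```python
import statistics

suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

def spread(values):
    """Range, interquartile range, and semi-interquartile range."""
    # quantiles() sorts the data and returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "semi_iqr": (q3 - q1) / 2,
    }

print(spread(suburb_a))  # {'range': 4, 'iqr': 2.0, 'semi_iqr': 1.0}
print(spread(suburb_b))  # {'range': 12, 'iqr': 6.0, 'semi_iqr': 3.0}
```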

Graphic representation: Box Plot

[Figure: box plots of infant mortality rate (vertical axis, -100 to 200) for Africa (N = 52), Asia (N = 44), and Latin America (N = 37)]

Variance

• A measure of dispersion based on the second property of the mean we discussed earlier:

$\sum (Y_i - \bar{Y})^2 = \text{minimum}$

Step 1: Calculate the total sum of squares around the mean

Y     (Y - \bar{Y})   (Y - \bar{Y})^2
10    -5              25
12    -3              9
14    -1              1
15    0               0
16    +1              1
18    +3              9
20    +5              25

Mean = 105/7 = 15     Sum = 70

Step 2: Take an average of this total variation

$s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$

Why n-1 rather than simply n?

The normal procedure involves estimating variance for a population using data from a sample.

Samples, especially small samples, are less likely to include extreme scores in the population.

n-1 is used to compensate for this underestimate.
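One way to see the underestimate is to enumerate every possible sample from a small population and average the two versions of the variance. The sketch below treats the seven scores from Step 1 as if they were a complete population and draws all samples of size 2 with replacement; both choices are assumptions made only for this illustration. Exact fractions are used so the averages are not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

population = [10, 12, 14, 15, 16, 18, 20]   # treated as a full population here
N = len(population)
mu = Fraction(sum(population), N)
pop_variance = sum((y - mu) ** 2 for y in population) / N   # 70 / 7

def sample_ss(sample):
    """Sum of squared deviations around the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((y - m) ** 2 for y in sample)

n = 2
samples = list(product(population, repeat=n))   # all 49 ordered samples

# Average each estimator over every possible sample
avg_div_n = sum(sample_ss(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sample_ss(s) / (n - 1) for s in samples) / len(samples)

print(pop_variance)        # 10
print(avg_div_n)           # 5
print(avg_div_n_minus_1)   # 10
```

Dividing by n gives an average of 5, half the population variance of 10, while dividing by n-1 recovers 10 on average.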

Step 3: Take the square root of the variance (the standard deviation)

$s = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}}$

Purpose: expresses dispersion in the original units of measurement, not units of measurement squared

Like variance: the larger the value, the greater the variability
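Steps 1 to 3 can be sketched with the seven scores from the Step 1 table. The variable names are illustrative; the last two lines cross-check the hand calculation against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n-1.

```python
import math
import statistics

scores = [10, 12, 14, 15, 16, 18, 20]
n = len(scores)

mean = sum(scores) / n                          # Step 1: 105 / 7 = 15.0
sum_sq = sum((y - mean) ** 2 for y in scores)   # Step 1: total sum of squares
variance = sum_sq / (n - 1)                     # Step 2: divide by n - 1
std_dev = math.sqrt(variance)                   # Step 3: square root

print(sum_sq)              # 70.0
print(round(variance, 2))  # 11.67
print(round(std_dev, 2))   # 3.42

# Cross-check against the standard library
print(math.isclose(variance, statistics.variance(scores)))  # True
print(math.isclose(std_dev, statistics.stdev(scores)))      # True
```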

Coefficient of Variation (V)

V = (standard deviation / mean)

Value: To allow you to make comparisons of dispersion across groups with very different mean values or across variables with very different measurement scales.
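As a hedged illustration, V can be computed for the two suburbs from the poverty table. Both have a mean of 22, so here V simply restates the difference in standard deviations as a proportion of the mean; the measure is most useful when the means differ. The function name is illustrative, and the sample standard deviation (n-1) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

print(round(coefficient_of_variation(suburb_a), 3))  # 0.072
print(round(coefficient_of_variation(suburb_b), 3))  # 0.216
```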
