DESCRIBING DATA: 2
Numerical summaries of data
using measures of central tendency
and dispersion
Central tendency--Mode
Major                 f
Anthropology         97
Economics           104
Geography            57
Political Science   110
Sociology            82
Table 1. Undergraduate Majors
Bimodal Distributions
Major                 f
Anthropology         97
Economics           110
Geography            57
Political Science   110
Sociology            82
Table 1. Undergraduate Majors
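A minimal Python sketch of reading the mode(s) off a frequency table, using the counts from both versions of Table 1. The function name `modes` is hypothetical; it returns every category tied for the highest frequency, so the second table comes out bimodal.

```python
def modes(freq_table):
    """Return every category whose frequency equals the highest frequency."""
    top = max(freq_table.values())
    return [category for category, f in freq_table.items() if f == top]

# Counts from the first version of Table 1 (one mode)
majors = {"Anthropology": 97, "Economics": 104, "Geography": 57,
          "Political Science": 110, "Sociology": 82}
# Counts from the second version (Economics also at 110: bimodal)
majors_bimodal = {"Anthropology": 97, "Economics": 110, "Geography": 57,
                  "Political Science": 110, "Sociology": 82}

print(modes(majors))          # ['Political Science']
print(modes(majors_bimodal))  # ['Economics', 'Political Science']
```

For raw (ungrouped) observations, Python's `statistics.multimode` returns the same kind of list.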
Mode for Grouped Frequency Distributions based on Interval Data
Mean daily temp.    Place A (f)    Place B (f)
10-19.9 degrees      5              0
20-29.9              5              5
30-39.9             20             10
40-49.9             30             15
50-59.9             20             30
60-69.9             20             40
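Mode = midpoint of the modal class interval (the interval with the highest frequency): Place A, 40-49.9, mode = 45; Place B, 60-69.9, mode = 65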
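The grouped-data rule can be sketched in Python as follows. This assumes equal-width intervals whose midpoint is the lower limit plus half the width (so 10-19.9 has midpoint 15, the convention used for the grouped mean later in these slides); the function name `grouped_mode` is hypothetical, and ties resolve to the first interval.

```python
def grouped_mode(lower_limits, width, freqs):
    """Midpoint of the class interval with the highest frequency.

    Assumes equal-width intervals with midpoint = lower limit + width / 2.
    If several intervals tie, max() returns the first one.
    """
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    return lower_limits[i] + width / 2

lowers = [10, 20, 30, 40, 50, 60]
place_a = [5, 5, 20, 30, 20, 20]
place_b = [0, 5, 10, 15, 30, 40]

print(grouped_mode(lowers, 10, place_a))  # 45.0
print(grouped_mode(lowers, 10, place_b))  # 65.0
```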
Median
• The point in the distribution above which and below which exactly half the observations lie (50th percentile)
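• Calculation depends on whether the number of observations is odd or even: with an odd n, the median is the middle value; with an even n, it is the mean of the two middle values.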
Distribution 1 (n = 5)    Distribution 2 (n = 6)
198                       197
179                       193
172                       189
167                       187
154                       183
Median = 179              Median = 188
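The odd/even rule can be written out directly. This is a minimal Python sketch with a hypothetical function name and made-up illustrative values (not the distributions above); Python's `statistics.median` applies the same rule.

```python
def median(values):
    """Median by the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

print(median([7, 3, 9, 1, 5]))      # 5
print(median([7, 3, 9, 1, 5, 11]))  # 6.0
```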
MEDIAN for grouped frequency distributions based on interval data
Mean daily temp.     f    Cumulative f
10-19.9 degrees      5      5
20-29.9              5     10
30-39.9             20     30
40-49.9             30     60
50-59.9             20     80
60-69.9             20    100
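Median = L + ((n/2 - cf) / f) * w, where L = 40 is the lower limit of the median class (the class containing the 50th of the 100 observations), cf = 30 is the cumulative frequency below that class, f = 30 is the frequency of that class, and w = 10 is the interval width:
Median = 40 + ((50 - 30) / 30) * 10 = 40 + 6.67 = 46.67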
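A minimal Python sketch of the same interpolation. The function name `grouped_median` is hypothetical, and it assumes equal-width intervals and, as in the worked example, uses 40 (not 39.95) as the lower limit of the 40-49.9 class.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of grouped data by linear interpolation within the median class:
    median = L + ((n/2 - cf_below) / f_class) * width
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cum_below = 0
    for lower, f in zip(lower_limits, freqs):
        if cum_below + f >= half:  # this class contains the (n/2)th observation
            return lower + (half - cum_below) / f * width
        cum_below += f

lowers = [10, 20, 30, 40, 50, 60]
freqs = [5, 5, 20, 30, 20, 20]
print(round(grouped_median(lowers, 10, freqs), 2))  # 46.67
```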
ARITHMETIC MEAN
$\bar{Y} = \sum Y_i / n = (1 + 1 + 3 + 3 + 6 + 7 + 7) / 7 = 28 / 7 = 4$
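The same arithmetic as a short Python check, using the seven scores from the example:

```python
scores = [1, 1, 3, 3, 6, 7, 7]
mean = sum(scores) / len(scores)  # (sum of Y_i) / n = 28 / 7
print(mean)  # 4.0
```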
Mean for Grouped Data
Mean daily temp.     f    Midpoint of interval    f × midpoint
10-19.9 degrees      5    15                        75
20-29.9              5    25                       125
30-39.9             20    35                       700
40-49.9             30    45                      1350
50-59.9             20    55                      1100
60-69.9             20    65                      1300
Totals             100                            4650
Mean = sum of weighted midpoints / n = 4650/100=46.5
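A minimal Python sketch of the weighted-midpoint calculation, using the midpoints and frequencies from the table (the function name `grouped_mean` is hypothetical):

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of (f * midpoint) divided by n = sum of f."""
    weighted_total = sum(f * m for f, m in zip(freqs, midpoints))
    return weighted_total / sum(freqs)

midpoints = [15, 25, 35, 45, 55, 65]
freqs = [5, 5, 20, 30, 20, 20]
print(grouped_mean(midpoints, freqs))  # 46.5
```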
Mean is the balancing point of the distribution
[Figure: dot plot of the scores on a 0 to 9 scale, with the mean marked as the balancing point]
Key Properties of the Mean
• Sum of the differences between the individual scores and the mean equals 0
• The sum of the squared differences between the individual scores and the mean is a minimum: it is smaller than the sum of squared differences around any other value.
$\sum (Y_i - \bar{Y}) = 0$
$\sum (Y_i - \bar{Y})^2$ = the minimum value
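Both properties can be checked numerically on the scores from the arithmetic-mean example. This Python sketch uses a hypothetical helper `sum_of_squares` and a grid of candidate centers in steps of 0.5; the grid search is only a demonstration, not a proof.

```python
scores = [1, 1, 3, 3, 6, 7, 7]
mean = sum(scores) / len(scores)  # 4.0

def sum_of_squares(center):
    """Sum of squared differences between each score and `center`."""
    return sum((y - center) ** 2 for y in scores)

# Property 1: deviations from the mean cancel out
print(sum(y - mean for y in scores))  # 0.0

# Property 2: no candidate center gives a smaller sum of squared differences
candidates = [c / 2 for c in range(0, 19)]  # 0.0, 0.5, ..., 9.0
best = min(candidates, key=sum_of_squares)
print(best, sum_of_squares(best))  # 4.0 42.0
```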
Weaknesses of each measure of central tendency
• MODE: ignores all information about the values except the most frequent one
• MEDIAN: ignores the LOCATION of scores above or below the midpoint
• MEAN: is the most sensitive of the three to extreme values
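A small Python illustration of the mean's sensitivity, using made-up scores: replacing one score with an extreme value moves the mean a long way but leaves the mode and median unchanged.

```python
from statistics import mean, median, mode

original = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]  # last score replaced by an extreme value

for data in (original, with_outlier):
    print(mode(data), median(data), mean(data))
# 3 3 3.4
# 3 3 12.4
```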
Impacts of skewed distributions
[Figure: skewed distribution with the positions of the mode, median, and mean marked]
Measures of Dispersion
Suburb A     Suburb B
24           28
23           25
22           22
21           19
20           16
Mean = 22    Mean = 22
Poverty households (%) in 2 suburbs, by tract
Suburb A: less dispersion; Suburb B: more dispersion
Range
• Highest value minus the lowest value
• problem: ignores all the other values between the two extreme values
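Applied to the two suburbs, the range separates them even though their means are equal. A minimal Python sketch (the function is named `value_range` to avoid shadowing Python's built-in `range`):

```python
suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

print(value_range(suburb_a), value_range(suburb_b))  # 4 12
```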
Interquartile range
• Based on the quartiles (the 25th percentile, Q1, and the 75th percentile, Q3, of a distribution)
• Interquartile range = Q3 - Q1
• Semi-interquartile range = (Q3 - Q1)/2
• Eliminates the effect of extreme scores by excluding them: only the middle 50% of the observations is used
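A minimal Python sketch of the IQR and semi-interquartile range for the two suburbs, using `statistics.quantiles` from the standard library. Quartile conventions vary between textbooks and software; this sketch assumes the `method="exclusive"` convention, and `method="inclusive"` would give smaller values here (2.0 for A and 6.0 for B).

```python
from statistics import quantiles

def iqr(values):
    """Q3 - Q1, with quartiles from the 'exclusive' method."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

for name, data in (("A", suburb_a), ("B", suburb_b)):
    spread = iqr(data)
    print(name, spread, spread / 2)  # IQR and semi-interquartile range
# A 3.0 1.5
# B 9.0 4.5
```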
Graphic representation: Box Plot
[Figure: side-by-side box plots of infant mortality rate for Africa, Asia, and Latin America (N = 37, 44, and 52 across the three groups); two outlying cases are labeled 101 and 132]
Variance
• A measure of dispersion based on the second property of the mean we discussed earlier:
$\sum (Y_i - \bar{Y})^2$ = minimum
Step 1: Calculate the total sum of squares around the mean
$Y$     $(Y - \bar{Y})$     $(Y - \bar{Y})^2$
10      -5                  25
12      -3                   9
14      -1                   1
15       0                   0
16      +1                   1
18      +3                   9
20      +5                  25
Mean = 105/7 = 15           Sum = 70
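Step 1 as a short Python computation on the seven scores in the table:

```python
scores = [10, 12, 14, 15, 16, 18, 20]
mean = sum(scores) / len(scores)             # 105 / 7 = 15.0
deviations = [y - mean for y in scores]      # -5.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0
total_ss = sum(d ** 2 for d in deviations)   # sum of squares around the mean
print(mean, total_ss)  # 15.0 70.0
```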
Step 2: Take an average of this total variation
$s^2 = \sum (Y_i - \bar{Y})^2 / (n - 1)$
Why n - 1 rather than simply n?
In practice, we usually estimate the variance of a population from sample data.
Samples, especially small samples, are less likely to include the population's extreme scores, so dividing by n tends to underestimate the population variance.
Dividing by n - 1 compensates for this underestimate.
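The underestimate can be seen in a small simulation. This Python sketch assumes a hypothetical population (the integers 1 to 100), draws many random samples of size 5 with replacement, and averages the variance computed with each divisor. For random samples the divide-by-n version averages about (n - 1)/n of the true variance (about 4/5 here), while the divide-by-(n - 1) version averages close to the true value; exact printed values depend on the random draws.

```python
import random
from statistics import pvariance

def var_with_divisor(sample, divisor):
    """Sum of squares around the sample mean, divided by `divisor`."""
    m = sum(sample) / len(sample)
    return sum((y - m) ** 2 for y in sample) / divisor

random.seed(1)                    # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical population: 1, 2, ..., 100
true_var = pvariance(population)  # population variance: 833.25
n, reps = 5, 20_000

avg_n = avg_n_minus_1 = 0.0
for _ in range(reps):
    sample = random.choices(population, k=n)  # random sample, with replacement
    avg_n += var_with_divisor(sample, n) / reps
    avg_n_minus_1 += var_with_divisor(sample, n - 1) / reps

print(true_var, round(avg_n, 1), round(avg_n_minus_1, 1))
```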
Step 3: Take the square root of the variance to obtain the standard deviation
$s = \sqrt{\sum (Y_i - \bar{Y})^2 / (n - 1)}$
Purpose: expresses dispersion in the original units of measurement, not in squared units
As with the variance, the larger the value, the greater the variability
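Steps 1 to 3 together for the seven scores in the table, as a minimal Python sketch; the final check compares the hand computation with the standard library's sample `variance` and `stdev`, which also divide by n - 1.

```python
import math
from statistics import stdev, variance

scores = [10, 12, 14, 15, 16, 18, 20]
mean = sum(scores) / len(scores)
ss = sum((y - mean) ** 2 for y in scores)  # Step 1: 70.0
s2 = ss / (len(scores) - 1)                # Step 2: 70 / 6
s = math.sqrt(s2)                          # Step 3: back to the original units

print(round(s2, 2), round(s, 2))  # 11.67 3.42
assert math.isclose(s2, variance(scores)) and math.isclose(s, stdev(scores))
```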
Coefficient of Variation (V)
$V = s / \bar{Y}$ (standard deviation / mean)
Value: allows comparisons of dispersion across groups with very different mean values, or across variables with very different measurement scales.
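A minimal Python sketch of V, comparing two data sets from these slides that have different means: the seven scores from the variance example (mean 15) and Suburb B (mean 22). The function name is hypothetical, and it assumes a positive, non-zero mean; some texts multiply V by 100 to report it as a percentage.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """V = sample standard deviation / mean (mean assumed positive)."""
    return stdev(values) / mean(values)

scores = [10, 12, 14, 15, 16, 18, 20]  # variance example (mean 15)
suburb_b = [28, 25, 22, 19, 16]        # Suburb B poverty rates (mean 22)

print(round(coefficient_of_variation(scores), 3))    # 0.228
print(round(coefficient_of_variation(suburb_b), 3))  # 0.216
```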