Describing Data: 2. Numerical summaries of data using measures of central tendency and dispersion


Post on 20-Dec-2015


TRANSCRIPT

Page 1: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion

DESCRIBING DATA: 2

Numerical summaries of data using measures of central tendency and dispersion

Central tendency--Mode

Major                 f
Anthropology         97
Economics           104
Geography            57
Political Science   110
Sociology            82

Table 1. Undergraduate Majors

Bimodal Distributions

Major                 f
Anthropology         97
Economics           110
Geography            57
Political Science   110
Sociology            82

Table 1. Undergraduate Majors
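As a minimal sketch, the mode of a frequency table can be found by picking every category whose frequency equals the largest one. The function name `modes` and the two dictionaries are illustrative choices, not part of the slides; the frequencies are the ones in the two tables above.

```python
def modes(freq):
    """Return every category whose frequency equals the maximum frequency."""
    top = max(freq.values())
    return [category for category, f in freq.items() if f == top]


# Frequencies from the first table (one mode).
majors = {
    "Anthropology": 97,
    "Economics": 104,
    "Geography": 57,
    "Political Science": 110,
    "Sociology": 82,
}

# Same table with Economics raised to 110, as in the bimodal example.
majors_bimodal = dict(majors, Economics=110)

print(modes(majors))          # ['Political Science']
print(modes(majors_bimodal))  # ['Economics', 'Political Science']
```

Returning a list rather than a single value keeps the bimodal case visible instead of silently picking one of the tied categories.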

Mode for Grouped Frequency Distributions based on Interval Data

Mean daily temp.    Place A (f)   Place B (f)
10-19.9 degrees          5             0
20-29.9                  5             5
30-39.9                 20            10
40-49.9                 30            15
50-59.9                 20            30
60-69.9                 20            40

Mode = midpoint of the modal class interval
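The rule can be sketched in a few lines. This is an illustration under one assumption: each class is treated as running from its lower bound up to the lower bound of the next class (so 10-19.9 has midpoint 15, the convention the deck uses for midpoints elsewhere). The name `grouped_mode` and the tuple layout are hypothetical.

```python
# Each row: (lower bound, lower bound of the next class, f for Place A, f for Place B)
classes = [
    (10, 20, 5, 0),
    (20, 30, 5, 5),
    (30, 40, 20, 10),
    (40, 50, 30, 15),
    (50, 60, 20, 30),
    (60, 70, 20, 40),
]


def grouped_mode(rows, column):
    """Midpoint of the class with the highest frequency in the given column."""
    lower, upper, *_ = max(rows, key=lambda row: row[column])
    return (lower + upper) / 2


mode_a = grouped_mode(classes, 2)
mode_b = grouped_mode(classes, 3)
print(mode_a, mode_b)  # 45.0 65.0
```

Place A's modal class is 40-49.9 and Place B's is 60-69.9, so the two places have different modes even though both have 100 observations.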


Median

• The point in the distribution above which and below which exactly half the observations lie (50th percentile)

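• Calculation depends on whether the number of observations is odd or even: with an odd number the median is the middle value, and with an even number it is the mean of the two middle values.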

Distribution 1 (n=5)   Distribution 2 (n=6)
198                    197
179                    193
172                    189
167                    187
154                    183
                       179

Median = 172           Median = 188
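A short sketch of the odd/even rule, applied to the two distributions. The value lists are a reading of the table layout (five scores for Distribution 1, six for Distribution 2 with 179 as the lowest), which is an assumption about how the original slide was arranged; the function name `median` is illustrative.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


distribution_1 = [198, 179, 172, 167, 154]       # n = 5
distribution_2 = [197, 193, 189, 187, 183, 179]  # n = 6

print(median(distribution_1))  # 172
print(median(distribution_2))  # 188.0
```

For Distribution 2 the two middle scores are 189 and 187, so the median falls halfway between them.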

MEDIAN for grouped frequency distributions based on interval data

Mean daily temp.    (f)   Cumulative (f)
10-19.9 degrees       5          5
20-29.9               5         10
30-39.9              20         30
40-49.9              30         60
50-59.9              20         80
60-69.9              20        100

Median = 40 + ((20/30) * 10) = 40 + 6.67 = 46.67
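The calculation above is linear interpolation inside the class that contains the 50th observation: 30 observations lie below 40 degrees, so 20 more are needed out of the 30 in the 40-49.9 class, which is 10 degrees wide. A minimal sketch, assuming each class is described by its lower bound, width, and frequency (the tuple layout and the name `grouped_median` are hypothetical):

```python
def grouped_median(classes):
    """Interpolated median for (lower bound, width, frequency) classes."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # observations below the current class
    for lower, width, f in classes:
        if f > 0 and cumulative + f >= half:
            # Go the needed fraction of the way through this class.
            return lower + (half - cumulative) / f * width
        cumulative += f


temperature_classes = [
    (10, 10, 5),
    (20, 10, 5),
    (30, 10, 20),
    (40, 10, 30),
    (50, 10, 20),
    (60, 10, 20),
]

median_temp = grouped_median(temperature_classes)
print(round(median_temp, 2))  # 46.67
```

The interpolation assumes observations are spread evenly within the median class, which is the same assumption the hand calculation makes.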

ARITHMETIC MEAN

\[ \bar{Y} = \frac{\sum y_i}{n} \]

\[ \bar{Y} = \frac{1 + 1 + 3 + 3 + 6 + 7 + 7}{7} = \frac{28}{7} = 4 \]

Mean for Grouped Data

Mean daily temp.    (f)   Midpoint of interval   f times midpoint
10-19.9 degrees       5            15                    75
20-29.9               5            25                   125
30-39.9              20            35                   700
40-49.9              30            45                  1350
50-59.9              20            55                  1100
60-69.9              20            65                  1300
Totals              100                                 4650

Mean = sum of weighted midpoints / n = 4650/100 = 46.5
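The same weighted-midpoint calculation as a small sketch; the name `grouped_mean` and the `(midpoint, frequency)` layout are illustrative, and the numbers are the midpoints and frequencies from the table.

```python
def grouped_mean(rows):
    """Mean of grouped data: sum of f * midpoint divided by total f."""
    total_f = sum(f for _, f in rows)
    weighted_sum = sum(midpoint * f for midpoint, f in rows)
    return weighted_sum / total_f


# (midpoint of interval, frequency)
temperature_rows = [(15, 5), (25, 5), (35, 20), (45, 30), (55, 20), (65, 20)]

mean_temp = grouped_mean(temperature_rows)
print(mean_temp)  # 46.5
```

Each midpoint stands in for every observation in its class, so the result is an approximation of the mean of the ungrouped data.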

Mean is the balancing point of the distribution

[Figure: dot plot of seven scores (each marked X) on an axis from 0 to 9, with the mean marked as the balancing point]

Key Properties of the Mean

• Sum of the differences between the individual scores and the mean equals 0:

\[ \sum (Y - \bar{Y}) = 0 \]

• Sum of the squared differences between the individual scores and the mean equals a minimum value:

\[ \sum (Y - \bar{Y})^2 = \text{the minimum value} \]
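Both properties can be checked numerically. The sketch below reuses the seven scores from the arithmetic mean example (1, 1, 3, 3, 6, 7, 7); the helper names are hypothetical, and the alternative centres 3 and 5 are arbitrary comparison points chosen for illustration.

```python
scores = [1, 1, 3, 3, 6, 7, 7]
mean = sum(scores) / len(scores)  # 4.0


def sum_of_deviations(values, centre):
    """Sum of (Y - centre) over all scores."""
    return sum(y - centre for y in values)


def sum_of_squares(values, centre):
    """Sum of (Y - centre)^2 over all scores."""
    return sum((y - centre) ** 2 for y in values)


print(sum_of_deviations(scores, mean))  # 0.0
print(sum_of_squares(scores, mean))     # 42.0
print(sum_of_squares(scores, 3))        # 49
print(sum_of_squares(scores, 5))        # 49
```

Moving the centre one unit away from the mean in either direction raises the sum of squares from 42 to 49, which is the sense in which the mean gives the minimum value.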


Weaknesses of each measure of central tendency

• MODE: ignores all information about the values other than the most frequent one

• MEDIAN: ignores the LOCATION of scores above or below the midpoint

• MEAN: is the most sensitive to extreme values

Impacts of skewed distributions

[Figure: a skewed distribution with the positions of the mode, median, and mean marked]
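A small illustration of why the mean is the measure most affected by extreme values, and so is pulled toward the tail of a skewed distribution. The first list is the seven scores from the arithmetic mean example; the second, in which the largest score is replaced by 70, is a made-up variation for demonstration only.

```python
from statistics import mean, median

original = [1, 1, 3, 3, 6, 7, 7]
with_outlier = [1, 1, 3, 3, 6, 7, 70]  # hypothetical: last score replaced by 70

print(mean(original), median(original))          # 4 3
print(mean(with_outlier), median(with_outlier))  # 13 3
```

One extreme score more than triples the mean while the median does not move, because the median depends only on which score sits in the middle, not on how far the others lie from it.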

Measures of Dispersion

Poverty Households (%) in 2 suburbs by tract

Suburb A            Suburb B
24                  28
23                  25
22                  22
21                  19
20                  16

Mean=22             Mean=22
Less dispersion     More dispersion


Range

• Highest value minus the lowest value

• Problem: ignores all the other values between the two extreme values
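As a quick sketch, the range for the two suburbs in the poverty example (both with a mean of 22) can be computed as follows; the function name `value_range` is illustrative.

```python
def value_range(values):
    """Highest value minus the lowest value."""
    return max(values) - min(values)


suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

print(value_range(suburb_a))  # 4
print(value_range(suburb_b))  # 12
```

Only the two extreme values enter the calculation, so any rearrangement of the three middle tracts would leave both results unchanged.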

Interquartile range

• Based on the quartiles (25th percentile and 75th percentile of a distribution)

• Interquartile range = Q3-Q1

• Semi-interquartile range = (Q3-Q1)/2

• Eliminates the effect of extreme scores by excluding them
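A sketch of the interquartile range using Python's standard `statistics.quantiles` (Python 3.8 or later). Several conventions exist for locating quartiles in small samples, and the slides do not say which one they use; the `"inclusive"` method here is an assumption made for the example. The data are the two suburbs from the dispersion example, and the function names are illustrative.

```python
from statistics import quantiles


def interquartile_range(values):
    """Q3 - Q1, with quartiles located by the 'inclusive' method (an assumption)."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def semi_interquartile_range(values):
    """Half of the interquartile range."""
    return interquartile_range(values) / 2


suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

print(interquartile_range(suburb_a))       # 2.0
print(interquartile_range(suburb_b))       # 6.0
print(semi_interquartile_range(suburb_b))  # 3.0
```

With a different quartile convention the exact figures can change, but Suburb B would still show the wider spread.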

Graphic representation: Box Plot

[Figure: box plots of infant mortality rate for Africa (N = 52), Asia (N = 44), and Latin America (N = 37); the vertical axis runs from -100 to 200]

Variance

• A measure of dispersion based on the second property of the mean we discussed earlier:

\[ \sum (Y - \bar{Y})^2 = \text{minimum} \]

Step 1: Calculate the total sum of squares around the mean

Y      Deviation (Y - mean)   Squared deviation
10            -5                     25
12            -3                      9
14            -1                      1
15             0                      0
16            +1                      1
18            +3                      9
20            +5                     25

Mean=105/7=15                  Sum = 70

Step 2: Take an average of this total variation

\[ s^2 = \frac{\sum (Y - \bar{Y})^2}{n - 1} \]

Why n-1 rather than simply n?

The normal procedure involves estimating variance for a population using data from a sample.

Samples, especially small samples, are less likely to include extreme scores in the population, so the sample sum of squares tends to underestimate the population's variation.

Dividing by n-1 rather than n compensates for this underestimate.
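Steps 1 and 2 can be followed through for the seven scores in the Step 1 table. This is a minimal sketch; the variable names are illustrative, and the standard library's `statistics.variance`, which also divides by n-1, is used only as a cross-check.

```python
from math import isclose
from statistics import variance

y = [10, 12, 14, 15, 16, 18, 20]
n = len(y)

# Step 1: total sum of squares around the mean.
y_bar = sum(y) / n
total_ss = sum((value - y_bar) ** 2 for value in y)

# Step 2: divide by n - 1 to get the sample variance.
s_squared = total_ss / (n - 1)
# For comparison only: dividing by n gives a smaller figure.
ss_over_n = total_ss / n

print(y_bar, total_ss)       # 15.0 70.0
print(round(s_squared, 2))   # 11.67
print(ss_over_n)             # 10.0
print(isclose(s_squared, variance(y)))  # True
```

Dividing by 6 instead of 7 raises the estimate from 10.0 to about 11.67, and the gap between the two divisors shrinks as the sample grows.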

Step 3: Take the square root of variance (the standard deviation)

\[ s = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}} \]

Purpose: expresses dispersion in the original units of measurement, not units of measurement squared

Like variance: the larger the value, the greater the variability
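As an illustration, the standard deviation can be computed for the Step 1 scores and for the two suburbs from the dispersion example. The sketch uses `statistics.stdev`, which takes the square root of the n-1 variance; the variable names are illustrative.

```python
from statistics import stdev

step_scores = [10, 12, 14, 15, 16, 18, 20]
suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

s_step = stdev(step_scores)
s_a = stdev(suburb_a)
s_b = stdev(suburb_b)

print(round(s_step, 2))  # 3.42
print(round(s_a, 2))     # 1.58
print(round(s_b, 2))     # 4.74
```

Both suburbs have a mean of 22 percent, but Suburb B's standard deviation is three times Suburb A's, and both figures are in percentage points, the same units as the data.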


Coefficient of Variation (V)

V = (standard deviation / mean)

Value: To allow you to make comparisons of dispersion across groups with very different mean values or across variables with very different measurement scales.
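A brief sketch of why V is useful across measurement scales. The first list is the seven scores from the variance example; the second is the same scores multiplied by 100, a made-up rescaling (as if the unit of measurement had changed) used purely for illustration. The function name is hypothetical.

```python
from math import isclose
from statistics import mean, stdev


def coefficient_of_variation(values):
    """V = standard deviation / mean."""
    return stdev(values) / mean(values)


scores = [10, 12, 14, 15, 16, 18, 20]
rescaled = [value * 100 for value in scores]  # hypothetical change of units

v_scores = coefficient_of_variation(scores)
v_rescaled = coefficient_of_variation(rescaled)

print(round(stdev(scores), 2), round(stdev(rescaled), 2))  # 3.42 341.57
print(round(v_scores, 4), round(v_rescaled, 4))            # 0.2277 0.2277
print(isclose(v_scores, v_rescaled))                       # True
```

The standard deviation grows a hundredfold with the change of units, while V stays the same, which is what makes it suitable for comparing dispersion between groups or variables on different scales.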