Chapter Three: Averages and Variation

Post on 01-Jan-2016

Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1

Understandable Statistics, Seventh Edition

By Brase and Brase

Prepared by: Lynn Smith, Gloucester County College

Chapter Three

Averages and Variation

Measures of Central Tendency

• Mode

• Median

• Mean

The Mode

the value or property that occurs most frequently in the data

Find the mode:

6, 7, 2, 3, 4, 6, 2, 6

The mode is 6.

Find the mode:

6, 7, 2, 3, 4, 5, 9, 8

There is no mode for this data.
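
The two mode examples above can be sketched in Python. The function name `find_modes` is illustrative, and treating "every value occurs only once" as "no mode" is an assumption that follows the slide's convention; the sketch also assumes a non-empty list.

```python
from collections import Counter


def find_modes(data):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once, so (as on the slide) there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(find_modes([6, 7, 2, 3, 4, 6, 2, 6]))  # [6]
print(find_modes([6, 7, 2, 3, 4, 5, 9, 8]))  # []
```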

The Median

the central value of an ordered distribution

To find the median of raw data:

• Order the data from smallest to largest.

• For an odd number of data values, the median is the middle value.

• For an even number of data values, the median is found by dividing the sum of the two middle values by two.

Find the median:

Data: 5, 2, 7, 1, 4, 3, 2

Rearrange: 1, 2, 2, 3, 4, 5, 7

The median is 3.

Find the median:

Data: 31, 57, 12, 22, 43, 50

Rearrange: 12, 22, 31, 43, 50, 57

The median is the average of the middle two values: $\frac{31 + 43}{2} = 37$
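
A minimal Python sketch of the median procedure for raw data, covering both the odd and the even case. The function name `median_of` is illustrative, and the sketch assumes a non-empty list of numbers.

```python
def median_of(data):
    """Median of raw data: order the values, then take the middle."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[middle]
    # Even count: the average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([5, 2, 7, 1, 4, 3, 2]))     # 3
print(median_of([31, 57, 12, 22, 43, 50]))  # 37.0
```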

The Mean

The mean of a collection of data is found by:

• summing all the entries

• dividing by the number of entries

$\text{mean} = \frac{\text{sum of all entries}}{\text{number of entries}}$

Find the mean:

6, 7, 2, 3, 4, 5, 2, 8

$\text{mean} = \frac{6 + 7 + 2 + 3 + 4 + 5 + 2 + 8}{8} = \frac{37}{8} = 4.625 \approx 4.6$

Sigma Notation

• The symbol Σ means “sum the following.”

• Σ is the Greek letter (capital) sigma.

Notations for mean

Sample mean: $\bar{x}$ (“x bar”)

Population mean: $\mu$ (Greek letter mu)

Find the measure of center

• Find the mode:

2, 3, 4, 1, 2, 3, 4, 5, 2, 5, 5, 8, 9, 9, 10

There are two modes: 2 and 5 (bimodal data).

• Find the median:

5, 3, 4, 10, 2, 3, 9, 5, 2, 5, 2, 8, 4, 9, 1

The median (middle value): 4

• Find the mean:

10, 9, 2, 9, 3, 8, 4, 5, 1, 5, 2, 2, 3, 5, 4

The sum of the numbers (72) divided by the count (15) gives the mean: 4.8

These three lists contain the same values in a different order. Which is the best measure of center to report?
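
As a quick check, the three measures for this data set can be computed with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). The variable names are illustrative.

```python
from statistics import mean, median, multimode

data = [2, 3, 4, 1, 2, 3, 4, 5, 2, 5, 5, 8, 9, 9, 10]

modes = sorted(multimode(data))  # every value tied for most frequent
middle = median(data)            # middle value of the ordered data
average = mean(data)             # sum of entries / number of entries

print(modes, middle, average)  # [2, 5] 4 4.8
```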

Number of entries in a set of data

• If the data represents a sample, the number of entries = n.

• If the data represents an entire population, the number of entries = N.

Sample mean

$\bar{x} = \frac{\sum x}{n}$

Population mean

$\mu = \frac{\sum x}{N}$

Resistant Measure

a measure that is not influenced by extremely high or low data values

Which is less resistant?

• Mean

• Median

The mean is less resistant. It can be made arbitrarily large by increasing the size of one value.

The median is more resistant. It will not be heavily influenced by large or small values.
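
A small Python illustration of resistance. The data values here are hypothetical, chosen only to show the effect of replacing one entry with an extreme value.

```python
from statistics import mean, median

values = [15, 17, 18, 20, 20]          # hypothetical data
with_outlier = [15, 17, 18, 20, 2000]  # same data, one value made extreme

print(mean(values), median(values))              # 18 18
print(mean(with_outlier), median(with_outlier))  # 414 18
```

The mean moves from 18 to 414, while the median stays at 18.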

Midrange

a measure of center that is the midpoint between the smallest and largest values in a set of data

Compute the midrange:

36,20,18,60,17,15,20,32,25,30

• Order the list from smallest to largest:

15, 17, 18, 20, 20, 25, 30, 32, 36, 60

• Midrange = $\frac{\text{max} + \text{min}}{2} = \frac{60 + 15}{2} = 37.5$
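
The midrange needs only the smallest and largest values, so sorting is optional in code. A minimal sketch (the function name `midrange` is illustrative):

```python
def midrange(data):
    """Midpoint between the smallest and largest values."""
    return (min(data) + max(data)) / 2


print(midrange([36, 20, 18, 60, 17, 15, 20, 32, 25, 30]))  # 37.5
```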

Trimmed Mean

a measure of center that is more resistant than the mean but is still sensitive to specific data values

To calculate a (5% or 10%) trimmed mean

• Order the data from smallest to largest.

• Delete the bottom 5% or 10% of the data.

• Delete the same percent from the top of the data.

• Compute the mean of the remaining 90% or 80% of the data.

Compute a 10% trimmed mean:

15, 17, 18, 20, 20, 25, 30, 32, 36, 60

• Delete the top and bottom 10% (one value from each end for every 10 data values).

• New data list:

17, 18, 20, 20, 25, 30, 32, 36

• 10% trimmed mean = $\frac{\sum x}{n} = \frac{198}{8} = 24.75 \approx 24.8$

Trimmed Mean Guidelines

1. For 5% trimmed mean:

i. Data sets n = 3–20: drop min and max

ii. Data sets n = 21–40: drop lowest 2 and highest 2 values, etc.

2. For 10% trimmed mean:

i. Data sets n = 11–20: drop lowest 2 and highest 2 values

ii. Data sets n = 21–30: drop lowest 3 and highest 3 values, etc.
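
A Python sketch of the trimmed mean. It assumes the number of values dropped from each end is the trim percentage of n rounded up, which reproduces the guideline ranges above; other textbooks and libraries may round differently. The function name `trimmed_mean` is illustrative, and the sketch assumes enough data remains after trimming.

```python
import math


def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the values from each end."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumption: round up, so n = 11-20 at 10% drops 2 values per end.
    k = math.ceil(n * percent / 100)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [15, 17, 18, 20, 20, 25, 30, 32, 36, 60]
print(trimmed_mean(data, 10))  # 24.75
```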

Weighted Mean

another measure of center that is more resistant than the mean but is still sensitive to specific data frequencies

OR

an average calculated where some of the numbers are assigned more importance or weight

Weighted Average

$\text{Weighted Average} = \frac{\sum xw}{\sum w}$

where w = the weight of the data value x.

To calculate a weighted mean

• Order the data from least frequent to most frequent.

• Multiply each value by its frequency or percentage.

• Add the products and divide by the total frequency or total percentage (100% or 1.00).

Compute a weighted mean:

Using our syllabus grading system, you have a test average of 77, quiz avg. of 91, homework avg. of 85, and classwork avg. of 100.

• 45% × 77 = 3465

• 25% × 91 = 2275

• 20% × 85 = 1700

• 10% × 100 = 1000

• Weighted average = $\frac{\sum xw}{\sum w} = \frac{8440}{100} = 84.4$

Compute the Weighted Average:

• Midterm grade = 92

• Term Paper grade = 80

• Final exam grade = 88

• Midterm weight = 25%

• Term paper weight = 25%

• Final exam weight = 50%

Compute the Weighted Average:

              x     w      xw
Midterm       92    .25    23
Term Paper    80    .25    20
Final exam    88    .50    44
Total               1.00   87

$\text{Weighted Average} = \frac{\sum xw}{\sum w} = \frac{87}{1.00} = 87$
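
Both weighted-average examples can be reproduced with a short Python sketch. The function name `weighted_average` is illustrative. Weights may be given as decimals or as whole-number percentages, because the result is divided by the total weight.

```python
def weighted_average(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Midterm, term paper, final exam with decimal weights.
print(weighted_average([92, 80, 88], [0.25, 0.25, 0.50]))     # 87.0

# Test, quiz, homework, classwork with percentage weights.
print(weighted_average([77, 91, 85, 100], [45, 25, 20, 10]))  # 84.4
```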

Mean of Grouped Data (Frequency Table)

• Make a frequency table.

• Compute the midpoint (x) for each class.

• Count the number of entries in each class (f).

• Sum the f values to find n, the total number of entries in the distribution.

• Treat each entry of a class as if it falls at the class midpoint.

Calculation of the mean of grouped data

Ages       f    x (mdpt)   xf
30 - 34    4    32         128
35 - 39    5    37         185
40 - 44    2    42         84
45 - 49    9    47         423

$\sum f = 20$, $\sum xf = 820$

Mean of Grouped Data

$\bar{x} = \frac{\sum xf}{n} = \frac{\sum xf}{\sum f} = \frac{820}{20} = 41.0$
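
The grouped-data mean for the ages table can be sketched in Python. The representation of each class as a `((low, high), frequency)` pair and the function name `grouped_mean` are assumptions made for illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data, treating each entry as the class midpoint."""
    n = sum(f for _, f in classes)
    total = sum((low + high) / 2 * f for (low, high), f in classes)
    return total / n


ages = [((30, 34), 4), ((35, 39), 5), ((40, 44), 2), ((45, 49), 9)]
print(grouped_mean(ages))  # 41.0
```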

When do I use that?

• Use the mode or midrange when the data appears to be uniform or unimodal but not symmetric; especially when there are extreme values to the left and right in the graph

• Use the median when the data appears to be skewed or bimodal and symmetric; especially when there are extreme values to the left or right in the graph

• Use the mean when the data appears to be somewhat symmetrical and unimodal; especially when the graph has very few extreme values

Measures of Variation

• Range

• Interquartile Range

• Standard Deviation (Variance)

The Range

the difference between the largest and smallest values of a distribution

(measure of spread most closely associated with mode or midrange as the center)

Find the range:

10, 13, 17, 17, 18

The range = largest minus smallest

= 18 minus 10 = 8

Interquartile Range (IQR)

the difference between the “low middle” (Q1) and the “high middle” (Q3); the spread of the middle 50% of a distribution

(measure of spread most closely associated with median as the center)

• Quartiles: percentiles that divide the data into fourths

• Q1 = 25th percentile

• Q2 = the median

• Q3 = 75th percentile

Quartiles

• Percentiles that divide the data into fourths

• Q1 = 25th percentile

• Q2 = the median

• Q3 = 75th percentile

Quartiles

[Figure: number line running from the lowest value to the highest value, marked at Q1, Median = Q2, and Q3]

Interquartile range = IQR = Q3 – Q1

Find the quartiles:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

The data has been ordered.

The median is 24.

Find the quartiles:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

For the data below the median (12 15 16 16 17 18 22 22 23), the median is 17.

17 is the first quartile.

Find the quartiles:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

For the data above the median, the median is 33.

33 is the third quartile.

Find the interquartile range:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

IQR = Q3 – Q1 = 33 – 17 = 16
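
A Python sketch of the median-of-halves method used in this example: the median itself is left out of both halves when the number of values is odd. The function names are illustrative. Statistical software often uses other quartile conventions and may give slightly different values.

```python
def median_of(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # values below the median position
    # For an odd count, skip the median itself.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of(lower), median_of(ordered), median_of(upper)


data = [12, 15, 16, 16, 17, 18, 22, 22, 23, 24,
        25, 30, 32, 33, 33, 34, 41, 45, 51]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)  # 17 24 33 16
```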

The standard deviation

a measure of the average variation of the data entries from the mean

Standard deviation of a sample

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

n = sample size

$\bar{x}$ = mean of the sample

To calculate standard deviation of a sample

• Calculate the mean of the sample.

• Find the difference between each entry (x) and the mean. These differences will add up to zero.

• Square the deviations from the mean.

• Sum the squares of the deviations from the mean.

• Divide the sum by (n – 1) to get the variance.

• Take the square root of the variance to get the standard deviation.

The Variance

the square of the standard deviation

Variance of a Sample

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Find the standard deviation and variance

22, 26, 30

x      $x - \bar{x}$    $(x - \bar{x})^2$
22     –4               16
26     0                0
30     4                16

$\sum x = 78$, mean = 26, $\sum (x - \bar{x}) = 0$, $\sum (x - \bar{x})^2 = 32$

The variance

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{32}{2} = 16$

The standard deviation

$s = \sqrt{16} = 4$
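
The defining formula translates directly into Python. This is a minimal sketch with illustrative function names; it assumes at least two data values, since it divides by n – 1.

```python
import math


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


def sample_std(data):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(data))


print(sample_variance([22, 26, 30]))  # 16.0
print(sample_std([22, 26, 30]))       # 4.0
```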

Find the mean, the standard deviation and variance

4, 4, 5, 5, 7

x     $x - \bar{x}$    $(x - \bar{x})^2$
4     –1               1
5     0                0
5     0                0
7     2                4
4     –1               1

$\sum x = 25$, mean = 5, $\sum (x - \bar{x})^2 = 6$

The mean, the standard deviation and variance

Mean = 5

Variance = $\frac{6}{4} = 1.5$

Standard deviation = $\sqrt{1.5} \approx 1.22$

Computation formula for sample standard deviation:

$s = \sqrt{\frac{SS_x}{n - 1}}$ where $SS_x = \sum x^2 - \frac{(\sum x)^2}{n}$

To find $\sum x^2$: square the x values, then add.

To find $(\sum x)^2$: sum the x values, then square.

Use the computing formulas to find s and s2

x      x²
4      16
5      25
5      25
7      49
4      16

$\sum x = 25$, $\sum x^2 = 131$

n = 5

$(\sum x)^2 = 25^2 = 625$

$SS_x = 131 - \frac{625}{5} = 6$

$s^2 = \frac{6}{5 - 1} = 1.5$

$s = \sqrt{1.5} \approx 1.22$
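
The computing formula can be checked with a short Python sketch. The function name `sum_of_squares` is illustrative. Only the sums $\sum x$ and $\sum x^2$ are needed, so the mean never has to be computed first.

```python
import math


def sum_of_squares(data):
    """SS_x = (sum of x squared) - (sum of x) squared / n."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


data = [4, 4, 5, 5, 7]
ss_x = sum_of_squares(data)        # 131 - 625/5
variance = ss_x / (len(data) - 1)  # divide by n - 1
std = math.sqrt(variance)

print(ss_x, variance, round(std, 2))  # 6.0 1.5 1.22
```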

Population Mean and Standard Deviation

$\mu = \frac{\sum x}{N}$

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

where

$\mu$ = population mean

$\sigma$ = population standard deviation

N = number of data values in the population

COEFFICIENT OF VARIATION:

a measurement of the relative variability (or consistency) of data

$CV = \frac{s}{\bar{x}} \cdot 100$ or $CV = \frac{\sigma}{\mu} \cdot 100$

CV is used to compare variability or consistency

A sample of newborn infants had a mean weight of 6.2 pounds with a standard deviation of 1 pound.

A sample of three-month-old children had a mean weight of 10.5 pounds with a standard deviation of 1.5 pounds.

Which (newborns or 3-month-olds) are more variable in weight?

To compare variability, compare Coefficient of Variation

For newborns: $CV = \frac{1}{6.2} \cdot 100 \approx 16\%$

For 3-month-olds: $CV = \frac{1.5}{10.5} \cdot 100 \approx 14\%$

Higher CV: more variable

Lower CV: more consistent
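
The comparison can be reproduced in Python. The function name `coefficient_of_variation` is illustrative; the inputs are the means and standard deviations given in the example.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation as a percentage of the mean."""
    return std / mean * 100


newborns = coefficient_of_variation(1.0, 6.2)
three_month_olds = coefficient_of_variation(1.5, 10.5)

print(round(newborns), round(three_month_olds))  # 16 14
```

The newborns have the higher CV, so their weights are relatively more variable even though their standard deviation is smaller.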

Use Coefficient of Variation

To compare two groups of data,

to answer:

Which is more consistent?

Which is more variable?

CHEBYSHEV'S THEOREM

For any set of data and for any number k greater than one, the proportion of the data that lies within k standard deviations of the mean is at least:

$1 - \frac{1}{k^2}$

CHEBYSHEV'S THEOREM for k = 2

According to Chebyshev’s Theorem, at least what fraction of the data falls within “k” (k = 2) standard deviations of the mean?

At least $1 - \frac{1}{2^2} = \frac{3}{4} = 75\%$ of the data falls within 2 standard deviations of the mean.

CHEBYSHEV'S THEOREM for k = 3

According to Chebyshev’s Theorem, at least what fraction of the data falls within “k” (k = 3) standard deviations of the mean?

At least $1 - \frac{1}{3^2} = \frac{8}{9} \approx 88.9\%$ of the data falls within 3 standard deviations of the mean.

CHEBYSHEV'S THEOREM for k = 4

According to Chebyshev’s Theorem, at least what fraction of the data falls within “k” (k = 4) standard deviations of the mean?

At least $1 - \frac{1}{4^2} = \frac{15}{16} \approx 93.8\%$ of the data falls within 4 standard deviations of the mean.

Using Chebyshev’s Theorem

A mathematics class completes an examination and it is found that the class mean is 77 and the standard deviation is 6.

According to Chebyshev's Theorem, between what two values would at least 75% of the grades be?

Mean = 77 Standard deviation = 6

At least 75% of the grades would be in the interval:

$\bar{x} - 2s$ to $\bar{x} + 2s$

77 – 2(6) to 77 + 2(6)

77 – 12 to 77 + 12

65 to 89
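
A Python sketch of Chebyshev's bound and the interval it applies to. The function names are illustrative. `Fraction` is used so the bound prints as an exact fraction such as 3/4.

```python
from fractions import Fraction


def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / Fraction(k) ** 2


def chebyshev_interval(mean, std, k):
    """Interval from (mean - k*std) to (mean + k*std)."""
    return mean - k * std, mean + k * std


print(chebyshev_bound(2), chebyshev_bound(3), chebyshev_bound(4))  # 3/4 8/9 15/16
print(chebyshev_interval(77, 6, 2))  # (65, 89)
```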

Mean and Standard Deviation of Grouped Data

• Make a frequency table.

• Compute the midpoint (x) for each class.

• Count the number of entries in each class (f).

• Sum the f values to find n, the total number of entries in the distribution.

• Treat each entry of a class as if it falls at the class midpoint.

Sample Mean for a Frequency Distribution

$\bar{x} = \frac{\sum xf}{n}$

x = class midpoint

Sample Standard Deviation for a Frequency Distribution

$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$

Computation Formula for Standard Deviation for a Frequency Distribution

$s = \sqrt{\frac{SS_x}{n - 1}}$ where $SS_x = \sum x^2 f - \frac{(\sum xf)^2}{n}$

Calculation of the mean of grouped data

Ages       f    x     xf
30 - 34    4    32    128
35 - 39    5    37    185
40 - 44    2    42    84
45 - 49    9    47    423

$\sum f = 20$, $\sum xf = 820$

Calculation of the standard deviation of grouped data

Ages       f    x     x – mean    (x – mean)²    (x – mean)² f
30 - 34    4    32    –9          81             324
35 - 39    5    37    –4          16             80
40 - 44    2    42    1           1              2
45 - 49    9    47    6           36             324

Mean = 41, $\sum f = 20$, $\sum (x - \bar{x})^2 f = 730$

Calculation of the standard deviation of grouped data

$\sum f = n = 20$

$\sum (x - \bar{x})^2 f = 730$

$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{730}{20 - 1}} = \sqrt{38.42} \approx 6.20$

Computation Formula for Standard Deviation

f    x     xf     x²f
4    32    128    4096
5    37    185    6845
2    42    84     3528
9    47    423    19881

$\sum f = 20$, $\sum xf = 820$, $\sum x^2 f = 34350$

Computation Formula for Standard Deviation for a Frequency Distribution

$SS_x = \sum x^2 f - \frac{(\sum xf)^2}{n} = 34350 - \frac{820^2}{20} = 730$

$s = \sqrt{\frac{SS_x}{n - 1}} = \sqrt{\frac{730}{20 - 1}} \approx 6.20$
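
Both routes to the grouped-data standard deviation can be compared in a short Python sketch using the class midpoints and frequencies from the ages table. The variable names are illustrative.

```python
import math

midpoints = [32, 37, 42, 47]
frequencies = [4, 5, 2, 9]

n = sum(frequencies)
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
mean = sum_xf / n

# Defining formula: squared deviations from the mean, weighted by f.
ss_definition = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies))

# Computation formula: sum of x^2 f minus (sum of xf)^2 / n.
sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))
ss_computation = sum_x2f - sum_xf ** 2 / n

s = math.sqrt(ss_computation / (n - 1))
print(ss_definition, ss_computation, round(s, 2))  # 730.0 730.0 6.2
```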

Percentiles

For any whole number P (between 1 and 99), the Pth percentile of a distribution is a value such that P% of the data fall at or below it.

The percent falling above the Pth percentile will be (100 – P)%.

Percentiles

[Figure: number line from the lowest value to the highest value; P40 marks the point with 40% of the data at or below it and 60% of the data above it]

Computing Quartiles

• Order the data from smallest to largest.

• Find the median, the second quartile.

• Find the median of the data falling below Q2. This is the first quartile.

• Find the median of the data falling above Q2. This is the third quartile.

Find the quartiles:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

The data has been ordered.

The median is 24.

Find the quartiles:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

For the data below the median, the median is 17.

17 is the first quartile.

Find the quartiles:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

For the data above the median, the median is 33.

33 is the third quartile.

Find the interquartile range:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

IQR = Q3 – Q1 = 33 – 17 = 16

Five-Number Summary of Data

• Lowest value

• First quartile

• Median

• Third quartile

• Highest value

Box-and-Whisker Plot

a graphical presentation of the five-number summary of data

Making a Box-and-Whisker Plot

• Draw a vertical scale including the lowest and highest values.

• To the right of the scale, draw a box from Q1 to Q3.

• Draw a solid line through the box at the median.

• Draw lines (whiskers) from Q1 to the lowest and from Q3 to the highest values.

Construct a Box-and-Whisker Plot:

12 15 16 16 17 18 22 22

23 24 25 30 32 33 33 34

41 45 51

Lowest = 12 Q1 = 17

median = 24 Q3 = 33

Highest = 51
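
The five values a box-and-whisker plot needs can be gathered in one Python function. This sketch assumes the same median-of-halves convention for quartiles used in this chapter; the function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]      # values below the median position
    upper = ordered[n - half:]  # values above the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [12, 15, 16, 16, 17, 18, 22, 22, 23, 24,
        25, 30, 32, 33, 33, 34, 41, 45, 51]
print(five_number_summary(data))  # (12, 17, 24, 33, 51)
```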

Box-and-Whisker Plot

Lowest = 12

Q1 = 17

median = 24

Q3 = 33

Highest = 51

[Figure: box-and-whisker plot drawn beside a vertical scale running from 10 to 60 in steps of 5]