HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved.

Upload: lorena-heath

Post on 11-Jan-2016


HAWKES LEARNING SYSTEMS

math courseware specialists

Describing Data from One Variable

Chapter 4

Copyright © 2010 by Hawkes Learning

Systems/Quant Systems, Inc.

All rights reserved.


Ch 4. Describing Data From One Variable

4.1 Measures of Location

Describing Data from One Variable

Sections 4.1-4.3a Measures of Location

Objectives:

• To calculate the mean, median, and mode.

• To determine the most appropriate measure of center.


Measures of Location:

• If we think about a data set as a group of data values that cluster around some central value, then the central value provides a focal point for the set, a location of sorts.

• Unfortunately, the notion of central value is a vague concept, which is as much defined by the way it is measured as by the notion itself.

• There are several statistical measures that are used to define the notion of center: the arithmetic mean, trimmed mean, median, and mode.


The Arithmetic Mean:

• Suppose there are n observations in a data set, consisting of the observations $x_1, x_2, \dots, x_n$; then the arithmetic mean is $\frac{1}{n}(x_1 + x_2 + \dots + x_n)$.

• The mean is what we typically call the “average” of a data set.

• To calculate the mean, simply add all the values and divide by the total number in the data set.

• Mean should only be used for quantitative data.

• Outliers have a dramatic effect on the mean value.


The Arithmetic Mean:

• If we use mathematical notation, the formula can be simplified to $\frac{1}{n}\sum x_i$, where $x_i$ is the $i$th data value in the data set and $\Sigma$ (pronounced sigma) is a mathematical notation for adding values.

• There are two symbols that are associated with the mean:

$\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n)$, the sample mean, and

$\mu = \frac{1}{N}(x_1 + x_2 + \dots + x_N)$, the population mean.

• Here $n$ refers to the size of the sample and $N$ refers to the size of the population. Otherwise, the calculations are made in precisely the same way.


Example:

Calculate the sample mean of the following heights in inches.

63, 68, 71, 67, 63, 72, 66, 67, 70

Solution:

$\bar{x} = \frac{63 + 68 + 71 + 67 + 63 + 72 + 66 + 67 + 70}{9} = \frac{607}{9} \approx 67.4$ inches.

When calculating the mean, round to one more decimal place than what is given in the data.
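The calculation in this example can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is an illustrative choice and not part of the course material.

```python
# Heights in inches, taken from the example above.
heights = [63, 68, 71, 67, 63, 72, 66, 67, 70]

def sample_mean(values):
    """Add all the values and divide by how many there are."""
    return sum(values) / len(values)

mean_height = sample_mean(heights)

# Round to one more decimal place than the data (whole inches -> one decimal).
print(round(mean_height, 1))  # prints 67.4
```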


Deviation:

• Given some point A and a data point x, then x – A represents how far x deviates from A. This difference is also called a deviation.

• The table below shows the deviations from the mean for the following sample data values: 4, 10, 7, 15. The mean of this data set is $\bar{x} = \frac{1}{4}(4 + 10 + 7 + 15) = 9$.

Data Value $x_i$    Deviation from the mean ($x_i$ – 9)
4                   –5
10                  1
7                   –2
15                  6
Sum                 $\sum (x_i - 9) = 0$

Notice that the sum of the deviations is zero. This illustrates why the mean is a measure of central tendency. If we calculate the deviations about any other value, the sum of the deviations will not equal zero.
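A short Python sketch can confirm that the deviations about the mean sum to zero, while deviations about another point do not. The helper name `sum_of_deviations` and the comparison point 8 are assumptions made for illustration.

```python
data = [4, 10, 7, 15]
mean = sum(data) / len(data)  # 9.0

# Deviation of each value from the mean.
deviations = [x - mean for x in data]

def sum_of_deviations(values, point):
    """Sum of (x - point) over all values."""
    return sum(x - point for x in values)

print(deviations)                    # [-5.0, 1.0, -2.0, 6.0]
print(sum_of_deviations(data, mean))  # 0.0
print(sum_of_deviations(data, 8))     # 4, so not zero about a point other than the mean
```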


The Median:

• The median of a set of data values is the middle value in an ordered array. The same number of values is on either side of the median value.

To find the median:

• Arrange the data in ascending order.

• Count the number of values in the data.

• If the count is odd, the median is the middle value in the data.

• If the count is even, the median is the sum of the two middle values in the data divided by two.


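Example:

Calculate the median of the following sets of data.

a. 15 16 11 22 19 10 17 22

Solution:

Ordered data: 10 11 15 16 17 19 22 22

The count is even, so the median is the average of the two middle values: $\frac{16 + 17}{2} = 16.5$.

b. 2.6 3.3 5.0 1.8 0.7 2.2 4.1 6.1 6.7

Solution:

Ordered data: 0.7 1.8 2.2 2.6 3.3 4.1 5.0 6.1 6.7

The count is odd, so the median is the middle (fifth) value, 3.3.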
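The odd/even rule for the median can be sketched in Python as follows. The function name `median` and the variable names are illustrative assumptions; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

set_a = [15, 16, 11, 22, 19, 10, 17, 22]
set_b = [2.6, 3.3, 5.0, 1.8, 0.7, 2.2, 4.1, 6.1, 6.7]

print(median(set_a))  # 16.5
print(median(set_b))  # 3.3
```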


The Trimmed Mean:

• The trimmed mean ignores an equal percentage of the highest and lowest values in calculating the mean.

To calculate a 10% trimmed mean:

• Arrange the data in ascending order.

• Delete the lowest 10% of the values.

• Delete the highest 10% of the values.

• Calculate the arithmetic mean of the remaining 80% of the values.


Example:

Consider the following data:

16 18 20 21 23 23 24 32 36 42

mean = 25.5, median = 23

Find the 10% trimmed mean.

Solution:

Since there are 10 observations, removing the highest 10% and lowest 10% means only removing one observation from each end of the data.

10% trimmed mean $= \frac{18 + 20 + 21 + 23 + 23 + 24 + 32 + 36}{8} = 24.625$
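A minimal Python sketch of the trimmed mean follows. The name `trimmed_mean` is an assumption, and so is the choice to truncate to a whole number of values when the trimming percentage does not divide the data evenly; the slides do not say how that case should be handled.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` percent of the values from each end."""
    ordered = sorted(values)
    # Number of values to drop from each end (truncated; an assumption).
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [16, 18, 20, 21, 23, 23, 24, 32, 36, 42]

print(trimmed_mean(data, 10))  # 24.625
print(trimmed_mean(data, 0))   # 25.5, the ordinary mean
```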


Resistant Measures:

• Statistical measures which are not affected by outliers are said to be resistant.

• The mean is not a resistant measure.

• The trimmed mean is a resistant measure.


The Mode:

• The mode of a data set is the most frequently occurring value.

• The mode is the only measure of centralness that can be applied to nominal data.

• When a data set has two modes it is said to be bimodal.

• When the data set has more than two modes it is said to be multimodal.


Example:

Calculate the mode of each data set.

a. 63 68 71 67 63 72 66 67 70

Solution:

There are two modes: 63 and 67. The data set is bimodal.

b. 51 77 54 51 68 70 54 65 51

Solution:

51 occurs three times. The mode is 51.

c. 1 5 7 3 2 0 4 6

Solution:

Each value appears only once. There is no mode.
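The three cases in this example (bimodal, one mode, no mode) can be reproduced with a small Python sketch. The function name `modes` and the convention of returning an empty list for "no mode" are assumptions chosen to match the slide; the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values, or an empty list if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # each value appears only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([63, 68, 71, 67, 63, 72, 66, 67, 70]))  # [63, 67]
print(modes([51, 77, 54, 51, 68, 70, 54, 65, 51]))  # [51]
print(modes([1, 5, 7, 3, 2, 0, 4, 6]))              # []
```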


The Relationship between the Mean and Median:

• The shape of the data determines how the mean, median, and mode are related.

• For a bell-shaped distribution, the mean, median, and mode are identical.


Skewed Distributions:

• Not all data produce distributions which follow a bell-shaped curve.

• If the distribution of the data has a long tail to the right, it is said to be skewed to the right, or positively skewed.

• Conversely, if the distribution has a long tail on the left, the data is said to be skewed to the left, or negatively skewed.

If the data is positively skewed, the median will be smaller than the mean.

If the data is negatively skewed, the median will be larger than the mean.


Describing Data from One Variable

Section 4.2 Selecting a Measure of Location

Selecting a Measure of Location:

• The objective of using descriptive statistics is to provide measures which convey useful summary information about the data.

• When selecting a statistic to represent the central value of a data set, the first question involves what type of data is being analyzed.

• The arithmetic mean is frequently, but not always, the most reasonable measure of centralness.


Selecting a Measure of Location:

To the left is a table that defines the sensitivity to outliers for each measure of location.

[Table: sensitivity to outliers, on a scale from "not sensitive" to "very sensitive", for the mean, median, mode, and trimmed mean (t-mean); the cell entries are not preserved in this transcript.]

To the right is a table that defines the applicable levels of measurement for each measure of location.

[Table: applicable level of measurement (qualitative: nominal, ordinal; quantitative: interval, ratio) for the mean, median, mode, and trimmed mean (t-mean); the cell entries are not preserved in this transcript.]


Selecting a Measure of Location:

• The mean and median are the same value when the data is symmetrical.

• When the data is nominal or ordinal, the mean should not be calculated.

• When the data is at least interval and there are no outliers the mean is a reasonable choice.

• When the data is at most ordinal, then the median is the best choice.

• The median is a good measure of central tendency since it is not sensitive to outliers.

• The median can be applied to all levels of measurement except nominal.

• The mode can be applied to all levels of data, but is not very useful for quantitative data.

• If the data is nominal, there is only one choice, the mode.


Time Series Data and Measures of Centralness:

• The graph below shows the average gas price over a number of years. In this non-stationary time series, the central value of the process is trending upward.

• One way to capture this movement is with a moving average.


Moving Average:

• A moving average is obtained by adding consecutive observations for a number of periods and dividing the result by the number of periods included in the average.

• The table below shows the average US gas price from 1991 to 2002 along with the 2 and 3 period moving averages.

Year    Average US Gas Price    2 Period Moving Average    3 Period Moving Average
1991    1.09
1992    1.10                    1.095
1993    1.07                    1.085                      1.087
1994    1.08                    1.075                      1.083
1995    1.11                    1.095                      1.087
1996    1.21                    1.160                      1.133
1997    1.18                    1.195                      1.167
1998    1.01                    1.095                      1.133
1999    1.14                    1.075                      1.110
2000    1.49                    1.315                      1.213
2001    1.38                    1.435                      1.337
2002    1.34                    1.360                      1.403

• The two-period moving average for 1992 averages the time series in 1991 and 1992: $\frac{1.09 + 1.10}{2} = 1.095$.
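The moving averages can be recomputed from the yearly prices with a short Python sketch. The function name `moving_average` and the dictionary layout are illustrative assumptions; each average is assigned to the last year of its window, as in the 1992 calculation above.

```python
# Average US gas price by year, from the table above.
prices = {
    1991: 1.09, 1992: 1.10, 1993: 1.07, 1994: 1.08, 1995: 1.11, 1996: 1.21,
    1997: 1.18, 1998: 1.01, 1999: 1.14, 2000: 1.49, 2001: 1.38, 2002: 1.34,
}

def moving_average(series, periods):
    """Average of the current and preceding (periods - 1) observations, keyed by year."""
    years = sorted(series)
    result = {}
    for i in range(periods - 1, len(years)):
        window = [series[y] for y in years[i - periods + 1 : i + 1]]
        result[years[i]] = sum(window) / periods
    return result

two_period = moving_average(prices, 2)
three_period = moving_average(prices, 3)

print(round(two_period[1992], 3))    # 1.095
print(round(three_period[1993], 3))  # 1.087
```

The first year has no two-period average and the first two years have no three-period average, because there are not yet enough observations to fill the window.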


Moving Average:

• The chart below displays the time series and the two and three-period moving averages.

• Notice that both of the averages follow the time series quite closely.


Sections 4.1-4.3b Measures of Dispersion

Objective:

• To compute the range, variance, and standard deviation.


Describing Data from One Variable

Section 4.3 Measures of Dispersion

Measuring Variation:

• Many of the good measures of variation use the concept of deviation from the mean.

• If the mean is a focal point or base, use it as a common basis from which to measure variation.

• The distance that a point is from its mean is called the deviation from the mean.

• The sum of the positive deviations equals the sum of the absolute values of the negative deviations.

• The deviations will always sum to zero.

• Many of the variability measures average the deviations in some form.


Example:

A data set and its deviations from the mean are calculated in the table below. Notice that the sum of the deviations is zero.

Data set: 3, 12, 20, 15, 0

Mean = 10

Data Values    Deviations from the mean (data – mean = deviation)
3              3 – 10 = –7
12             12 – 10 = 2
20             20 – 10 = 10
15             15 – 10 = 5
0              0 – 10 = –10


Mean Absolute Deviation:

• The sample mean absolute deviation (MAD) is $\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$.

• Computes the average distance from the mean of a data set.

• If data set A has a larger deviation than B, then it is reasonable to believe that data set A has more variability than data set B.

• Intuitive measure of variation.

• Theoretical development has been hampered due to the difficulty that absolute values pose to calculus.

• Sensitive to outliers and not a resistant measure.


Example:

Suppose six people participated in a 1000 meter run. Their times, measured in minutes, are given below. The mean time is 8.333 minutes. Calculate the mean absolute deviation.

Time in min.    Deviation               Absolute Deviation    % of total
4               4 – 8.333 = –4.333      4.333                 38.23
10              10 – 8.333 = 1.667      1.667                 14.71
9               9 – 8.333 = 0.667       0.667                 5.88
11              11 – 8.333 = 2.667      2.667                 23.53
9               9 – 8.333 = 0.667       0.667                 5.88
7               7 – 8.333 = –1.333      1.333                 11.77
Total                                   11.334                100.00

The total of the absolute deviations is 4.333 + 1.667 + 0.667 + 2.667 + 0.667 + 1.333 = 11.334.

Each entry in the "% of total" column is the absolute deviation as a share of that total; for the first row, $\frac{4.333}{11.334} \cdot 100 = 38.23$.

Mean Absolute Deviation $= \frac{11.334}{6} = 1.889$ minutes.
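As a check on this example, the mean absolute deviation can be computed in Python. This is a sketch only; the function name `mean_absolute_deviation` is an illustrative assumption.

```python
# Running times in minutes, from the example above.
times = [4, 10, 9, 11, 9, 7]

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

mad = mean_absolute_deviation(times)
print(round(mad, 3))  # 1.889
```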


Variance and Standard Deviation:

• Standard deviation and variance are the most common measures of variability.

• The standard deviation and variance also provide numerical measures of how the data varies around the mean.

• If the data is tightly packed around the mean, the standard deviation and variance will be relatively small.

• If the data is widely dispersed about the mean, the standard deviation and variance will be relatively large.


Variance:

• The variance of a data set containing the complete set of population data is given by $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ and is called the population variance.

• The variance of a data set containing sample data is given by $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ and is called the sample variance.


Example:

Given the following times in minutes of 6 persons running the 1000 meter course, compute the sample variance. The sample mean is 8.333.

4, 10, 9, 11, 9, 7

Data     Deviations              Squared Deviations    % of total
4        4 – 8.333 = –4.333      18.7749               59.93
10       10 – 8.333 = 1.667      2.7789                8.87
9        9 – 8.333 = 0.667       0.4449                1.42
11       11 – 8.333 = 2.667      7.1129                22.70
9        9 – 8.333 = 0.667       0.4449                1.42
7        7 – 8.333 = –1.333      1.7769                5.67
Total                            31.33                 100.00

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{31.33}{5} = 6.266$ squared minutes.
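The sample variance for this example can be recomputed in Python. The function name `sample_variance` is an illustrative assumption. Because the code uses the unrounded mean rather than 8.333, it gives about 6.267 rather than the 6.266 obtained by hand.

```python
import statistics

# Running times in minutes, from the example above.
times = [4, 10, 9, 11, 9, 7]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

s2 = sample_variance(times)
print(round(s2, 3))

# The standard library computes the same quantity.
print(round(statistics.variance(times), 3))
```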


Standard Deviation:

• The standard deviation is the square root of the variance.

• There are two measures of variance, so there will be two standard deviations.

• The sample standard deviation: $s = \sqrt{s^2}$

• The population standard deviation: $\sigma = \sqrt{\sigma^2}$

• It is important to remember the symbols above since standard deviation is a fundamental statistical concept.


Standard Deviation:

• Standard deviation is the square root of the average squared deviation.

• It can also be used to measure how far the data values are from the mean.

• Relatively few data values will be more than two deviation units from the mean.

• Like the variance, the standard deviation is sensitive to outliers.

• The presence of outliers tarnishes the interpretation of the standard deviation as a typical deviation.
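As an illustration, both standard deviations can be computed with Python's standard `statistics` module. The data are the six running times used in the earlier variance example; treating them as a complete population for `pstdev` is an assumption made only to show the difference between the two formulas.

```python
import math
import statistics

times = [4, 10, 9, 11, 9, 7]

# Sample standard deviation: square root of the sample variance (divides by n - 1).
s = statistics.stdev(times)

# Population standard deviation: square root of the population variance (divides by N).
sigma = statistics.pstdev(times)

print(round(s, 3), round(sigma, 3))
print(math.isclose(s, math.sqrt(statistics.variance(times))))  # True
```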


Range:

• The range is the difference between the largest and smallest data values.

Example:

Calculate the range of the following data set.

4, 6, 16, 9, 24, 8, 0, 12, 1

Solution:

The largest value is 24 and the smallest value is 0.

Range = 24 – 0 = 24.


Describing Data from One Variable

Section 4.4 Measures of Relative Position

Objectives:

• Determine the percentiles and locations of specific data points.

• Find the quartiles of the data.

• Determine the z-score as a measure of relative position.


Pth Percentile:

• Given a set of data $x_1, x_2, \dots, x_n$, the Pth percentile is a value, say X, such that at least P percent of the data is less than or equal to X and at least (100 – P) percent of the data is greater than or equal to X.

• The most often used measure of relative position is the percentile.


Pth Percentile:

To determine the Pth percentile:

• Form an ordered array by placing the data in order from smallest to largest.

• To find the location of the Pth percentile in the ordered array, let $\ell = n \cdot \frac{P}{100}$, where n is the number of observations in the ordered data.

• If $\ell$ is not an integer, then round up to the next greatest integer.

• If $\ell$ is an integer, then average the data value in the $\ell$th location with the data value in the $(\ell + 1)$th location.

• Remember, $\ell$ is not the percentile; $\ell$ is the location of the percentile in the ordered array.


Determining the Pth Percentile Flow Chart:

• Arrange the data in ascending order.

• To find the Pth percentile in the ordered data, calculate the location $\ell = n \cdot \frac{P}{100}$, where n is the number of observations in the ordered data.

• Is $\ell$ an integer?

• Yes: Average the data value in the $\ell$th location with the data value in the $(\ell + 1)$th location.

• No: Round up to the next greatest integer. Find the data value in the $\ell$th location.


Example:

Find the 50th percentile for the following data set.

3, 5, 0, 1, 9, 2, 7

Solution:

Ordered array: 0, 1, 2, 3, 5, 7, 9

Location: $7 \cdot \frac{50}{100} = 3.5$

Since the location is not an integer, the value is rounded up to 4.

Thus, the fourth observation in the ordered array would be the median.

The median value (which is the 50th percentile) equals 3.


Example:

Find the 50th percentile for the following data set.

3, 5, 0, 1, 9, 2, 7, 6

Solution:

Ordered array: 0, 1, 2, 3, 5, 6, 7, 9

Location: $8 \cdot \frac{50}{100} = 4$

Since the location is an integer, we average the 4th value and the 5th value of the ordered array: $\frac{3 + 5}{2} = \frac{8}{2} = 4$.

The 50th percentile for this data set is 4.
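The location rule used in the two examples above can be sketched as a Python function. The name `pth_percentile` is an illustrative assumption, and the sketch assumes 0 < P < 100 so that the location and the location after it both exist. Library routines such as NumPy's `percentile` use different interpolation rules by default and may return different values.

```python
import math

def pth_percentile(values, p):
    """Pth percentile using the location rule: location = n * P / 100."""
    ordered = sorted(values)
    n = len(ordered)
    location = n * p / 100
    if location != int(location):
        # Not an integer: round up and take the value at that (1-based) location.
        return ordered[math.ceil(location) - 1]
    # Integer: average the value at this location with the value at the next one.
    loc = int(location)
    return (ordered[loc - 1] + ordered[loc]) / 2

print(pth_percentile([3, 5, 0, 1, 9, 2, 7], 50))     # 3
print(pth_percentile([3, 5, 0, 1, 9, 2, 7, 6], 50))  # 4.0
```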


Percentile:

• The percentile of some data value x is given by:

percentile of $x = \frac{\text{number of data values} \le x}{\text{total number of data values}} \cdot 100$


Example:

Find the percentile of 45 for the following data set.

67, 45, 63, 58, 35, 54, 27, 66, 21, 48

Solution:

Ordered data: 21, 27, 35, 45, 48, 54, 58, 63, 66, 67

The values less than or equal to 45 are 21, 27, 35, and 45.

So the number of values less than or equal to 45 is 4.

percentile of 45 $= \frac{4}{10} \cdot 100 = 4 \cdot 10 = 40$.
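The percentile of a given data value can be computed with a short Python sketch. The function name `percentile_of` is an illustrative assumption.

```python
def percentile_of(values, x):
    """Percent of the data values that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

scores = [67, 45, 63, 58, 35, 54, 27, 66, 21, 48]
print(percentile_of(scores, 45))  # 40.0
```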


Quartiles:

• The 25th, 50th, and 75th percentiles are known as quartiles and are denoted as Q1, Q2, and Q3.

• Quartiles serve as markers to divide the data.

• Q1 separates the lowest 25 percent.

• Q2 represents the median (50th percentile).

• Q3 marks the beginning of the top 25 percent of the data.

• Since quartiles are nothing more than percentiles, we construct them in the same way as before.


Example:

Find Q1, Q2, and Q3 for the following data set of test scores.

50, 50, 62, 75, 77, 82, 86, 87, 88, 88

Solution:

Location of Q1: $10 \cdot \frac{25}{100} = 2.5$, so Q1 = 25th percentile = 3rd data value = 62.

Location of Q2: $10 \cdot \frac{50}{100} = 5$, so Q2 = 50th percentile $= \frac{77 + 82}{2} = 79.5$.

Location of Q3: $10 \cdot \frac{75}{100} = 7.5$, so Q3 = 75th percentile = 8th data value = 87.


Interquartile Range:

• The interquartile range, which describes the range of the middle fifty percent of the data, is given by

Interquartile range = Q3 – Q1.

• For the previous example the interquartile range is 87 – 62 = 25.

• A data point is considered an outlier if it is more than 1.5 times the interquartile range above the 75th percentile or more than 1.5 times the interquartile range below the 25th percentile.


Box Plots:

• An important use of quartiles is the construction of box plots.

• Box plots are graphical summaries of data that look like a box.

• They provide an alternative method to the histogram for displaying data.

• A box plot is a graphical summary of central tendency, the spread, the skewness, and the potential existence of outliers in the data.

• Below is a box plot of the test scores data set.

[Figure: box plot of the test scores; horizontal axis from 0 to 130.]

• The plot is constructed from five summary measures:
• largest data value
• smallest data value
• 25th percentile
• 75th percentile
• median


Example:

Find the outliers in this new data set of test scores.

12, 50, 62, 75, 77, 82, 86, 87, 88, 126

Solution:

Q1 = 62, Q2 = 79.5, Q3 = 87, and interquartile range = 25

Smaller than 25th percentile – 1.5 times the interquartile range: $62 - 1.5 \cdot 25 = 24.5$

Larger than 75th percentile + 1.5 times the interquartile range: $87 + 1.5 \cdot 25 = 124.5$

The outliers of this data set are 12 and 126.
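The outlier check in this example can be sketched in Python. The function names `pth_percentile` and `find_outliers` are illustrative assumptions; the quartiles are found with the location rule from this section (location = n · P / 100), not with a library routine, which may use a different convention.

```python
import math

def pth_percentile(ordered, p):
    """Pth percentile of already ordered data, using location = n * P / 100."""
    location = len(ordered) * p / 100
    if location != int(location):
        return ordered[math.ceil(location) - 1]
    loc = int(location)
    return (ordered[loc - 1] + ordered[loc]) / 2

def find_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3, plus the two fences."""
    ordered = sorted(values)
    q1 = pth_percentile(ordered, 25)
    q3 = pth_percentile(ordered, 75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < lower or v > upper]
    return outliers, lower, upper

scores = [12, 50, 62, 75, 77, 82, 86, 87, 88, 126]
outliers, lower, upper = find_outliers(scores)
print(lower, upper)  # 24.5 124.5
print(outliers)      # [12, 126]
```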


Z-Scores:

• The z-score transforms the data value into the number of standard deviations that value is from the mean:

$z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.

• Describing the number of standard deviations is a fundamental concept of statistics.

Remember:

• It is used as a standardization technique.

• If the z-score is negative, the value is less than the mean.

• If the z-score is positive, the value is greater than the mean.

• The z-score is a unit-free measure.

Page 48: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.4 Measures of Relative Position

Example:

Suppose you scored an 86 on your biology test and a 94 on your psychology test. The mean and standard deviation of the two tests are given below.

Course        Mean    Standard Deviation
Biology       74      10
Psychology    82      11

What are the z-scores for your two tests? On which test did you perform relatively better?

Solution:

The z-score for the biology test is:

$z = \frac{86 - 74}{10} = 1.2$

The z-score for the psychology test is:

$z = \frac{94 - 82}{11} \approx 1.09$

Even though the raw score on the psychology test is larger than the raw score on the biology test, the performance on the biology test was slightly better relative to the rest of the class, since 1.2 > 1.09.
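The same comparison can be reproduced with a small Python sketch. The function name `z_score` is hypothetical; the scores, means, and standard deviations are the ones given in the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

biology = z_score(86, 74, 10)
psychology = z_score(94, 82, 11)

print(round(biology, 2))     # 1.2
print(round(psychology, 2))  # 1.09

# The larger z-score marks the relatively better performance.
better = "biology" if biology > psychology else "psychology"
print(better)                # biology
```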

Page 49: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Sections 4.5-4.10 Applying the Standard Deviation

Objectives:

• To calculate the coefficient of variation and use it to compare the variation of different data sets.

• To calculate the mean, variance, and standard deviation of grouped data.

• To use the empirical rule and Chebyshev’s Theorem to describe the variability of data.

Page 50: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.5 Using the Standard Deviation

Empirical Rule:

If the distribution is bell-shaped:

One sigma rule: about 68% of the data should lie within one standard deviation of the mean. A deviation of more than one sigma is to be expected about once every three observations.

Two sigma rule: about 95% of the data should lie within two standard deviations of the mean. A deviation of more than two sigma is to be expected about once every twenty observations.

Three sigma rule: about 99.7% of the data should lie within three standard deviations of the mean. A deviation of more than three sigma is to be expected about once every 333 observations, or about 0.3% of the time.
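The percentages in the empirical rule are rounded values for an exactly normal distribution. The sketch below assumes exact normality (the slide only requires a bell shape) and uses the standard identity P(|Z| < k) = erf(k / sqrt(2)); the function name `within_k_sigma` is hypothetical.

```python
import math

def within_k_sigma(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = within_k_sigma(k)
    outside = 1 - inside
    # Share inside the band, and roughly how many observations
    # it takes on average to see one outside it.
    print(k, f"{100 * inside:.2f}%", f"about 1 in {1 / outside:.0f}")
```

Under exact normality the three-sigma tail works out closer to one observation in 370; the "once every 333" figure comes from rounding the coverage to 99.7%.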

Page 51: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.5 Using the Standard Deviation

Chebyshev’s Theorem:

• The proportion of any data set lying within $k$ standard deviations of the mean is at least $1 - \frac{1}{k^2}$, for $k > 1$.

• $k = 2$: At least $1 - \frac{1}{2^2} = \frac{3}{4}$ (or 75%) of the data values lie within 2 standard deviations of the mean, for any data set.

• $k = 3$: At least $1 - \frac{1}{3^2} = \frac{8}{9}$ (or 88.9%) of the data values lie within 3 standard deviations of the mean, for any data set.
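Chebyshev's bound is simple enough to evaluate for any k > 1. The following Python sketch uses exact fractions so the results match the 3/4 and 8/9 shown above; the function name `chebyshev_lower_bound` is hypothetical.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 is only informative for k > 1")
    k = Fraction(k)
    return 1 - 1 / (k * k)

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    print(k, bound, f"{100 * float(bound):.1f}%")
# 2 3/4 75.0%
# 3 8/9 88.9%
```

Unlike the empirical rule, this guarantee holds for any data set, which is why its percentages are lower.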

Page 52: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.8 The Coefficient of Variation

Coefficient of Variation:

• The coefficient of variation compares the variation in data sets.

• For sample data: $CV = \frac{s}{\bar{x}} \cdot 100\%$

• For a population: $CV = \frac{\sigma}{\mu} \cdot 100\%$

• The coefficient of variation standardizes the variation measure.
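A short Python sketch of both versions of the coefficient of variation follows. The function names and the data values are hypothetical, chosen only so the arithmetic is easy to follow; `statistics.stdev` divides by n - 1 (sample) and `statistics.pstdev` divides by N (population).

```python
from statistics import mean, pstdev, stdev

def cv_sample(data):
    """Sample coefficient of variation, in percent: 100 * s / x-bar."""
    return 100 * stdev(data) / mean(data)

def cv_population(data):
    """Population coefficient of variation, in percent: 100 * sigma / mu."""
    return 100 * pstdev(data) / mean(data)

# Hypothetical data: mean 5, population standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(cv_population(values))        # 40.0
print(round(cv_sample(values), 1))  # 42.8
```

Because the result is a percentage of the mean, data sets measured in different units can be compared directly.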

Page 53: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.9 Analyzing Grouped Data

Finding the Mean of Grouped Data:

• Finding the mean of grouped data involves finding the midpoint of each of the classes in the frequency distribution and then weighting each of these midpoints by the number of observations in the class. Let

$f_i$ = number of observations in the $i$th group,

$N$ = the total number of observations in all classes, $N = \sum f_i$,

$M_i$ = midpoint of the $i$th class, and

$n$ = the number of observations in the sample.

• For a population the mean of grouped data is given by

$\mu = \frac{\sum f_i M_i}{N}$.

• If the grouped data represent sample observations the mean is given by

$\bar{x} = \frac{\sum f_i M_i}{n}$.
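The weighting of midpoints by class frequencies can be sketched in Python. The frequency table below is hypothetical (it does not come from the slides), and `grouped_mean` is an illustrative name. The same calculation serves for a population or a sample, since the divisor is the total count in both cases.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f_i * M_i divided by the total count."""
    total = sum(frequencies)
    weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))
    return weighted_sum / total

# Hypothetical frequency distribution with classes 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]

print(grouped_mean(midpoints, frequencies))  # 23.0
```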

Page 54: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.9 Analyzing Grouped Data

Finding the Variance of Grouped Data:

• Let

$f_i$ = number of observations in the $i$th group,

$N$ = the total number of observations in all classes, $N = \sum f_i$,

$M_i$ = midpoint of the $i$th class, and

$n$ = the number of observations in the sample.

• The population variance of grouped data is given by the expression

$\sigma^2 = \frac{\sum f_i M_i^2}{N} - \left(\frac{\sum f_i M_i}{N}\right)^2 = \frac{\sum f_i M_i^2 - \frac{\left(\sum f_i M_i\right)^2}{N}}{N}$.

• The sample variance is given by

$s^2 = \frac{\sum f_i M_i^2 - \frac{\left(\sum f_i M_i\right)^2}{n}}{n - 1}$.
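Both variance formulas share the same numerator and differ only in the divisor, which the following Python sketch makes explicit. The frequency table is hypothetical and `grouped_variance` is an illustrative name, not something defined in the slides.

```python
def grouped_variance(midpoints, frequencies, sample=False):
    """Variance of grouped data via the computational formula.

    numerator = sum(f*M^2) - (sum(f*M))^2 / count
    divisor   = count       for a population
              = count - 1   for a sample
    """
    count = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    sum_fm2 = sum(f * m * m for f, m in zip(frequencies, midpoints))
    numerator = sum_fm2 - sum_fm ** 2 / count
    return numerator / (count - 1 if sample else count)

# Hypothetical frequency distribution with classes 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]

population_variance = grouped_variance(midpoints, frequencies)
sample_variance = grouped_variance(midpoints, frequencies, sample=True)

print(population_variance)        # 86.0
print(round(sample_variance, 2))  # 90.53
```

Taking the square root of either result gives the corresponding standard deviation of the grouped data.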

Page 55: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.10 Proportions

Proportions:

• A proportion measures the fraction of a group that possesses some characteristic.

• To calculate the proportion, simply count the number in the group that possess the characteristic and divide the count by the number in the group. Let

$X$ = number that possess the characteristic,

$N$ = number in the population,

$n$ = number in the sample, then

$p = \frac{X}{N}$ is the population proportion, and

$\hat{p} = \frac{X}{n}$ is the sample proportion.

• The symbol $\hat{p}$ is pronounced p-hat.

Page 56: HAWKES LEARNING SYSTEMS math courseware specialists Describing Data from One Variable Chapter 4 Copyright © 2010 by Hawkes Learning Systems/Quant Systems,

Describing Data from One Variable

Section 4.10 Proportions

Example:

Suppose your statistics class is composed of 48 students of which 4 are left-handed. What proportion of the class is left-handed? What proportion of the class is right-handed?

Solution:

$p = \frac{X}{N} = \frac{4}{48} \approx 0.083$

Then 0.083 is the proportion of people in the class that are left-handed.

$p = \frac{X}{N} = \frac{44}{48} \approx 0.917$

Then 0.917 is the proportion of people in the class that are right-handed.
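The two proportions can be verified with a few lines of Python. The function name `proportion` is hypothetical, and, as in the solution, every student who is not left-handed is assumed to be right-handed.

```python
def proportion(count_with_trait, group_size):
    """Fraction of the group that possesses the characteristic (X / N)."""
    return count_with_trait / group_size

class_size = 48
left_handed = 4
right_handed = class_size - left_handed  # assumes no other category

p_left = proportion(left_handed, class_size)
p_right = proportion(right_handed, class_size)

print(round(p_left, 3))   # 0.083
print(round(p_right, 3))  # 0.917
```

The two proportions sum to 1 because the two groups together make up the whole class.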