chapter 3 descriptive statistics: numerical measuresfaculty.tamucc.edu/rcutshall/chapter_3a.ppt ·...

49
Chapter 3 Chapter 3 Descriptive Statistics: Numerical Descriptive Statistics: Numerical Measures Measures Measures of Location Measures of Location Measures of Variability Measures of Variability

Upload: truongdieu

Post on 28-May-2018

230 views

Category:

Documents


0 download

TRANSCRIPT

Chapter 3Chapter 3 Descriptive Statistics: Numerical Descriptive Statistics: Numerical

MeasuresMeasures Measures of LocationMeasures of Location Measures of VariabilityMeasures of Variability

Measures of LocationMeasures of Location

If the measures are computedIf the measures are computed for data from a sample,for data from a sample,

they are called they are called sample statisticssample statistics..

If the measures are computedIf the measures are computed for data from a population,for data from a population,

they are called they are called population parameterspopulation parameters..

A sample statistic is referred toA sample statistic is referred toas the as the point estimatorpoint estimator of the of the

corresponding population parameter.corresponding population parameter.

MeanMean MedianMedian ModeMode PercentilesPercentiles QuartilesQuartiles

MeanMean

The The meanmean of a data set is the average of all of a data set is the average of all the data values.the data values.

The sample mean is the point estimator of The sample mean is the point estimator of the population mean the population mean . .

x

Sample Mean Sample Mean x

Number ofNumber ofobservationsobservationsin the samplein the sample

Sum of the valuesSum of the valuesof the of the nn observations observations

ixx

n

Population Mean Population Mean

Number ofNumber ofobservations inobservations inthe populationthe population

Sum of the valuesSum of the valuesof the of the NN observations observations

ix

N

Seventy efficiency apartmentsSeventy efficiency apartmentswere randomly sampled inwere randomly sampled ina small college town. Thea small college town. Themonthly rent prices formonthly rent prices forthese apartments are listedthese apartments are listedin ascending order on the next slide. in ascending order on the next slide.

Sample MeanSample Mean

Example: Apartment RentsExample: Apartment Rents

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

Sample MeanSample Mean

Sample MeanSample Mean

34,356 490.8070ix

xn

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

MedianMedian

Whenever a data set has extreme values, the medianWhenever a data set has extreme values, the median is the preferred measure of central location.is the preferred measure of central location.

A few extremely large incomes or property valuesA few extremely large incomes or property values can inflate the mean.can inflate the mean.

The median is the measure of location most oftenThe median is the measure of location most often reported for annual income and property value data.reported for annual income and property value data.

The The medianmedian of a data set is the value in the middle of a data set is the value in the middle when the data items are arranged in ascending order.when the data items are arranged in ascending order.

MedianMedian

1212 1414 1919 2626 27271818 2727

For an For an odd numberodd number of observations: of observations:

in ascending orderin ascending order

2626 1818 2727 1212 1414 2727 1919 7 observations7 observations

the median is the middle value.the median is the middle value.

Median = 19Median = 19

1212 1414 1919 2626 27271818 2727

MedianMedian

For an For an even numbereven number of observations: of observations:

in ascending orderin ascending order

2626 1818 2727 1212 1414 2727 3030 8 observations8 observations

the median is the average of the middle two values.the median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5Median = (19 + 26)/2 = 22.5

1919

3030

MedianMedian

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

Averaging the 35th and 36th data values:Averaging the 35th and 36th data values:Median = (475 + 475)/2 = 475Median = (475 + 475)/2 = 475

ModeMode

The The modemode of a data set is the value that occurs with of a data set is the value that occurs with greatest frequency.greatest frequency. The greatest frequency can occur at two or moreThe greatest frequency can occur at two or more different values.different values. If the data have exactly two modes, the data areIf the data have exactly two modes, the data are bimodalbimodal.. If the data have more than two modes, the data areIf the data have more than two modes, the data are multimodalmultimodal..

ModeMode

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

450 occurred most frequently (7 times)450 occurred most frequently (7 times)Mode = 450Mode = 450

PercentilesPercentiles

A percentile provides information about how theA percentile provides information about how the data are spread over the interval from the smallestdata are spread over the interval from the smallest value to the largest value.value to the largest value. Admission test scores for colleges and universitiesAdmission test scores for colleges and universities are frequently reported in terms of percentiles.are frequently reported in terms of percentiles.

The The ppth percentileth percentile of a data set is a value such of a data set is a value such that at least that at least pp percent of the items take on this percent of the items take on this value or less and at least (100 - value or less and at least (100 - pp) percent of ) percent of the items take on this value or more.the items take on this value or more.

PercentilesPercentiles

PercentilesPercentiles

Arrange the data in ascending order.Arrange the data in ascending order.

Compute index Compute index ii, the position of the , the position of the ppth percentile.th percentile.ii = ( = (pp/100)/100)nn

If If ii is not an integer, round up. The is not an integer, round up. The pp th percentileth percentile is the value in the is the value in the ii th position.th position.

If If ii is an integer, the is an integer, the pp th percentile is the averageth percentile is the average of the values in positionsof the values in positions i i and and ii +1.+1.

9090thth Percentile Percentile

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

ii = ( = (pp/100)/100)nn = (90/100)70 = 63 = (90/100)70 = 63Averaging the 63rd and 64th data values:Averaging the 63rd and 64th data values:

90th Percentile = (580 + 590)/2 = 58590th Percentile = (580 + 590)/2 = 585

9090thth Percentile Percentile

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

““At least 90%At least 90% of the itemsof the items

take on a valuetake on a value of 585 or less.”of 585 or less.”

““At least 10%At least 10% of the itemsof the items

take on a valuetake on a value of 585 or more.”of 585 or more.”

63/70 = .9 or 90%63/70 = .9 or 90% 7/70 = .1 or 10%7/70 = .1 or 10%

QuartilesQuartiles

Quartiles are specific percentiles.Quartiles are specific percentiles. First Quartile = 25th PercentileFirst Quartile = 25th Percentile

Second Quartile = 50th Percentile = MedianSecond Quartile = 50th Percentile = Median Third Quartile = 75th PercentileThird Quartile = 75th Percentile

Third QuartileThird Quartile

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

Third quartile = 75th percentileThird quartile = 75th percentilei i = (= (pp/100)/100)nn = (75/100)70 = 52.5 = 53 = (75/100)70 = 52.5 = 53

Third quartile = 525Third quartile = 525

Measures of VariabilityMeasures of Variability

It is often desirable to consider measures of variabilityIt is often desirable to consider measures of variability (dispersion), as well as measures of location.(dispersion), as well as measures of location. For example, in choosing supplier A or supplier B weFor example, in choosing supplier A or supplier B we might consider not only the average delivery time formight consider not only the average delivery time for each, but also the variability in delivery time for each.each, but also the variability in delivery time for each.

Measures of VariabilityMeasures of Variability

RangeRange Interquartile RangeInterquartile Range VarianceVariance Standard DeviationStandard Deviation Coefficient of VariationCoefficient of Variation

RangeRange

The The rangerange of a data set is the difference between the of a data set is the difference between the largest and smallest data values.largest and smallest data values. It is the It is the simplest measuresimplest measure of variability. of variability. It is It is very sensitivevery sensitive to the smallest and largest data to the smallest and largest data values.values.

RangeRange

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

Range = largest value - smallest valueRange = largest value - smallest valueRange = 615 - 425 = 190Range = 615 - 425 = 190

Interquartile RangeInterquartile Range

The The interquartile rangeinterquartile range of a data set is the difference of a data set is the difference between the third quartile and the first quartile.between the third quartile and the first quartile. It is the range for the It is the range for the middle 50%middle 50% of the data. of the data. It overcomes the sensitivity to extreme data values.It overcomes the sensitivity to extreme data values.

Interquartile RangeInterquartile Range

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

3rd Quartile (3rd Quartile (QQ3) = 5253) = 5251st Quartile (1st Quartile (QQ1) = 4451) = 445

Interquartile Range = Interquartile Range = QQ3 - 3 - QQ1 = 525 - 445 = 801 = 525 - 445 = 80

The The variancevariance is a measure of variability that utilizes is a measure of variability that utilizes all the data.all the data.

VarianceVariance

It is based on the difference between the value ofIt is based on the difference between the value of each observation (each observation (xxii) and the mean ( for a sample,) and the mean ( for a sample, for a population).for a population).

x

VarianceVariance

The variance is computed as follows:The variance is computed as follows:

The variance is the The variance is the average of the squaredaverage of the squared differencesdifferences between each data value and the mean. between each data value and the mean.

for afor asamplesample

for afor apopulationpopulation

22

( )xNis

xi xn

22

1

( )

Standard DeviationStandard Deviation

The The standard deviationstandard deviation of a data set is the positive of a data set is the positive square root of the variance.square root of the variance.

It is measured in the It is measured in the same units as the datasame units as the data, making, making it more easily interpreted than the variance.it more easily interpreted than the variance.

The standard deviation is computed as follows:The standard deviation is computed as follows:

for afor asamplesample

for afor apopulationpopulation

Standard DeviationStandard Deviation

s s 2 2

The coefficient of variation is computed as follows:The coefficient of variation is computed as follows:

Coefficient of VariationCoefficient of Variation

100 %sx

The The coefficient of variationcoefficient of variation indicates how large the indicates how large the standard deviation is in relation to the mean.standard deviation is in relation to the mean.

for afor asamplesample

for afor apopulationpopulation

100 %

Descriptive Statistics: Numerical Descriptive Statistics: Numerical MeasuresMeasures

Measures of Distribution Shape, Relative Measures of Distribution Shape, Relative Location, and Detecting OutliersLocation, and Detecting Outliers

Measures of Distribution Shape,Measures of Distribution Shape,Relative Location, and Detecting OutliersRelative Location, and Detecting Outliers

Distribution ShapeDistribution Shape z-Scoresz-Scores Detecting OutliersDetecting Outliers

Distribution Shape: SkewnessDistribution Shape: Skewness

An important measure of the shape of a An important measure of the shape of a distribution is called distribution is called skewnessskewness..

The formula for computing skewness for a data The formula for computing skewness for a data set is somewhat complex.set is somewhat complex.

Skewness can be easily computed using Skewness can be easily computed using statistical software.statistical software.

Distribution Shape: SkewnessDistribution Shape: Skewness

Symmetric (not skewed)Symmetric (not skewed)• Skewness is zero.Skewness is zero.• Mean and median are equal.Mean and median are equal.Re

lativ

e Fr

eque

ncy

.05

.10

.15

.20

.25

.30

.35

0

Skewness = Skewness = 0 0

Distribution Shape: SkewnessDistribution Shape: Skewness

Moderately Skewed LeftModerately Skewed Left• Skewness is negative.Skewness is negative.• Mean will usually be less than the median.Mean will usually be less than the median.Re

lativ

e Fr

eque

ncy

.05

.10

.15

.20

.25

.30

.35

0

Skewness = Skewness = .31 .31

Distribution Shape: SkewnessDistribution Shape: Skewness

Moderately Skewed RightModerately Skewed Right• Skewness is positive.Skewness is positive.• Mean will usually be more than the median.Mean will usually be more than the median.Re

lativ

e Fr

eque

ncy

.05

.10

.15

.20

.25

.30

.35

0

Skewness = .31 Skewness = .31

Distribution Shape: SkewnessDistribution Shape: Skewness

Highly Skewed RightHighly Skewed Right• Skewness is positive (often above 1.0).Skewness is positive (often above 1.0).• Mean will usually be more than the median.Mean will usually be more than the median.

Rela

tive

Freq

uenc

y

.05

.10

.15

.20

.25

.30

.35

0

Skewness = 1.25 Skewness = 1.25

Seventy efficiency apartmentsSeventy efficiency apartmentswere randomly sampled inwere randomly sampled ina small college town. Thea small college town. Themonthly rent prices formonthly rent prices forthese apartments are listedthese apartments are listedin ascending order on the next slide. in ascending order on the next slide.

Distribution Shape: SkewnessDistribution Shape: Skewness

Example: Apartment RentsExample: Apartment Rents

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

Distribution Shape: SkewnessDistribution Shape: Skewness

Rela

tive

Freq

uenc

y

.05

.10

.15

.20

.25

.30

.35

0

Skewness = .92 Skewness = .92

Distribution Shape: SkewnessDistribution Shape: Skewness

The The z-scorez-score is often called the standardized value. is often called the standardized value.

It denotes the number of standard deviations a dataIt denotes the number of standard deviations a data value value xxii is from the mean. is from the mean.

z-Scoresz-Scores

z x xsii

z-Scoresz-Scores

A data value less than the sample mean will have aA data value less than the sample mean will have a z-score less than zero.z-score less than zero. A data value greater than the sample mean will haveA data value greater than the sample mean will have a z-score greater than zero.a z-score greater than zero. A data value equal to the sample mean will have aA data value equal to the sample mean will have a z-score of zero.z-score of zero.

An observation’s z-score is a measure of the relativeAn observation’s z-score is a measure of the relative location of the observation in a data set.location of the observation in a data set.

z-Score of Smallest Value (425)z-Score of Smallest Value (425)425 490.80 1.2054.74

ix xzs

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.350.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.451.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27

zz-Scores-Scores

Standardized Values for Apartment RentsStandardized Values for Apartment Rents

Empirical RuleEmpirical Rule

For data having a bell-shaped distribution:For data having a bell-shaped distribution:

of the values of a normal random variableof the values of a normal random variable are within of its mean.are within of its mean.68.26%68.26%

+/- 1 standard deviation+/- 1 standard deviation

of the values of a normal random variableof the values of a normal random variable are within of its mean.are within of its mean.95.44%95.44%

+/- 2 standard deviations+/- 2 standard deviations

of the values of a normal random variableof the values of a normal random variable are within of its mean.are within of its mean.99.72%99.72%

+/- 3 standard deviations+/- 3 standard deviations

Empirical RuleEmpirical Rule

xx – – 33 – – 11

– – 22 + 1+ 1

+ 2+ 2 + 3+ 3

68.26%68.26%95.44%95.44%99.72%99.72%

Detecting OutliersDetecting Outliers

An An outlieroutlier is an unusually small or unusually large is an unusually small or unusually large value in a data set.value in a data set. A data value with a z-score less than -3 or greaterA data value with a z-score less than -3 or greater than +3 might be considered an outlier.than +3 might be considered an outlier. It might be:It might be:

• an incorrectly recorded data valuean incorrectly recorded data value• a data value that was incorrectly included in thea data value that was incorrectly included in the data setdata set• a correctly recorded data value that belongs ina correctly recorded data value that belongs in the data setthe data set

Detecting OutliersDetecting Outliers

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.350.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.451.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27

The most extreme z-scores are -1.20 and 2.27The most extreme z-scores are -1.20 and 2.27 Using |Using |zz| | >> 3 as the criterion for an outlier, there are 3 as the criterion for an outlier, there are no outliers in this data set.no outliers in this data set.

Standardized Values for Apartment RentsStandardized Values for Apartment Rents