Business Statistics
Spring 2005
Summarizing and Describing Numerical Data
Topics
• Measures of Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Quartile
• Measures of Variation: The Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation
• Shape: Symmetric, Skewed, using Box-and-Whisker Plots
Numerical Data Properties
Central Tendency (Location)
Variation (Dispersion)
Shape
Measures of Central Tendency for Ungrouped Data
Summary Measures (from Raw Data)
• Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
• Mean, Median, Mode, Midrange, Midhinge
• Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:
$\mu = \frac{\sum X}{N}$
where µ stands for the population mean,
N is the total number of observations in the population,
X stands for a particular value, and
Σ indicates the operation of adding.
Population Mean Example
Parameter: a measurable characteristic of a population.
The Kane family owns four cars. The following is the mileage attained by each car: 56,000, 23,000, 42,000, and 73,000. Find the average miles covered by each car.
The mean is (56,000 + 23,000 + 42,000 + 73,000)/4 = 194,000/4 = 48,500.
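The Kane family calculation can be checked with a minimal Python sketch. The function name `population_mean` is an illustrative choice, not something from the slides.

```python
# Mileage for the four Kane family cars (treated as the whole population).
mileages = [56_000, 23_000, 42_000, 73_000]


def population_mean(values):
    """Return mu = (sum of all X) / N for a list of population values."""
    return sum(values) / len(values)


mu = population_mean(mileages)
print(mu)  # 48500.0
```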
Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
$\bar{X} = \frac{\sum X}{n}$
where $\bar{X}$ stands for the sample mean and
n is the total number of values in the sample.
Return on Stock
Year      Stock X   Stock Y
1998      10%       17%
1997       8%       -2%
1996      12%       16%
1995       2%        1%
1994       8%        8%
Total     40%       40%
Average Return on Stock = 40 / 5 = 8% (for both Stock X and Stock Y)
The Mean (Arithmetic Average)
• It is the Arithmetic Average of data values:
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
• The Most Common Measure of Central Tendency
• Affected by Extreme Values (Outliers)
[Figure: two dot plots, one on a 0 to 10 axis with Mean = 5 and one on a 0 to 14 axis with Mean = 6]
Properties of the Arithmetic Mean
Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The mean is relatively reliable.
The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.
EXAMPLE
Consider the set of values: 3, 8, and 4.
The mean is 5.
Illustrating the last property, (3-5) + (8-5) + (4-5) = -2 + 3 - 1 = 0. In other words,
$\sum (X - \bar{X}) = 0$
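The zero-sum property of deviations can be verified with a short Python sketch on the same three values. With non-integer data, floating-point rounding can leave a tiny non-zero remainder, so a tolerance check such as `math.isclose` is the safer comparison in general.

```python
values = [3, 8, 4]

mean = sum(values) / len(values)          # 5.0
deviations = [x - mean for x in values]   # [-2.0, 3.0, -1.0]
total_deviation = sum(deviations)         # deviations from the mean cancel out

print(mean, deviations, total_deviation)  # 5.0 [-2.0, 3.0, -1.0] 0.0
```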
The Median
Median: The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest. There are as many values above the median as below it in the data array.
Note: For an even set of numbers, the median will be the arithmetic average of the two middle numbers.
Position of Median in Sequence
Median Positioning Point $= \frac{n + 1}{2}$
The Median
• Important Measure of Central Tendency
• In an ordered array, the median is the “middle” number.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.
• Not Affected by Extreme Values
[Figure: two dot plots, one on a 0 to 10 axis and one on a 0 to 14 axis; Median = 5 in both]
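The odd and even cases can be sketched in Python as follows. The data sets are hypothetical values chosen for illustration (the slide's plotted points are not recoverable from the transcript); the second set replaces the largest value with an extreme one to show that the mean moves while the median does not.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the 2 middle numbers


base = [1, 3, 5, 7, 9]           # hypothetical data, n odd
with_extreme = [1, 3, 5, 7, 14]  # same data with one extreme value
even_n = [1, 3, 5, 7, 9, 14]     # hypothetical data, n even

print(median(base), sum(base) / len(base))                  # 5 5.0
print(median(with_extreme), sum(with_extreme) / len(with_extreme))  # 5 6.0
print(median(even_n))                                       # 6.0
```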
Properties of the Median
• There is a unique median for each data set.
• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
• It can be computed for ratio-level, interval-level, and ordinal-level data.
• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
• No arithmetic properties
The Mode
• A Measure of Central Tendency
• Value that Occurs Most Often
• Not Affected by Extreme Values
• There May Not be a Mode
• There May be Several Modes
• Used for Either Numerical or Categorical Data
[Figure: two dot plots, one on a 0 to 14 axis with Mode = 9 and one on a 0 to 6 axis with No Mode]
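A possible Python sketch of these cases follows. It assumes the convention that a data set has no mode when every distinct value occurs equally often; the helper name `modes` and all sample data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] when no value stands out."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if several distinct values all tie, report "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([0, 1, 2, 3, 4, 5, 6]))     # []      (no mode)
print(modes([1, 3, 9, 9, 9, 12]))       # [9]     (one mode)
print(modes([2, 2, 5, 7, 7]))           # [2, 7]  (several modes)
print(modes(["red", "blue", "red"]))    # ['red'] (categorical data)
```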
Midrange
• A Measure of Central Tendency
• Average of Smallest and Largest Observation:
$\text{Midrange} = \frac{x_{\text{largest}} + x_{\text{smallest}}}{2}$
• Affected by Extreme Value
[Figure: two dot plots on a 0 to 10 axis; Midrange = 5 in both]
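A brief Python sketch of the midrange, using hypothetical data, shows how a single extreme value shifts it.

```python
def midrange(values):
    """Average of the smallest and largest observations."""
    return (min(values) + max(values)) / 2


data = [1, 3, 5, 7, 9]           # hypothetical data
with_extreme = [1, 3, 5, 7, 29]  # largest value replaced by an extreme one

print(midrange(data))          # 5.0
print(midrange(with_extreme))  # 15.0
```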
Quartiles
• Not a Measure of Central Tendency
• Split Ordered Data into 4 Quarters (25% of the values in each; Q1, Q2, Q3 are the dividing points)
• Position (not value) of i-th Quartile: $\text{position of } Q_i = \frac{i(n+1)}{4}$
• See text page 107 for “rounding rules” for position of the i-th quartile
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of Q1 $= \frac{1 \cdot (9 + 1)}{4} = 2.50$, so Q1 = 12.5 (midway between the 2nd and 3rd values, 12 and 13)
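The positioning rule can be sketched in Python as below. One assumption is labeled in the code: fractional positions are handled by linear interpolation between the neighbouring values, which reproduces the slide's Q1 = 12.5 but may differ from the textbook's rounding rules for positions that are not whole or half numbers. The function names are illustrative.

```python
import math


def quartile_position(n, i):
    """Position (1-based) of the i-th quartile in an ordered array of n values."""
    return i * (n + 1) / 4


def quartile(ordered, i):
    """Value of the i-th quartile of already-ordered data."""
    n = len(ordered)
    position = quartile_position(n, i)
    lower = math.floor(position)
    fraction = position - lower
    if lower < 1:        # position falls before the first value
        return ordered[0]
    if lower >= n:       # position falls on or after the last value
        return ordered[-1]
    # ASSUMPTION: interpolate linearly between the two neighbouring values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(quartile_position(len(data), 1))  # 2.5
print(quartile(data, 1))                # 12.5
print(quartile(data, 3))                # 19.5
```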
Midhinge
• A Measure of Central Tendency
• The Middle point of 1st and 3rd Quartiles
• Used to Overcome Extreme Values
$\text{Midhinge} = \frac{Q_1 + Q_3}{2}$
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
$\text{Midhinge} = \frac{Q_1 + Q_3}{2} = \frac{12.5 + 19.5}{2} = 16$
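The midhinge for this array can be reproduced with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.quantiles` is available; its default "exclusive" method places the quartiles at positions based on n + 1 and, on this data, gives the same Q1 and Q3 as the slide.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4)
midhinge = (q1 + q3) / 2

print(q1, q3, midhinge)  # 12.5 19.5 16.0
```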
Measures of Variation
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
The Range
• Measure of Variation
• Difference Between Largest & Smallest Observations:
$\text{Range} = x_{\text{largest}} - x_{\text{smallest}}$
• Ignores How Data Are Distributed:
[Figure: two dot plots on a 7 to 12 axis; Range = 12 - 7 = 5 in both]
Return on Stock
Year      Stock X   Stock Y
1998      10%       17%
1997       8%       -2%
1996      12%       16%
1995       2%        1%
1994       8%        8%
Range on Stock X = 12 - 2 = 10%
Range on Stock Y = 17 - (-2) = 19%
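A short Python sketch makes the comparison explicit: both stocks have the same average return, but their ranges differ. The dictionary layout and the helper name `value_range` are illustrative choices.

```python
# Annual returns in percent, 1998 down to 1994.
returns = {
    "Stock X": [10, 8, 12, 2, 8],
    "Stock Y": [17, -2, 16, 1, 8],
}


def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


summary = {
    name: (sum(values) / len(values), value_range(values))
    for name, values in returns.items()
}

for name, (mean, spread) in summary.items():
    print(f"{name}: mean = {mean}%, range = {spread}%")
# Stock X: mean = 8.0%, range = 10%
# Stock Y: mean = 8.0%, range = 19%
```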
Interquartile Range
• Measure of Variation
• Also Known as Midspread: Spread in the Middle 50%
• Difference Between Third & First Quartiles: $\text{Interquartile Range} = Q_3 - Q_1$
• IQR = 75th percentile - 25th percentile
• The IQR is useful for checking for outliers
• Not Affected by Extreme Values
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
$Q_3 - Q_1 = 17.5 - 12.5 = 5$
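The interquartile range can be sketched in Python as follows, again assuming Python 3.8+ for `statistics.quantiles`. The second data set is a hypothetical variant with the largest value replaced by an extreme one, to illustrate that the IQR stays put while the range does not.

```python
from statistics import quantiles


def iqr(values):
    """Q3 - Q1, using quartile positions based on n + 1 (the 'exclusive' method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
with_extreme = [11, 12, 13, 16, 16, 17, 17, 18, 210]  # hypothetical extreme value

print(iqr(data), max(data) - min(data))                          # 5.0 10
print(iqr(with_extreme), max(with_extreme) - min(with_extreme))  # 5.0 199
```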
Variance & Standard Deviation
Measures of Dispersion
Most Common Measures
Consider How Data Are Distributed
Show Variation About Mean ($\bar{X}$ or $\mu$)
[Figure: dot plot on a 4 to 12 axis with $\bar{X}$ = 8.3]
Variance
• Important Measure of Variation
• Shows Variation About the Mean:
• For the Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
• For the Sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
Population Variance
The population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean.
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Population Variance EXAMPLE
The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population variance?
$\mu = \sum X / N = 96/4 = 24$
$\sigma^2 = \sum (X - \mu)^2 / N = 944/4 = 236$
x      µ      (x - µ)   (x - µ)²
2      24     -22       484
18     24      -6        36
34     24      10       100
42     24      18       324
Total                   944
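The table above can be reproduced step by step with a small Python sketch; the variable names are illustrative.

```python
ages = [2, 18, 34, 42]  # the whole Dunn family, so this is a population

N = len(ages)
mu = sum(ages) / N                                   # 96 / 4 = 24.0
squared_deviations = [(x - mu) ** 2 for x in ages]   # [484.0, 36.0, 100.0, 324.0]
population_variance = sum(squared_deviations) / N    # 944 / 4 = 236.0

print(mu, squared_deviations, population_variance)
```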
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Population Standard Deviation EXAMPLE
The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population standard deviation?
$\mu = \sum X / N = 96/4 = 24$
$\sum (X - \mu)^2 = 484 + 36 + 100 + 324 = 944$
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{944}{4}} = \sqrt{236} \approx 15.36$
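As a cross-check, the population standard deviation of the Dunn family ages can be computed both from the formula and with the standard library's `statistics.pstdev`. This is a sketch for verification only.

```python
import math
from statistics import pstdev

ages = [2, 18, 34, 42]

mu = sum(ages) / len(ages)
sigma_by_formula = math.sqrt(sum((x - mu) ** 2 for x in ages) / len(ages))
sigma_by_library = pstdev(ages)  # population standard deviation (divides by N)

print(round(sigma_by_formula, 2), round(sigma_by_library, 2))  # 15.36 15.36
```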
Standard Deviation
• Most Important Measure of Variation
• Shows Variation About the Mean:
• For the Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
• For the Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
Sample Variance and Standard Deviation
The sample variance estimates the population variance.
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
NOTE: important computational formula:
$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$
The sample standard deviation: $s = \sqrt{s^2}$
Example of Standard Deviation
Amount (X)   $\bar{X}$   Deviation from Mean (X - $\bar{X}$)   (X - $\bar{X}$)²
600          435         600 - 435 = 165                       27,225
350          435         350 - 435 = -85                        7,225
275          435         275 - 435 = -160                      25,600
430          435         430 - 435 = -5                            25
520          435         520 - 435 = 85                         7,225
Total                    0                                     67,300
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{67{,}300}{4}} = \sqrt{16{,}825} = 129.71$
Example of Standard Deviation (Computational Version)
Amount (X)   $\bar{X}$   (X - $\bar{X}$)   (X - $\bar{X}$)²   X²
600          435         165               27,225             360,000
350          435         -85                7,225             122,500
275          435         -160              25,600              75,625
430          435         -5                    25             184,900
520          435         85                 7,225             270,400
Total 2175                                 67,300           1,013,425
$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} = \sqrt{\frac{1{,}013{,}425 - \frac{(2175)^2}{5}}{5 - 1}} = 129.71$
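The two routes to the sample standard deviation, via squared deviations and via the computational formula, can be compared in a short Python sketch. Variable names are illustrative.

```python
import math

amounts = [600, 350, 275, 430, 520]
n = len(amounts)
mean = sum(amounts) / n                                   # 2175 / 5 = 435.0

# Definitional form: sum of squared deviations from the mean.
ss_definitional = sum((x - mean) ** 2 for x in amounts)   # 67300.0

# Computational form: sum(X^2) - (sum X)^2 / n.
sum_x = sum(amounts)                                      # 2175
sum_x2 = sum(x * x for x in amounts)                      # 1013425
ss_computational = sum_x2 - sum_x ** 2 / n                # 67300.0

s = math.sqrt(ss_computational / (n - 1))                 # sample: divide by n - 1
print(ss_definitional, ss_computational, round(s, 2))     # 67300.0 67300.0 129.71
```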
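Sample Standard Deviation
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
NOTE: For the Sample: use n - 1 in the denominator.
Data $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \cdots + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$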
Interpretation and Uses of the Standard Deviation
Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least $1 - 1/k^2$, where k is any constant greater than 1.
Multiply by 100% to get the percentage of values within k standard deviations of the mean.
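The bound is easy to tabulate. The sketch below evaluates $1 - 1/k^2$ for a few values of k chosen for illustration; the function name is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies only for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k) * 100:.1f}% of values")
# k = 1.5: at least 55.6% of values
# k = 2: at least 75.0% of values
# k = 3: at least 88.9% of values
```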
Interpretation and Uses of the Standard Deviation
Empirical Rule: For any symmetrical, bell-shaped distribution, approximately 68% of the observations will lie within $\pm 1\sigma$ of the mean ($\mu \pm 1\sigma$); approximately 95% of the observations will lie within $\pm 2\sigma$ of the mean ($\mu \pm 2\sigma$); approximately 99.7% will lie within $\pm 3\sigma$ of the mean ($\mu \pm 3\sigma$).
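The three percentages can be recovered numerically under the assumption that "bell-shaped" means an exactly normal distribution. The sketch uses `statistics.NormalDist` (Python 3.8+); the function name `share_within` is illustrative.

```python
from statistics import NormalDist

# ASSUMPTION: the bell-shaped distribution is modelled as a standard normal.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k) * 100:.1f}%")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```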
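Comparing Standard Deviations
Data $X_i$: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$
Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$
Value for the Standard Deviation is larger for data considered as a Sample.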
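The effect of the denominator can be checked directly on the eight listed values with the standard library: `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by N. This is a sketch for verification.

```python
from statistics import mean, pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

sum_of_squares = sum((x - mean(data)) ** 2 for x in data)
s = stdev(data)       # data treated as a sample: denominator n - 1
sigma = pstdev(data)  # data treated as a population: denominator N

print(mean(data), sum_of_squares)
print(s, sigma)
print(s > sigma)      # the sample version is always the larger of the two
```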
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, each with Mean = 15.5]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
•Measure of Relative Variation
•Always a %
•Shows Variation Relative to Mean
•Used to Compare 2 or More Groups
•Formula (for Sample): $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficient of Variation
Stock A: Average Price last year = $50
Standard Deviation = $5
Stock B: Average Price last year = $100
Standard Deviation = $5
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Coefficient of Variation:
Stock A: CV = (5/50) · 100% = 10%
Stock B: CV = (5/100) · 100% = 5%
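The stock comparison can be sketched in Python as follows. Both stocks have the same standard deviation in dollars, yet the relative variation differs because the means differ. The function and dictionary names are illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


# (average price, standard deviation), both in dollars
stocks = {"Stock A": (50, 5), "Stock B": (100, 5)}

cv = {
    name: coefficient_of_variation(std_dev, mean)
    for name, (mean, std_dev) in stocks.items()
}
print(cv)  # {'Stock A': 10.0, 'Stock B': 5.0}
```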
Shape
• Describes How Data Are Distributed
• Measures of Shape: Symmetric or skewed
• Left-Skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-Skewed: Mode < Median < Mean
Box-and-Whisker Plot
Graphical Display of Data Using 5-Number Summary
[Figure: box-and-whisker plot on a 4 to 12 axis, marking Xsmallest, Q1, Median, Q3, and Xlargest]
Distribution Shape & Box-and-Whisker Plots
[Figure: three box-and-whisker plots showing the positions of Q1, Median, and Q3 for Left-Skewed, Symmetric, and Right-Skewed distributions]
Summary
• Discussed Measures of Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Quartiles
• Addressed Measures of Variation: The Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
• Determined Shape of Distributions: Symmetric, Skewed, Box-and-Whisker Plot
[Figure: Symmetric: Mean = Median = Mode; Left-Skewed: Mean < Median < Mode; Right-Skewed: Mode < Median < Mean]