descriptive statistics
DESCRIPTION
Quantitative Methods
TRANSCRIPT
Descriptive Statistics
Quantitative (Variable)
• Discrete (number of customers, number of claims)
• Continuous (salary, price)
Qualitative (Attribute)
• Ordinal (customer satisfaction, efficiency of workers, bond rating)
• Nominal (sex, nationality, eye color)
7/3/2013 2 Descriptive Statistics
Data
• Primary
• Secondary
Data
• Time series (unemployment rate, GDP)
• Cross-sectional (queue length in different SBI branches)
Definition
Primary Data
• Collected directly from the source
• Collected under the control and supervision of the investigator
Secondary Data
• Not collected by the investigator
• Derived from other sources
Methods of Collecting Primary Data
• Interview Method
• Questionnaire Method
• Observation Method
Diagram Presentation
Diagram
• Line (time series): simple, multiple
• Bar: vertical (time series), horizontal (cross-sectional), component, subdivided
• Pie
• When data are collected in original form, they are called raw data.
• When the raw data are organized into a frequency distribution, the frequency is the number of values in a specific class of the distribution (grouped data).
Data Table: Compressive Strength of 80 Aluminum-Lithium Alloy Specimens
105 221 183 186 121 181 180 143
97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149
Stem-and-Leaf Display
Stem | Leaf | Frequency
7 | 6 | 1
8 | 7 | 1
9 | 7 | 1
10 | 5 1 | 2
11 | 5 0 8 | 3
12 | 1 0 3 | 3
13 | 4 1 3 5 3 5 | 6
14 | 2 9 5 8 3 1 6 9 | 8
15 | 4 7 1 3 4 0 8 8 6 8 0 8 | 12
16 | 3 0 7 3 0 5 0 8 7 9 | 10
17 | 8 5 4 4 1 6 2 1 0 6 | 10
18 | 0 3 6 1 4 1 0 | 7
19 | 9 6 0 9 3 4 | 6
20 | 7 1 0 8 | 4
21 | 8 | 1
22 | 1 8 9 | 3
23 | 7 | 1
24 | 5 | 1
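As a rough illustration, the stem-and-leaf display can be rebuilt from the 80 values in the data table. The sketch below assumes the stem is the value without its last digit and the leaf is the last digit. The function name `stem_and_leaf` is only illustrative, and the leaves are printed in sorted order rather than in the order shown on the slide.

```python
from collections import defaultdict

# The 80 compressive-strength values from the data table.
DATA = [
    105, 221, 183, 186, 121, 181, 180, 143,
    97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178, 76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149, 87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


display = stem_and_leaf(DATA)
for stem, leaves in display.items():
    # One row per stem: stem | leaves | frequency
    print(f"{stem:>2} | {' '.join(map(str, leaves))} | {len(leaves)}")
```

The frequency column is simply the number of leaves on each stem, so the frequencies add up to the 80 observations.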
Terms Associated with a Grouped Frequency Distribution
Class width = upper class boundary - lower class boundary
Class Mark or Mid-Value
Class marks are the midpoints of the class boundaries.
Class mark = 1/2 (upper class boundary + lower class boundary)
Frequency Density
FD = class frequency / class width
It gives the number of observations per unit of class width.
Use: when class widths are not equal, frequency density is plotted on the y-axis to draw the histogram.
Relative Frequency
RF = class frequency / total frequency
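A minimal sketch of the grouped-distribution terms (class width, class mark, frequency density, relative frequency). The class boundaries and frequencies are hypothetical and chosen only to show unequal class widths. The helper name `describe_class` is illustrative.

```python
# Hypothetical classes (not from the slides): (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 50, 6)]

total_frequency = sum(freq for _, _, freq in classes)


def describe_class(lower, upper, freq, total):
    """Compute the terms defined for one class of a grouped distribution."""
    width = upper - lower
    return {
        "width": width,
        "class_mark": (lower + upper) / 2,    # midpoint of the boundaries
        "frequency_density": freq / width,    # observations per unit width
        "relative_frequency": freq / total,   # share of all observations
    }


rows = [describe_class(lo, hi, f, total_frequency) for lo, hi, f in classes]
for (lo, hi, f), row in zip(classes, rows):
    print(f"{lo}-{hi}: {row}")
```

The last class has more observations than the first but, being three times as wide, a lower frequency density. This is why density rather than raw frequency goes on the y-axis when widths differ.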
Visualizing Data
• The three most commonly used graphs in research are:
– The histogram
– The frequency polygon
– The cumulative frequency graph, or ogive
Key Characteristics
Characteristic | Definition / Interpretation
Central Tendency | Where are the data values concentrated? What seem to be typical or middle data values?
Dispersion | How much variation is there in the data? How spread out are the data values? Are there unusual values?
Shape | Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?
Measures
Measure | Formula | Excel Formula | Pro | Con
Mean (raw data) | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | =AVERAGE(Data) | Familiar and uses all the sample information. | Not applicable when there are extreme values or open-ended classes.
Mean (grouped data) | $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$ | =AVERAGE(Data) | Familiar and uses all the sample information. | Not applicable when there are extreme values or open-ended classes.
Measures
Measure | Formula | Excel Formula | Pro | Con
Median | Middle value in sorted array | =MEDIAN(Data) | Robust when extreme data values exist. | Statistical procedures for the median are complex.
Mode | Most frequently occurring data value | =MODE(Data) | Useful for attribute data or discrete data with a small range. | May not be unique, and is not helpful for continuous data.
Population vs Sample Characteristics
• A statistic is a descriptive measure derived from a sample (n items).
• A parameter is a descriptive measure derived from a population (N items).
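Calculation of Mean
Raw data:
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$; N = population size
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; n = sample size
Grouped data:
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{k} f_i x_i$; $N = \sum_{i=1}^{k} f_i$
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$; $n = \sum_{i=1}^{k} f_i$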
Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below.
Sample Mean
Example: Apartment Rents
Sample Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$
Calculation of Median (n is even)
• Consider the following n = 6 data values: 11 12 15 17 21 32
• What is the median?
For even n, Median = $\frac{x_{n/2} + x_{(n/2)+1}}{2}$
n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
M = (x3 + x4)/2 = (15 + 17)/2 = 16
Calculation of Median (n is odd)
• Consider the following n = 7 data values: 11 12 15 17 21 32 38
• What is the median?
For odd n, Median = $x_{(n+1)/2}$
(n+1)/2 = 8/2 = 4, so M = x4 = 17
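The even and odd cases can be sketched in one small function. This is an illustrative helper, not a replacement for Excel's MEDIAN or Python's own `statistics.median`. The only subtlety is that the formulas use 1-based positions while Python lists are 0-based.

```python
def median(values):
    """Median of a list: middle value (odd n) or mean of the two middle values (even n)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return xs[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based indices mid - 1 and mid.
    return (xs[mid - 1] + xs[mid]) / 2


print(median([11, 12, 15, 17, 21, 32]))      # even n example from the slides
print(median([11, 12, 15, 17, 21, 32, 38]))  # odd n example from the slides
```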
Trimmed Mean
Another measure, sometimes used when extreme values are present, is the trimmed mean.
It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
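A minimal sketch of a trimmed mean follows. The data are hypothetical, and the choice to round the number of trimmed values down is an assumption, since the slides do not say how to handle a fractional count.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(values)
    # Assumption: a fractional count of values to trim is rounded down.
    k = int(len(xs) * proportion)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one very small and one very large value.
data = [1, 12, 13, 14, 15, 16, 17, 18, 19, 100]
print(sum(data) / len(data))     # ordinary mean, pulled up by the value 100
print(trimmed_mean(data, 0.10))  # 10% trimmed mean drops 1 and 100
```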
Mode
• A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.
• It occurs when dissimilar populations are combined in one sample.
Percentiles and Quartiles
• Percentiles divide the sorted data into 100 groups and show how the data are spread over the interval from the smallest to the largest value.
• For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.
• Deciles divide the data into 10 groups.
• Quartiles divide the data into 4 groups.
In general, the p-th order quantile or fractile (Zp) is the value below which a proportion p of the total observations lie.
Percentiles and Quartiles
• Put p = 1/4, 2/4, 3/4 to get the quartiles
• Put p = 1/10, 2/10, …, 9/10 to get the deciles
• Put p = 1/100, 2/100, …, 99/100 to get the percentiles
Step 1. Sort the observations.
Step 2. Calculate np, where n is the number of observations.
Step 3. If np is not an integer, take the next integer as the position. Otherwise, take both np and the next integer as positions and use the mean of the two values.
Third Quartile
Third quartile = 75th percentile
np = (75/100)(70) = 52.5, which is not an integer, so the position is 53
Third quartile = value at position 53 = 525
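The three steps can be sketched as follows. The function names are illustrative, and converting p to an exact fraction is an implementation choice made here so that the test "is np an integer?" is not affected by floating-point rounding. Other software may use different interpolation rules and give slightly different quantiles.

```python
import math
from fractions import Fraction


def quantile_positions(n, p):
    """1-based position(s) of the p-th quantile among n sorted observations."""
    if n < 1 or not 0 < p < 1:
        raise ValueError("need n >= 1 and 0 < p < 1")
    # Assumption: p is a simple fraction such as 0.25 or 0.83.
    np_ = n * Fraction(p).limit_denominator(100)
    if np_.denominator != 1:
        return [math.ceil(np_)]        # np not an integer: next integer
    return [int(np_), int(np_) + 1]    # np an integer: that position and the next


def quantile(values, p):
    """Value of the p-th quantile using the position rule above."""
    xs = sorted(values)                # Step 1: sort
    positions = quantile_positions(len(xs), p)
    return sum(xs[pos - 1] for pos in positions) / len(positions)


print(quantile_positions(70, 0.75))    # third quartile of 70 rents: position 53
print(quantile([11, 12, 15, 17, 21, 32, 38], 0.5))
```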
Dispersion
• Describes how similar the observations in a set are to each other, or the degree of deviation (spread) of a set of data from its central value
– In general, the more spread out a distribution is, the larger the measure of dispersion will be
Measures of Dispersion
• There are five main measures of dispersion:
– Range
– Mean Deviation
– Mean squared deviation (variance)
– Root mean squared deviation (Standard Deviation)
– Inter-quartile range (IQR)
Measures
Measure | Formula | Excel Formula | Pro | Con
Range | $x_{\max} - x_{\min}$ | =MAX(Data)-MIN(Data) | Easy to calculate | Sensitive to extreme data values.
Mean Deviation | $\frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$ | =AVEDEV(Data) | Measures deviation accurately | Further algebraic treatment is not possible
Measures
Measure | Formula | Excel Formula | Pro | Con
Population Variance | $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$ | =VARP(array) | Important measure | Overestimates the error
Sample Variance | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ | =VAR(array) | Important measure | Overestimates the error
REMEMBER
For grouped data:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i x_i^2 - \frac{n}{n-1}\bar{x}^2$
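The two forms of the grouped-data variance are equal because of the identities $\sum f_i = n$ and $\sum f_i x_i = n\bar{x}$. A short derivation, written out here as a worked step that is not on the original slide:

```latex
\begin{align*}
\sum_{i=1}^{k} f_i (x_i - \bar{x})^2
  &= \sum_{i=1}^{k} f_i x_i^2 - 2\bar{x}\sum_{i=1}^{k} f_i x_i + \bar{x}^2 \sum_{i=1}^{k} f_i \\
  &= \sum_{i=1}^{k} f_i x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2
\end{align*}
```

Dividing both sides by $n - 1$ gives the shortcut form of $s^2$.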
Measures
Measure | Formula | Excel Formula | Pro | Con
Population Standard Deviation | $\sigma = \sqrt{\sigma^2}$ | =STDEVP(array) | Best measure |
Sample Standard Deviation | $s = \sqrt{s^2}$ | =STDEV(array) | Best measure |
Inter-quartile Range
• The inter-quartile range (IQR) is defined as the difference between the third and first quartiles; dividing it by two gives the semi-interquartile range (SIR)
– The first quartile is the 25th percentile
– The third quartile is the 75th percentile
• IQR = Q3 - Q1
• SIR = (Q3 - Q1)/2
When To Use the SIR
• It is half the range of the middle 50% of the data
• The SIR is often used with skewed data, as it is insensitive to extreme scores
• The SIR can be used with open-ended distributions
Interquartile Range
Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Coefficient of Variation (CV)
• Relative (unit-free) measure used to compare variability when
(i) two variables measured in different units are compared
(ii) two variables measured in the same unit but with different means are compared
Relative measure = (absolute measure / average) × 100
$CV = \frac{s}{\bar{x}} \times 100$
Sample Variance, Standard Deviation, and Coefficient of Variation
Example: Apartment Rents
Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 2996.16$
Standard Deviation: $s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
Coefficient of Variation: $\frac{s}{\bar{x}} \times 100\% = \frac{54.74}{490.80} \times 100\% = 11.15\%$
The standard deviation is about 11% of the mean.
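The arithmetic of the apartment-rent example can be checked with a few lines. The sketch starts from the reported summary values (variance 2996.16, mean 490.80) because the 70 individual rents are not reproduced in this transcript. The function name is illustrative.

```python
import math


def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


variance = 2996.16   # sample variance reported on the slide
mean_rent = 490.80   # sample mean reported on the slide

std_dev = math.sqrt(variance)
cv = coefficient_of_variation(std_dev, mean_rent)
print(f"s = {std_dev:.2f}, CV = {cv:.2f}%")
```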
Skewness
• Skew is a measure of the lack of symmetry in the distribution of data
[Figure: positive skew, negative skew, and normal (skew = 0) distributions]
Measure of Skew
• Skewness is a unit-free measure of shape of any frequency distribution.
• The coefficient compares two samples measured in different units or one sample with a known reference distribution (e.g., symmetric normal distribution).
• Calculate the sample’s skewness coefficient
Nature of Skewness
• If $g_1 > 0$, the distribution has positive skewness, or is right skewed
• If $g_1 < 0$, the distribution has negative skewness, or is left skewed
• If $g_1 = 0$, the distribution is symmetrical
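The formula for the skewness coefficient did not survive in this transcript, so the sketch below uses one common choice as an assumption: the adjusted sample coefficient n/((n-1)(n-2)) × Σ((x − x̄)/s)³, which is the form Excel's SKEW function computes. The data sets are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness coefficient (assumed form, see lead-in)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]        # long tail to the right
left_skewed = [-x for x in right_skewed]  # mirror image: long tail to the left

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(name, round(sample_skewness(data), 3))
```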
Kurtosis
• Kurtosis is the relative length of the tails and the degree of concentration in the center.
• Consider three kurtosis prototype shapes.
Kurtosis
• When the distribution is normally distributed, its kurtosis equals 3 and it is said to be mesokurtic
• When the distribution is less spread out than normal, its kurtosis is greater than 3 and it is said to be leptokurtic
• When the distribution is more spread out than normal, its kurtosis is less than 3 and it is said to be platykurtic
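As a rough sketch of the benchmark value 3, the function below computes the moment-based kurtosis m4 / m2², where m2 and m4 are the second and fourth central moments. This particular formula is an assumption (the slides do not give one), chosen because it equals 3 for a normal distribution. The data sets are hypothetical.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (equals 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


flat = [1, 2, 3, 4, 5]                        # evenly spread, short tails
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # concentrated center, long tails

print(kurtosis(flat))    # below 3: platykurtic
print(kurtosis(peaked))  # above 3: leptokurtic
```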
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value xi is from the mean. An observation’s z-score is a measure of the relative location of the observation in a data set.
$z_i = \frac{x_i - \bar{x}}{s}$
Excel’s STANDARDIZE function can be used to compute the z-score.
z-Scores
Example: Apartment Rents
$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$
Standardized Values for Apartment Rents
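The z-score for the rent of 425 can be reproduced directly from the reported mean and standard deviation. The function name is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


z = z_score(425, 490.80, 54.74)
print(f"{z:.2f}")  # negative: this rent lies below the mean
```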
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev’s theorem requires z > 1, but z need not be an integer.
Chebyshev’s Theorem
• At least 75% of the data values must be within z = 2 standard deviations of the mean.
• At least 89% of the data values must be within z = 3 standard deviations of the mean.
• At least 94% of the data values must be within z = 4 standard deviations of the mean.
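The percentages for z = 2, 3, and 4 follow from evaluating 1 - 1/z². A small sketch, with an illustrative function name:

```python
def chebyshev_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4, 1.5):  # z need not be an integer
    print(f"z = {z}: at least {chebyshev_bound(z):.2%}")
```

The slide values of 89% and 94% are the rounded forms of 88.89% and 93.75%.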
Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
Empirical Rule
[Figure: bell curve marking 68.26%, 95.44%, and 99.72% of the values within 1, 2, and 3 standard deviations of the mean]
Detecting Outliers
An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
Box Plot
A box plot is a graphical summary of data that helps to identify outliers.
A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
Box Plot
Example: Apartment Rents
The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
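The limits can be computed from the quartiles alone. The sketch below uses Q1 = 445 and Q3 = 525 from the example. The short list passed to `find_outliers` is hypothetical and only shows how values outside the limits would be flagged.

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Values lying outside the limits."""
    lower, upper = outlier_limits(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


lower, upper = outlier_limits(445, 525)
print(lower, upper)

# Hypothetical values, not the actual rent data.
print(find_outliers([425, 615, 300, 700], 445, 525))
```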
Box Plot
Example: Apartment Rents
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
[Figure: box plot of the apartment rents on an axis running from 400 to 625]
Smallest value inside limits = 425
Largest value inside limits = 615
Weighted Mean
When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
When data vary in importance, the analyst must choose the weight that best reflects the importance of each value.
Weighted Mean
$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
where:
xi = value of observation i
wi = weight for observation i
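A minimal sketch of the weighted mean, using a hypothetical GPA calculation in which grade points are weighted by credit hours. The grades and credit hours are invented for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))
```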
Mean for Grouped Data
Sample Data: $\bar{x} = \frac{\sum f_i M_i}{n}$
Population Data: $\mu = \frac{\sum f_i M_i}{N}$
where:
fi = frequency of class i
Mi = midpoint of class i
Sample Mean for Grouped Data
Example: Apartment Rents
Sample Mean for Grouped Data
Example: Apartment Rents
$\bar{x} = \frac{34{,}525}{70} = 493.21$
This approximation differs by $2.41 from the actual sample mean of $490.80.
Variance for Grouped Data
For sample data: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
For population data: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
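The grouped-data formulas for the sample mean and sample variance can be sketched together. The class midpoints and frequencies below are hypothetical, since the frequency table for the apartment rents is not reproduced in this transcript.

```python
import math


def grouped_sample_stats(midpoints, frequencies):
    """Sample mean, variance, and standard deviation from class midpoints and frequencies."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Hypothetical grouped data: three classes with midpoints 10, 20, 30.
midpoints = [10, 20, 30]
frequencies = [2, 6, 2]

mean, variance, std_dev = grouped_sample_stats(midpoints, frequencies)
print(mean, round(variance, 2), round(std_dev, 2))
```

Because every observation in a class is treated as if it sat at the class midpoint, the results only approximate the values computed from the raw data.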
Sample Variance for Grouped Data
Sample Variance for Grouped Data
Example: Apartment Rents
• Sample Variance: $s^2 = 208{,}234.29/(70 - 1) = 3{,}017.89$
• Sample Standard Deviation: $s = \sqrt{3{,}017.89} = 54.94$
This approximation differs by only $.20 from the actual standard deviation of $54.74.
ACKNOWLEDGEMENT
1) Statistics for Management by Levin & Rubin (Prentice Hall)
2) Business Statistics by Aczel and Sounderpandian (Pearson)
3) Business Statistics by Anderson, Sweeney & Williams (Cengage)
4) Applied Statistics in Business & Economics by Doane (McGraw-Hill)