Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1
Econ 325: Introduction to Empirical Economics
Lecture 1
Describing Data: Numerical
Measures of Central Tendency
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Central Tendency
Mean Median Mode
n
x
x
n
1ii∑
==
Overview
Midpoint of ranked values
Most frequently observed value
Arithmetic average
Ch. 2-2
2.1
Arithmetic Mean
� The arithmetic mean (mean) is the most common measure of central tendency
� For a population of N values:
� For a sample of size n:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Sample sizen
xxx
n
x
x n21
n
1ii
+++==
∑=
L Observed values
N
xxx
N
x
μ N21
N
1ii
+++==
∑=
L
Population size
Population values
Ch. 2-3
Arithmetic Mean
� The most common measure of central tendency
� Mean = sum of values divided by the number of values
� Affected by extreme values (outliers)
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
(continued)
0 1 2 3 4 5 6 7 8 9 10
Mean = 3
0 1 2 3 4 5 6 7 8 9 10
Mean = 4
35
15
5
54321==
++++4
5
20
5
104321==
++++
Ch. 2-4
Median
� In an ordered list, the median is the “middle” number (50% above, 50% below)
� Not affected by extreme values
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
0 1 2 3 4 5 6 7 8 9 10
Median = 3
0 1 2 3 4 5 6 7 8 9 10
Median = 3
Ch. 2-5
Mode
� Value that occurs most often
� Not affected by extreme values
� Used for either numerical or categorical data
� There may may be no mode
� There may be several modes
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Mode = 9
0 1 2 3 4 5 6
No Mode
Ch. 2-6
Which measure of location is the “best”?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
� Mean is generally used, unless extreme values (outliers) exist . . .
� Then median is often used, since the median is not sensitive to extreme values.
� Example: Median home prices may be reported for a region – less sensitive to outliers
Ch. 2-7
Clicker Question
Q: In which graph, Mean < Median ?
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
C). Right-SkewedA). Left-Skewed B). Symmetric
Ch. 2-8
Clicker Question 1-1
� Answer: A). ``Left-Skewed’’ distribution.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Mean = MedianMean < Median Median < Mean
Right-SkewedLeft-Skewed Symmetric
Ch. 2-9
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Geometric Mean
� Geometric mean
� Geometric mean rate of return
1/nn21
nn21g )xx(x)xx(xx ×××=×××= LL
1)x...x(xr 1/nn21g −×××=
Ch. 2-10
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Clicker Question 1-2
Suppose you invested $100 in stocks and, after 5 years, the value of stocks becomes $125 worth. What is the average annual compound rate of returns?
A). 5 %
B). 4.6 %
C). 5.4 %
Ch. 2-11
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Clicker Question
Initial Investment: $100
After 5 years: $125
Question: the average annual rate of returns?
• 25 % divided by 5 years = 5%? This is Wrong!
• ``Compound Interest’’ over 5 years
$100 x (1+0.05)^5 = $127.63 > $125 after 5 years
• Answer is B
$100 x (1+r)^5 = $125 �
Ch. 2-12
Measures of Variability
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Same center,
different variation
Variation
Variance Standard Deviation
Coefficient of Variation
Range Interquartile Range
� Measures of variation give information on the spread or variability of the data values.
Ch. 2-13
2.2
Range
� Simplest measure of variation
� Difference between the largest and the smallest observations:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Range = Xlargest – Xsmallest
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Range = 14 - 1 = 13
Example:
Ch. 2-14
Disadvantages of the Range
� Ignores the way in which data are distributed
� Sensitive to outliers
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
7 8 9 10 11 12
Range = 12 - 7 = 5
7 8 9 10 11 12
Range = 12 - 7 = 5
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 5 - 1 = 4
Range = 120 - 1 = 119
Ch. 2-15
Interquartile Range
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Median(Q2)
XmaximumX
minimum Q1 Q3
Example:
25% 25% 25% 25%
12 30 45 57 70
Interquartile range = 57 – 30 = 27
Ch. 2-16
STATA Example
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-17
Population Variance
� Average of squared deviations of values from the mean
� Population variance:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
N
μ)(x
σ
N
1i
2i
2∑
=
−
=
Where = population mean
N = population size
xi = ith value of the variable x
μ
Ch. 2-18
Sample Variance
� Average (approximately) of squared deviations of values from the mean
� Sample variance:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
1-n
)x(x
s
n
1i
2i
2∑
=
−
=
Where = arithmetic mean
n = sample size
Xi = ith value of the variable X
X
Ch. 2-19
Population Standard Deviation
� Most commonly used measure of variation
� Shows variation about the mean
� Has the same units as the original data
� Population standard deviation:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
N
)(xN
1i
2
i
2
∑=
−
==
µ
σσ
�
�
Ch. 2-20
Sample Standard Deviation
� Most commonly used measure of variation
� Shows variation about the mean
� Has the same units as the original data
� Sample standard deviation:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
1-n
)x(x
s
n
1i
2
i∑=
−
=
Ch. 2-21
Calculation Example:Sample Standard Deviation
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Sample Data (xi) : 10 12 14 15 17 18 18 24
n = 8 Mean = x = 16
4.24267
126
18
16)(2416)(1416)(1216)(10
1n
)x(24)x(14)x(12)X(10s
2222
2222
==
−
−++−+−+−=
−
−++−+−+−=
L
L
A measure of the “average” scatter around the mean
Ch. 2-22
Measuring variation
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Small standard deviation
Large standard deviation
Ch. 2-23
Comparing Standard Deviations
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Mean = 15.5
s = 3.33811 12 13 14 15 16 17 18 19 20 21
11 12 13 14 15 16 17 18 19 20 21
Data B
Data A
Mean = 15.5
s = 0.926
11 12 13 14 15 16 17 18 19 20 21
Mean = 15.5
s = 4.570
Data C
Ch. 2-24
Advantages of Variance and Standard Deviation
� Each value in the data set is used in the calculation
� Values far from the mean are given extra weight
(because deviations from the mean are squared)
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-25
Coefficient of Variation
� Measures relative variation
� Always in percentage (%)
� Shows variation relative to mean
� Can be used to compare two or more sets of
data measured in different units
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
100%x
sCV ⋅
=
Ch. 2-26
Comparing Coefficient of Variation
� Stock A:
� Average price last year = $50
� Standard deviation = $5
� Stock B:
� Average price last year = $100
� Standard deviation = $5
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Both stocks have the same standard deviation, but stock B is less variable relative to its price
10%100%$50
$5100%
x
sCVA =⋅=⋅
=
5%100%$100
$5100%
x
sCVB =⋅=⋅
=
Ch. 2-27
� If the data distribution is approximated by normal distribution, then the interval:
� contains about 68% of the values in
the population or the sample
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
The Empirical Rule
σµ �1±
μ
68%
1σμ±
Ch. 2-28
� contains about 95% of the values in
the population or the sample
� contains almost all (about 99.7%) of the values in the population or the sample
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
The Empirical Rule
2σμ ±
3σμ ±
3σμ±
99.7%95%
2σμ±
Ch. 2-29
Clicker Question 1-3
� In one year, the average stock price of Google Inc. was $800 with the standard deviation equal to $100. In what interval, approximately 95% of the stock price of Google Inc. will be?
A). Between $400 and $1200
B). Between $600 and $1000
C). Between $700 and $900
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-30
Clicker Question
� Because contains about 95% of the values,
800 ± 2x100 = 800–200 and 800+200
contains about 95% of the Google stock price.
Answer: B). Between $600 and $1000
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-31
µ ± 2σ
Weighted Mean
� The weighted mean of a set of data is
� Where wi is the weight of the ith observation
and
� Use when data is already grouped into n classes, with wi values in the ith class
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-32
wi∑ =1
2.3
Example
� Consider a student with the scores of assignment (x1), midterm (x2), and final exam (x3) given by
x1 = 90, x2 = 90, and x3 = 46.
� The weights:
� w1 = 0.1, w2 =0.3, and w3 = 0.6.
� The final grade for this student is
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
wix
i
i=1
3
∑ = 0.1× 90+0.3× 90+0.6 × 46=63.6
Ch. 2-33
2.3
The Sample Covariance� The covariance measures the strength of the linear relationship
between two variables� The population covariance:
� The sample covariance:
� Only concerned with the strength of the relationship
� No causal effect is implied
� Depends on the unit of measurement
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
N
))(y(x
y),(xCov
N
1iyixi
xy
∑=
−−
==
µµ
σ
1n
)y)(yx(x
sy),(xCov
n
1iii
xy−
−−
==∑
=
Ch. 2-34
2.4
Interpreting Covariance
� Covariance between two variables:
Cov(x,y) > 0 x and y tend to move in the same direction
Cov(x,y) < 0 x and y tend to move in opposite directions
Cov(x,y) = 0 x and y are independent
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-35
Coefficient of Correlation
� Measures the relative strength of the linear relationship between two variables
� Population correlation coefficient:
� Sample correlation coefficient:
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
YX ss
y),(xCovr =
YXσσ
y),(xCovρ =
Ch. 2-36
Scatter Plots of Data with Various Correlation Coefficients
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Y
X
Y
X
Y
X
Y
X
Y
X
r = -1 r = -.6 r = 0
r = +.3r = +1
Y
Xr = 0
Ch. 2-37
Example: HW and Final Grade
� r = .0.499
� There is a relatively
strong positive linear
relationship between
HW scores and
Final Grades
� Students who scored high on HW assignment tended to have high final grades
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-38
20
40
60
80
100
Fin
al G
rad
e
0 2 4 6 8 10Homework
Grade Fitted values
Correlation = 0.499
HW and Final Grade