Download - Econ 325: Introduction to Empirical Economicsfaculty.arts.ubc.ca/hkasahara/Econ325/325_lecture01.pdf · Title: Microsoft PowerPoint - Lecture1.pptx Author: User Created Date: 9/18/2017

Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1

Econ 325: Introduction to Empirical Economics

Lecture 1

Describing Data: Numerical

Measures of Central Tendency


Central Tendency

Overview

• Mean: arithmetic average, \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)

• Median: midpoint of ranked values

• Mode: most frequently observed value

2.1

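Arithmetic Mean

• The arithmetic mean (mean) is the most common measure of central tendency

• For a population of N values:

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]

where N is the population size and the x_i are the population values

• For a sample of size n:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

where n is the sample size and the x_i are the observed values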

Arithmetic Mean (continued)

• The most common measure of central tendency

• Mean = sum of values divided by the number of values

• Affected by extreme values (outliers)

Values 1, 2, 3, 4, 5 (dot plot on a 0 to 10 axis): Mean = 3

\[ \frac{1+2+3+4+5}{5} = \frac{15}{5} = 3 \]

Values 1, 2, 3, 4, 10 (dot plot on a 0 to 10 axis): Mean = 4

\[ \frac{1+2+3+4+10}{5} = \frac{20}{5} = 4 \]
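The effect of an extreme value on the mean can be checked directly. The following is a minimal Python sketch using the two five-value data sets from the slide (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the variable names are illustrative only.

```python
from statistics import mean

values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # largest value replaced by an extreme value

mean_plain = mean(values)          # 15 / 5
mean_outlier = mean(with_outlier)  # 20 / 5

print(mean_plain)    # 3
print(mean_outlier)  # 4
```

Changing a single observation from 5 to 10 moves the mean from 3 to 4.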

Median

• In an ordered list, the median is the “middle” number (50% above, 50% below)

• Not affected by extreme values

[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]

Mode

• Value that occurs most often

• Not affected by extreme values

• Used for either numerical or categorical data

• There may be no mode

• There may be several modes

[Figure: two dot plots. On a 0 to 14 axis: Mode = 9. On a 0 to 6 axis: No Mode]
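As a rough illustration of the median and the mode, here is a short Python sketch. The data sets are hypothetical (the slides show only dot plots), chosen so that the median is 3 with and without an extreme value and so that one list has a single mode at 9.

```python
from statistics import median, multimode

# Hypothetical data: the second list replaces 5 with an extreme value.
values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

median_plain = median(values)          # middle of the ranked values
median_outlier = median(with_outlier)  # unchanged by the extreme value

# Hypothetical data with one most frequent value.
scores = [0, 1, 1, 2, 9, 9, 9, 14]
modes = multimode(scores)

# When every value occurs exactly once, multimode returns all of them,
# which corresponds to the "no mode" case on the slide.
no_repeat = [0, 1, 2, 3, 4, 5, 6]
tied = multimode(no_repeat)

print(median_plain, median_outlier)  # 3 3
print(modes)                         # [9]
print(len(tied) == len(no_repeat))   # True
```

`multimode` returns a list, so the case of several modes is handled in the same way as the case of a single mode.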

Which measure of location is the “best”?

• Mean is generally used, unless extreme values (outliers) exist.

• Then median is often used, since the median is not sensitive to extreme values.

• Example: Median home prices may be reported for a region, since the median is less sensitive to outliers

Clicker Question

Q: In which graph is Mean < Median?

A). Left-Skewed B). Symmetric C). Right-Skewed

Clicker Question 1-1

• Answer: A). “Left-Skewed” distribution.

Left-Skewed: Mean < Median

Symmetric: Mean = Median

Right-Skewed: Median < Mean


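Geometric Mean

• Geometric mean

\[ \bar{x}_g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} \]

• Geometric mean rate of return

\[ \bar{r}_g = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} - 1 \]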



Suppose you invested $100 in stocks and, after 5 years, the stocks are worth $125. What is the average annual compound rate of return?

A). 5 %

B). 4.6 %

C). 5.4 %


Clicker Question

Initial Investment: $100

After 5 years: $125

Question: What is the average annual rate of return?

• 25 % divided by 5 years = 5%? This is wrong!

• “Compound interest” over 5 years:

$100 x (1+0.05)^5 = $127.63 > $125 after 5 years

• Answer is B:

$100 x (1+r)^5 = $125, so r = (125/100)^(1/5) - 1 ≈ 0.046, or about 4.6%
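The compound-return calculation can be reproduced in a few lines. This is a minimal Python sketch using the numbers from the question ($100 growing to $125 over 5 years); the variable names are illustrative.

```python
initial = 100.0
final = 125.0
years = 5

# Naive answer: total return divided by the number of years.
naive_rate = (final / initial - 1) / years

# Compound answer: solve initial * (1 + r)**years == final for r.
compound_rate = (final / initial) ** (1 / years) - 1

# Growing the initial amount at each rate shows which one matches $125.
value_at_naive = initial * (1 + naive_rate) ** years
value_at_compound = initial * (1 + compound_rate) ** years

print(f"naive rate:    {naive_rate:.3%} -> ${value_at_naive:.2f}")
print(f"compound rate: {compound_rate:.3%} -> ${value_at_compound:.2f}")
```

The naive 5% rate overshoots ($127.63), while the compound rate of roughly 4.6% returns the $125 observed after 5 years.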

Measures of Variability

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

• Measures of variation give information on the spread or variability of the data values.

[Figure: distributions with the same center but different variation]

2.2

Range

• Simplest measure of variation

• Difference between the largest and the smallest observations:

Range = X_largest - X_smallest

Example (dot plot on a 0 to 14 axis):

Range = 14 - 1 = 13

Disadvantages of the Range

• Ignores the way in which data are distributed

[Figure: two dot plots on a 7 to 12 axis with different distributions; Range = 12 - 7 = 5 in both]

• Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119
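The sensitivity of the range to a single outlier is easy to verify. Below is a small Python sketch using the two lists from the slide; `data_range` is a helper name introduced here for illustration.

```python
def data_range(values):
    """Return the largest observation minus the smallest observation."""
    return max(values) - min(values)

# The two data sets from the slide differ only in their last value.
without_outlier = [1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5]
with_outlier = [1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120]

range_without = data_range(without_outlier)
range_with = data_range(with_outlier)

print(range_without)  # 4
print(range_with)     # 119
```

Only the minimum and maximum enter the calculation, which is why one extreme value changes the result so sharply.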

Interquartile Range

Example:

X minimum    Q1    Median (Q2)    Q3    X maximum
    12       30        45         57       70

25% of the values lie in each of the four intervals between these points.

Interquartile range = Q3 - Q1 = 57 - 30 = 27

STATA Example


Population Variance

• Average of squared deviations of values from the mean

• Population variance:

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

Where μ = population mean

N = population size

x_i = ith value of the variable x

Sample Variance

• Average (approximately) of squared deviations of values from the mean

• Sample variance:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]

Where \( \bar{x} \) = arithmetic mean

n = sample size

x_i = ith value of the variable x

Population Standard Deviation

• Most commonly used measure of variation

• Shows variation about the mean

• Has the same units as the original data

• Population standard deviation:

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

Sample Standard Deviation

• Most commonly used measure of variation

• Shows variation about the mean

• Has the same units as the original data

• Sample standard deviation:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]

Calculation Example: Sample Standard Deviation

Sample Data (x_i): 10 12 14 15 17 18 18 24

n = 8, Mean = \( \bar{x} \) = 16

\[ s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}} \]

\[ = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095 \]

A measure of the “average” scatter around the mean
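The calculation can be checked step by step. This is a minimal Python sketch using the eight sample values from the slide; the squared deviations from the mean of 16 are 36, 16, 4, 1, 1, 4, 4 and 64, which sum to 130, and the result is compared with the standard library's `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n  # sample mean

# Sum of squared deviations from the sample mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

# Sample variance divides by n - 1, not n.
sample_variance = sum_sq_dev / (n - 1)
s = sqrt(sample_variance)

print(x_bar)        # 16.0
print(sum_sq_dev)   # 130.0
print(round(s, 4))  # 4.3095
print(abs(s - stdev(data)) < 1e-12)  # True
```

Dividing by n instead of n - 1 would give the population formula, which is appropriate only when the eight values are the entire population.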

Measuring variation


Small standard deviation

Large standard deviation


Comparing Standard Deviations

[Figure: three dot plots, each on an 11 to 21 axis]

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = 0.926

Data C: Mean = 15.5, s = 4.570

Advantages of Variance and Standard Deviation

• Each value in the data set is used in the calculation

• Values far from the mean are given extra weight (because deviations from the mean are squared)

Coefficient of Variation

• Measures relative variation

• Always expressed as a percentage (%)

• Shows variation relative to the mean

• Can be used to compare two or more sets of data measured in different units

\[ CV = \left( \frac{s}{\bar{x}} \right) \cdot 100\% \]

Comparing Coefficients of Variation

• Stock A:

• Average price last year = $50

• Standard deviation = $5

\[ CV_A = \left( \frac{s}{\bar{x}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

• Stock B:

• Average price last year = $100

• Standard deviation = $5

\[ CV_B = \left( \frac{s}{\bar{x}} \right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]

Both stocks have the same standard deviation, but stock B is less variable relative to its price
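The comparison of the two stocks can be written as a short Python sketch. The function name `coefficient_of_variation` is introduced here for illustration; the inputs are the means and standard deviations given on the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B

print(f"CV_A = {cv_a:.0f}%")  # CV_A = 10%
print(f"CV_B = {cv_b:.0f}%")  # CV_B = 5%
```

Because the result is a unit-free percentage, the same function could be used to compare series measured in different currencies or units.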

The Empirical Rule

• If the data distribution is approximated by a normal distribution, then the interval:

• \( \mu \pm 1\sigma \) contains about 68% of the values in the population or the sample

[Figure: normal curve centered at μ, with 68% of the area within μ ± 1σ]

The Empirical Rule

• \( \mu \pm 2\sigma \) contains about 95% of the values in the population or the sample

• \( \mu \pm 3\sigma \) contains almost all (about 99.7%) of the values in the population or the sample

[Figure: normal curves with 95% of the area within μ ± 2σ and 99.7% within μ ± 3σ]


Clicker Question

• In one year, the average stock price of Google Inc. was $800, with a standard deviation of $100. In what interval will approximately 95% of Google's stock prices fall?

A). Between $400 and $1200

B). Between $600 and $1000

C). Between $700 and $900

• Because \( \mu \pm 2\sigma \) contains about 95% of the values, 800 ± 2 x 100, that is, the interval from 800 - 200 = 600 to 800 + 200 = 1000, contains about 95% of the Google stock prices.

Answer: B). Between $600 and $1000
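The empirical-rule intervals for this question can be tabulated with a short Python sketch. It assumes, as the rule itself does, that the prices are approximately normally distributed; `empirical_interval` and `COVERAGE` are names introduced here for illustration.

```python
def empirical_interval(mean, std_dev, k):
    """Return the interval (mean - k*std_dev, mean + k*std_dev)."""
    return (mean - k * std_dev, mean + k * std_dev)

# Approximate share of values within k standard deviations of the mean
# for data that are roughly normally distributed.
COVERAGE = {1: "68%", 2: "95%", 3: "99.7%"}

mean_price = 800
std_price = 100

for k, share in COVERAGE.items():
    low, high = empirical_interval(mean_price, std_price, k)
    print(f"about {share} of prices between ${low} and ${high}")
```

The line for k = 2 gives $600 to $1000, which is answer B.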

Weighted Mean

• The weighted mean of a set of data is

\[ \sum_{i=1}^{n} w_i x_i \]

• where w_i is the weight of the ith observation and \( \sum_{i=1}^{n} w_i = 1 \)

• Use when data are already grouped into n classes, with w_i values in the ith class

2.3

Example

• Consider a student with the scores of assignment (x_1), midterm (x_2), and final exam (x_3) given by

x_1 = 90, x_2 = 90, and x_3 = 46.

• The weights:

• w_1 = 0.1, w_2 = 0.3, and w_3 = 0.6.

• The final grade for this student is

\[ \sum_{i=1}^{3} w_i x_i = 0.1 \times 90 + 0.3 \times 90 + 0.6 \times 46 = 63.6 \]
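The final-grade calculation is a weighted mean with weights that sum to 1. A minimal Python sketch with the scores and weights from the example follows; the variable names are illustrative.

```python
scores = [90, 90, 46]        # assignment, midterm, final exam
weights = [0.1, 0.3, 0.6]    # weights from the example

# The weighted-mean formula used here assumes the weights sum to 1.
assert abs(sum(weights) - 1.0) < 1e-9

final_grade = sum(w * x for w, x in zip(weights, scores))
simple_mean = sum(scores) / len(scores)

print(round(final_grade, 1))  # 63.6
print(round(simple_mean, 1))  # 75.3
```

The unweighted mean of the three scores is higher because it gives the low final-exam score the same weight as the other two components.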

2.3

The Sample Covariance

• The covariance measures the strength of the linear relationship between two variables

• The population covariance:

\[ \mathrm{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]

• The sample covariance:

\[ \mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} \]

• Only concerned with the strength of the relationship

• No causal effect is implied

• Depends on the unit of measurement

2.4

Interpreting Covariance

• Covariance between two variables:

Cov(x,y) > 0: x and y tend to move in the same direction

Cov(x,y) < 0: x and y tend to move in opposite directions

Cov(x,y) = 0: x and y have no linear relationship (this alone does not imply that they are independent)


Coefficient of Correlation

• Measures the relative strength of the linear relationship between two variables

• Population correlation coefficient:

\[ \rho = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \]

• Sample correlation coefficient:

\[ r = \frac{\mathrm{Cov}(x, y)}{s_x s_y} \]
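The sample covariance and sample correlation formulas can be sketched in Python as follows. The homework and final-grade numbers are hypothetical and are not the data behind the course example; the function names are introduced here for illustration. The second homework list rescales the first to show that covariance depends on the unit of measurement while correlation does not.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sum of cross-deviations from the sample means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = sqrt(sample_covariance(x, x))  # Cov(x, x) is the sample variance of x
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical data: homework score out of 10 and final grade out of 100.
homework = [2, 4, 6, 8, 10]
final_grade = [50, 45, 70, 65, 90]

# Same homework scores expressed out of 100 instead of 10.
homework_pct = [10 * h for h in homework]

cov_raw = sample_covariance(homework, final_grade)
cov_pct = sample_covariance(homework_pct, final_grade)
r_raw = sample_correlation(homework, final_grade)
r_pct = sample_correlation(homework_pct, final_grade)

print(cov_raw, cov_pct)                  # covariance scales with the units
print(round(r_raw, 3), round(r_pct, 3))  # correlation is unchanged
```

Rescaling homework by a factor of 10 multiplies the covariance by 10 but leaves r the same, which is why r is preferred for judging the strength of a linear relationship.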

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0]

Example: HW and Final Grade

• r = 0.499

• There is a relatively strong positive linear relationship between HW scores and Final Grades

• Students who scored high on HW assignments tended to have high final grades

[Figure: "HW and Final Grade", a scatter plot of Final Grade (axis 20 to 100) against Homework (axis 0 to 10) with fitted values; Correlation = 0.499]
