icar- ifpri - basic statistics and econometric approaches lecture 3 - devesh roy

Basic Statistics and Econometric approaches

Devesh Roy

IFPRI-ICAR training

21st September, 2015

2

Why study Econometrics?

• Rare in economics (and many other areas without labs!) to have experimental data

• Need to use nonexperimental, or observational, data to make inferences

• Important to be able to apply economic theory to real world data

Elements of econometrics

• An empirical analysis uses data to test a theory or to estimate a relationship

• A formal economic model can be tested

• Theory may be ambiguous as to the effect of some policy change –can use econometrics to evaluate the program

4

Types of Data – Cross Sectional

• Cross-sectional data as a random sample

• Each observation is a new farmer, firm, etc. with information at a point in time

• If the data is not a random sample, we have a sample-selection problem

5

Types of Data – Panel

• Can pool random cross sections and treat similar to a normal cross section. Will just need to account for time differences.

• Can follow the same random individual observations over time –known as panel data or longitudinal data

6

Types of Data – Time Series

• Time series data has a separate observation for each time period – e.g. prices

• Since not a random sample, different problems to consider

• Trends and seasonality will be important

7

The Question of Causality

• Simply establishing a relationship between variables is not sufficient

• In economics and most of the work we strive to show that effect is causal

• If we truly control for enough other variables, then the estimated ceteris paribus effect can often be considered to be causal

• Can be difficult to establish causality

8

Example: Returns to Education (Wooldridge textbook example)

• A model of human capital investment implies getting more education should lead to higher earnings

• In the simplest case, this implies an equation like

ueducationEarnings 10

9

Example: (continued)

• The estimate of 1, is the return to education, but can it be considered causal?

• While the error term, u, includes other factors affecting earnings, want to control for as much as possible

• Some things are still unobserved, which can threaten establishing causality

Economics 20 - Prof. Anderson 10

Simple linear regression

• y = 0 + 1x + u, we typically refer to y as the• Dependent Variable, or

• Left-Hand Side Variable, or

• Explained Variable, or

• Regressand

11

Terminology, continued

• In the simple linear regression of y on x, we typically refer to x as the• Independent Variable, or

• Right-Hand Side Variable, or

• Explanatory Variable, or

• Regressor, or

• Covariate, or

• Control Variables

12

A Simple Assumption

• The average value of u, the error term, in the population is 0. That is,

• E(u) = 0

• This is not a restrictive assumption, since we can always use 0 to normalize E(u) to 0

13

Zero Conditional Mean (very important condition) • We need to make a crucial assumption about how u and x are related

• We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is, that

• E(u|x) = E(u) = 0, which implies

• E(y|x) = 0 + 1x

14

..

x1 x2

E(y|x) as a linear function of x, where for any xthe distribution of y is centered about E(y|x)

E(y|x) = 0 + 1x

y

f(y)

15

Ordinary Least Squares

• Basic idea of regression is to estimate the population parameters from a sample

• Let {(xi,yi): i=1, …,n} denote a random sample of size n from the population

• For each observation in this sample, it will be the case that

• yi = 0 + 1xi + ui


.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

u1

u2

u3

u4

x

y

Population regression line, sample data pointsand the associated error terms

E(y|x) = 0 + 1x

Gauss Markov assumptions

• Assumption 1- linearity in parameters- Population equation is linear in the (unknown) parameters.

• Assumption 2- Sample is random

• Assumption 3- There is variation in x

• Assumption 4- Zero conditional mean E(u/x)=0

18

Deriving OLS Estimates

• To derive the OLS estimates we need to realize that our main assumption of E(u|x) = E(u) = 0 also implies that

• Cov(x,u) = E(xu) = 0

• Why? Remember from basic probability that Cov(X,Y) = E(XY) –E(X)E(Y)


Deriving OLS continued

• We can write our 2 restrictions just in terms of x, y, 0 and 1 , since u= y – 0 – 1x

• E(y – 0 – 1x) = 0

• E[x(y – 0 – 1x)] = 0

• These are called moment restrictions

20

Deriving OLS using M.O.M.

• The method of moments approach to estimation implies imposing the population moment restrictions on the sample moments

• What does this mean? Recall that for E(X), the mean of a population distribution, a sample estimator of E(X) is simply the arithmetic mean of the sample

21

More Derivation of OLS

• We want to choose values of the parameters that will ensure that the sample versions of our moment restrictions are true

• The sample versions are as follows:

0ˆˆ

0ˆˆ

1

10

1

1

10

1

n

i

iii

n

i

ii

xyxn

xyn

22


• Given the definition of a sample mean, and properties of summation, we can rewrite the first condition as follows

xy

xy

10

10

ˆˆ

or

,ˆˆ

23


n

i

ii

n

i

i

n

i

ii

n

i

ii

n

i

iii

xxyyxx

xxxyyx

xxyyx

1

2

1

1

1

1

1

1

11

ˆ

ˆ

0ˆˆ

24

So the OLS estimated slope is

0 that provided

ˆ

1

2

1

2

11

n

i

i

n

i

i

n

i

ii

xx

xx

yyxx

25

Summary of OLS slope estimate

• The slope estimate is the sample covariance between x and y divided by the sample variance of x

• If x and y are positively correlated, the slope will be positive

• If x and y are negatively correlated, the slope will be negative

• Only need x to vary in our sample

26

More OLS

• Intuitively, OLS is fitting a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares

• The residual, û, is an estimate of the error term, u, and is the difference between the fitted line (sample regression function) and the sample point


.

..

.

y4

y1

y2

y3

x1 x2 x3 x4

}

}

{

{

û1

û2

û3

û4

x

y

Sample regression line, sample data pointsand the associated estimated error terms

xy 10ˆˆˆ

28

Minimizing the sum of least squares

• Given the intuitive idea of fitting a line, we can set up a formal minimization problem

• That is, we want to choose our parameters such that we minimize the following:

n

i

ii

n

i

i xyu1

2

10

1

2 ˆˆˆ

29

Alternate approach, continued

• If one uses calculus to solve the minimization problem for the two parameters you obtain the following first order conditions, which are the same as we obtained before, multiplied by n

0ˆˆ

0ˆˆ

1

10

1

10

n

i

iii

n

i

ii

xyx

xy

30

Algebraic Properties of OLS

• The sum of the OLS residuals is zero

• Thus, the sample average of the OLS residuals is zero as well

• The sample covariance between the regressors and the OLS residuals is zero

• The OLS regression line always goes through the mean of the sample

31

Algebraic Properties (precise)

xy

ux

n

u

u

n

i

ii

n

i

in

i

i

10

1

1

1

ˆˆ

0ˆ

0

ˆ

thus,and 0ˆ


More terminology

SSR SSE SSTThen

(SSR) squares of sum residual theis ˆ

(SSE) squares of sum explained theis ˆ

(SST) squares of sum total theis

:following thedefine then Weˆˆ

part, dunexplainean and part, explainedan of up

made being asn observatioeach ofcan think We

2

2

2

i

i

i

iii

u

yy

yy

uyy


Proof that SST = SSE + SSR

0 ˆˆ that know weand

SSE ˆˆ2 SSR

ˆˆˆ2ˆ

ˆˆ

ˆˆ

22

2

22

yyu

yyu

yyyyuu

yyu

yyyyyy

ii

ii

iiii

ii

iiii


Goodness-of-Fit

• How do we think about how well our sample regression line fits our sample data?

• Can compute the fraction of the total sum of squares (SST) that is explained by the model, call this the R-squared of regression

• R2 = SSE/SST = 1 – SSR/SST

35

Unbiasedness of OLS

• Assume the population model is linear in parameters as y = 0 + 1x + u

• Assume we can use a random sample of size n, {(xi, yi): i=1, 2, …, n}, from the population model. Thus we can write the sample model yi = 0 + 1xi + ui

• Assume E(u|x) = 0 and thus E(ui|xi) = 0

• Much of the problem in econometric estimation that we are going to face is because of violation of this assumption

• Assume there is variation in xi


Unbiasedness of OLS (cont)

• In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameter

• Start with a simple rewrite of the formula as

22

21 where,ˆ

xxs

s

yxx

ix

x

ii



ii

iii

ii

iii

iiiii

uxx

xxxxx

uxx

xxxxx

uxxxyxx

10

10

10



211

2

1

2

ˆ

thusand ,

asrewritten becan numerator the,so

,0

x

ii

iix

iii

i

s

uxx

uxxs

xxxxx

xx

39

Unbiasedness Summary

• The OLS estimates of 1 and 0 are unbiased

• Proof of unbiasedness depends on assumptions – if any assumption fails, then OLS is not necessarily unbiased

• Remember unbiasedness is a description of the estimator – in a given sample we may be “near” or “far” from the true parameter

40

Variance of the OLS Estimators

• Now we know that the sampling distribution of our estimate is centered around the true parameter

• Want to think about how spread out this distribution is

• Much easier to think about this variance under an additional assumption, so

• Assume Var(u|x) = s2 (Homoskedasticity)

41

Variance of OLS (cont)

• Var(u|x) = E(u2|x)-[E(u|x)]2

• E(u|x) = 0, so s2 = E(u2|x) = E(u2) = Var(u)

• Thus s2 is also the unconditional variance, called the error variance

• s, the square root of the error variance is called the standard deviation of the error

• Can say: E(y|x)=0 + 1x and Var(y|x) = s2


..

x1 x2

Homoskedastic Case

E(y|x) = 0 + 1x

y

f(y|x)


.

xx1 x2

f(y|x)

Heteroskedastic Case

x3

..

E(y|x) = 0 + 1x

44

Variance of OLS (cont)

12

22

2

22

2

2

2222

2

2

2

2

2

2

2

211

ˆ1

11

11

1ˆ

ss

ss

Vars

ss

ds

ds

uVards

udVars

uds

VarVar

xx

x

ix

ix

iix

iix

iix

45

Variance of OLS Summary

• The larger the error variance, s2, the larger the variance of the slope estimate

• The larger the variability in the xi, the smaller the variance of the slope estimate

• As a result, a larger sample size should decrease the variance of the slope estimate

• Problem that the error variance is unknown

46

Estimating the Error Variance

• We don’t know what the error variance, s2, is, because we don’t observe the errors, ui

• What we observe are the residuals, ûi

• We can use the residuals to form an estimate of the error variance

47

Error Variance Estimate (cont)

2/ˆ

2

1ˆ

is ofestimator unbiasedan Then,

ˆˆ

ˆˆ

ˆˆˆ

22

2

1100

1010

10

nSSRun

u

xux

xyu

i

i

iii

iii

s

s

48

Error Variance Estimate (cont)

21

2

1

1

2

/ˆˆse

, ˆ oferror standard the

have then wefor ˆ substitute weif

ˆsd that recall

regression theoferror Standardˆˆ

xx

s

i

x

s

ss

s

ss