TRANSCRIPT
Basic Econometrics in Transportation
Bivariate Regression Analysis
Amir Samimi
Civil Engineering Department, Sharif University of Technology
Primary Source: Basic Econometrics (Gujarati)
1/60
Problem of Estimation
Two methods: Ordinary Least Squares (OLS) and Maximum Likelihood (ML).
Generally, OLS is used extensively in regression analysis:
It is intuitively appealing.
It is mathematically much simpler than MLE.
Both methods generally give similar results in the linear regression context.
2/60
Ordinary Least Squares Method
To determine the PRF (Population Regression Function): Yi = β1 + β2Xi + ui
We estimate it from the SRF (Sample Regression Function): Ŷi = β̂1 + β̂2Xi, that is, Yi = β̂1 + β̂2Xi + ûi
We would like to determine the SRF in a way that it is as close as possible to the actual Y, i.e., the sum of the squared residuals Σûi² is as small as possible.
Why square the residuals ûi? More weight is given to large residuals, and the signs of the residuals do not cancel out.
3/60
Ordinary Least Squares Method
Σûi² = f(β̂1, β̂2)
OLS finds unique estimates of β1 and β2 that give the smallest possible value of the function above.
β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σxiyi / Σxi²
β̂1 = Ȳ − β̂2X̄
Deviation form: xi = Xi − X̄ and yi = Yi − Ȳ
4/60
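The estimator formulas above translate directly into code. A minimal Python sketch follows; the income (X) and consumption (Y) numbers are hypothetical illustration data (n = 10) in the spirit of Gujarati's examples, not figures from these slides:

```python
# OLS estimates for the bivariate model Yi = b1 + b2*Xi + ui.
# X (income) and Y (consumption) are hypothetical illustration data, n = 10.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

# Deviation form: xi = Xi - Xbar, yi = Yi - Ybar.
x = [Xi - xbar for Xi in X]
y = [Yi - ybar for Yi in Y]

# b2 = sum(xi*yi) / sum(xi^2);  b1 = Ybar - b2*Xbar.
b2 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The slope is just a ratio of two sample moments in deviation form; the intercept then follows from the sample means.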
OLS Properties: Sample Mean
The regression line passes through the sample means (X̄, Ȳ).
5/60
OLS Properties: Linearity
β̂2 is a linear function (a weighted average) of the Yi: β̂2 = ΣkiYi, where ki = xi/Σxi².
Xi, and thus ki, are nonstochastic.
Note: Σki = 0 and ΣkiXi = 1.
6/60
OLS Properties: Unbiasedness
E(β̂1) = β1 and E(β̂2) = β2: on average, the OLS estimators equal the true parameters.
7/60
OLS Properties: Mean of Estimated Y
The mean value of the estimated Ŷi is equal to the mean value of the actual Yi.
Proof sketch: sum Ŷi = β̂1 + β̂2Xi over the sample values and divide by the sample size.
8/60
OLS Properties: Mean of Residuals
The mean value of the residuals is zero: Σûi = 0.
9/60
OLS Properties: Uncorrelated Residuals Y
The residuals are uncorrelated with the predicted Yi: ΣûiŶi = 0.
10/60
OLS Properties: Uncorrelated Residuals X
The residuals are uncorrelated with Xi: ΣûiXi = 0.
11/60
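The numerical properties above (line through the sample means, mean of Ŷ equal to Ȳ, zero-mean residuals, residuals uncorrelated with Ŷ and with X) can all be verified in a few lines. The data are again hypothetical illustration numbers:

```python
# Verify the numerical properties of OLS on hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar

yhat = [b1 + b2 * Xi for Xi in X]          # fitted values
u = [Yi - Yh for Yi, Yh in zip(Y, yhat)]   # residuals

through_means = b1 + b2 * xbar - ybar      # 0: line passes through (Xbar, Ybar)
mean_yhat_gap = sum(yhat) / n - ybar       # 0: mean of fitted Y equals Ybar
mean_resid = sum(u) / n                    # 0: residuals average to zero
u_dot_yhat = sum(ui * Yh for ui, Yh in zip(u, yhat))  # 0: residuals vs fitted
u_dot_X = sum(ui * Xi for ui, Xi in zip(u, X))        # 0: residuals vs regressor
```

All five quantities are zero up to floating-point rounding; they hold by construction for any data set, since they follow from the OLS first-order conditions.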
OLS Assumptions
Our objective is not only to estimate some coefficients but also to draw inferences about the true coefficients.
Thus certain assumptions are made about the manner in which Yi are generated.
Yi = β1 + β2Xi + ui
Unless we are specific about how Xi and ui are created, there is no way we can make any statistical inference about the Yi, β1, and β2.
The Gaussian standard, or classical linear regression model (CLRM), makes 10 assumptions that are extremely critical to the valid interpretation of the regression estimates.
12/60
Assumption 1
The regression model is linear in the parameters.
E(Y | Xi) = β1 + β2Xi² is a linear (in the parameters) model.
E(Y | Xi) = β1 + β2²Xi is not a linear (in the parameters) model.
13/60
Assumption 2
X values are fixed in repeated sampling.
X is assumed to be non-stochastic.
Our regression analysis is conditional regression analysis, that is, conditional on the given values of the regressor(s) X.
14/60
Assumption 3
Zero mean value of disturbance ui
E(ui | Xi) = 0
Factors not explicitly included in the model, and therefore subsumed in ui, do not systematically affect the mean value of Y.
15/60
Assumption 4
Homoscedasticity or equal variance of ui
var(ui | Xi) = σ², the same for all observations.
16/60
Assumption 4
Heteroscedasticity
Under heteroscedasticity, var(ui | Xi) = σi² varies across observations.
17/60
Assumption 5
No autocorrelation (serial correlation) between the disturbances.
cov(ui, uj | Xi, Xj) = 0 for i ≠ j.
18/60
Assumption 6
Disturbance u and explanatory variable X are uncorrelated.
If X and u are correlated, their individual effects on Y cannot be assessed separately.
19/60
Assumption 7
n must be greater than the number of explanatory variables.
Obviously, we need at least two pairs of observations to estimate the two unknowns!
20/60
Assumption 8
Variability in X values.
Mathematically, if all the X values are identical, it is impossible to estimate β2 (the denominator Σxi² will be zero) and therefore β1.
Intuitively, it is obvious as well.
21/60
Assumption 9
The regression model is correctly specified.
Important questions that arise in the specification of a model:
What variables should be included in the model?
What is the functional form of the model? Is it linear in the parameters, the variables, or both?
What are the probabilistic assumptions made about the Yi, the Xi, and the ui entering the model?
22/60
Assumption 10
There is no perfect multicollinearity.
No perfect linear relationships among the explanatory variables.
Will be further discussed in multiple regression models.
23/60
Assumptions
How realistic are all these assumptions?
We make certain assumptions because they facilitate the study, not because they are realistic.
Consequences of violating the CLRM assumptions will be examined later.
We will look into: Precision of OLS estimates, and Statistical properties of OLS.
24/60
Precision of OLS Estimates
Precision of an estimate is measured by its standard error.
25/60
Standard Errors of the Estimators
var(β̂2) = σ²/Σxi² and se(β̂2) = σ/√Σxi²
var(β̂1) = σ²ΣXi²/(nΣxi²) and se(β̂1) = √var(β̂1)
26/60
Homoscedastic Variance of ui
How to estimate σ²? σ̂² = Σûi²/(n − 2), where (n − 2) is the number of degrees of freedom.
Note: σ̂ is known as the standard error of estimate (the standard error of the regression).
27/60
Features of the Variances
The variance of β̂2 is directly proportional to σ² but inversely proportional to Σxi².
As n increases, the precision with which β2 can be estimated also increases.
If there is substantial variation in X, β2 can be measured more accurately.
The variance of β̂1 is directly proportional to σ² and ΣXi², but inversely proportional to Σxi² and the sample size n.
cov(β̂1, β̂2) = −X̄ var(β̂2):
If the slope coefficient is overestimated, the intercept will be underestimated.
28/60
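A minimal sketch of these precision formulas, assuming the estimator σ̂² = Σûi²/(n − 2); the data are the same hypothetical consumption-income numbers used earlier, not figures from the slides:

```python
import math

# Standard errors of the OLS estimators, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar

RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
sigma2_hat = RSS / (n - 2)    # unbiased estimator of sigma^2

var_b2 = sigma2_hat / Sxx                                   # sigma^2 / sum(x^2)
var_b1 = sigma2_hat * sum(Xi * Xi for Xi in X) / (n * Sxx)  # sigma^2*sum(X^2)/(n*sum(x^2))
se_b2, se_b1 = math.sqrt(var_b2), math.sqrt(var_b1)
cov_b1_b2 = -xbar * var_b2  # negative: an overestimated slope implies an underestimated intercept
```

Note how var(β̂2) shrinks as Σxi² grows: more spread in X means a more precisely estimated slope.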
Gauss–Markov Theorem
Given the assumptions of the classical linear regression model, the OLS estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE (Best Linear Unbiased Estimators).
The theorem makes no assumptions about the probability distribution of ui, and therefore of Yi.
29/60
Goodness of Fit
The coefficient of determination R² is a summary measure that tells how well the sample regression line fits the data.
Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS), so R² = ESS/TSS = 1 − RSS/TSS.
30/60
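The TSS = ESS + RSS decomposition can be checked numerically; as before, the data are hypothetical illustration numbers:

```python
# TSS = ESS + RSS decomposition and R^2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * Xi for Xi in X]

TSS = sum((Yi - ybar) ** 2 for Yi in Y)               # total variation in Y
ESS = sum((Yh - ybar) ** 2 for Yh in yhat)            # variation explained by the line
RSS = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, yhat))  # unexplained variation
r2 = ESS / TSS                                        # = 1 - RSS/TSS
```

The identity TSS = ESS + RSS holds because the cross term Σ(Ŷi − Ȳ)ûi vanishes (residuals are uncorrelated with the fitted values).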
Consistency
An asymptotic property.
An estimator is consistent if it is unbiased and its variance tends to zero as the sample size n tends to infinity.
Unbiasedness is already proved.
31/60
Before Hypothesis Testing
Using the method of OLS we can estimate β1, β2, and σ².
The estimators (β̂1, β̂2, σ̂²) are random variables.
To draw inferences about the PRF, we must find out how close β̂2 is to the true β2.
We need to find out the PDF of the estimators.
32/60
Probability Distribution of Disturbances
β̂2 is ultimately a linear function of the random variable ui, which is random by assumption.
The nature of the probability distribution of ui plays an extremely important role in hypothesis testing.
It is usually assumed that ui ∼ NID(0, σ²), where NID means Normally and Independently Distributed.
33/60
Why the Normality Assumption?
We hope that the influence of these omitted or neglected variables is small and at best random.
By the central limit theorem (CLT), if there are a large number of IID random variables, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely.
Central limit theorem (CLT): let X1, X2, ..., Xn denote n independent random variables, all of which have the same PDF with mean μ and variance σ². Then the sample mean tends to a normal distribution with mean μ and variance σ²/n as n increases indefinitely.
34/60
Why the Normality Assumption?
A variant of the CLT states that, even if the number of variables is not very large or if these variables are not strictly independent, their sum may still be normally distributed.
With this assumption, PDF of OLS estimators can be easily derived, as any linear function of normally distributed variables is itself normally distributed.
The normal distribution is a comparatively simple distribution involving only two parameters.
It enables us to use the t, F, and χ² tests for regression models. The normality assumption plays a critical role for small samples; in reasonably large samples, we may relax it.
35/60
Normality Test
Since we are "imposing" the normality assumption, it behooves us to find out in practical applications involving small samples whether it is appropriate.
Later, we will introduce some tests to do just that.
We will come across situations where the normality assumption may be inappropriate.
Until then we will continue with the normality assumption.
36/60
Estimators’ Properties with Normality Assumption
They are unbiased, have minimum variance (are efficient), and are consistent.
β̂1 and β̂2 are normally distributed, and (n − 2)σ̂²/σ² is distributed as χ² with (n − 2) df.
This will help us to draw inferences about the true σ² from the estimated σ².
(β̂1, β̂2) are distributed independently of σ̂². The importance of this will be explained later.
The estimators have minimum variance in the entire class of unbiased estimators, whether linear or not: they are Best Unbiased Estimators (BUE).
37/60
Method of Maximum Likelihood
If the ui are assumed to be normally distributed, the ML and OLS estimators of the regression coefficients are identical.
The ML estimator of σ² is biased in small samples; asymptotically, it is unbiased.
The ML method can also be applied to regression models that are nonlinear in the parameters, for which OLS is generally not used.
38/60
ML Estimation
Assume the two-variable model Yi = β1 + β2Xi + ui in which ui ∼ N(0, σ²).
Since the Yi are independent, the joint PDF of Y1, ..., Yn can be written as the product f(Y1)f(Y2)···f(Yn), where each Yi is normal with mean β1 + β2Xi and variance σ².
β1, β2, and σ² are the unknowns in the likelihood function.
39/60
ML Estimation
The method of maximum likelihood consists in estimating the unknown parameters in such a manner that the probability of observing the given Y's is as high as possible.
40/60
ML Estimation
From the first-order conditions for maximizing the likelihood: the ML estimators of β1 and β2 coincide with the OLS estimators, while the ML estimator of σ² is Σûi²/n.
Note how ML underestimates the true σ² in small samples: it divides by n rather than by n − 2.
41/60
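The small-sample bias of the ML variance estimator is easy to see numerically: dividing the residual sum of squares by n instead of n − 2 always shrinks the estimate (hypothetical illustration data again):

```python
# OLS vs ML estimators of sigma^2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))

sigma2_ols = RSS / (n - 2)  # unbiased estimator
sigma2_ml = RSS / n         # biased downward in small samples
```

The ratio of the two estimators is (n − 2)/n, which tends to 1 as n grows, so the bias disappears asymptotically.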
Interval Estimation
How reliable are the point estimates?
We try to find two positive numbers δ and α (0 < α < 1) such that Pr(β̂2 − δ ≤ β2 ≤ β̂2 + δ) = 1 − α.
The probability of constructing an interval that contains β2 is 1 − α; such an interval is known as a confidence interval, and α is known as the level of significance.
How are the confidence intervals constructed?
If the probability distributions of the estimators are known, the task of constructing confidence intervals is a simple one.
42/60
Confidence Intervals for β2
It can be shown that t = (β̂2 − β2)/se(β̂2) follows the t distribution with n − 2 df, giving the interval β̂2 ± t(α/2) se(β̂2).
The width of the confidence interval is proportional to the standard error of the estimator.
The same holds for β1.
43/60
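A sketch of a 95% interval for β2 on the hypothetical illustration data; the critical value 2.306 is the two-tailed 5% point of the t distribution with 8 df, taken from a standard t table:

```python
import math

# 95% confidence interval for beta2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
se_b2 = math.sqrt((RSS / (n - 2)) / Sxx)

t_crit = 2.306  # two-tailed 5% critical value of t with n-2 = 8 df (from a t table)
lower, upper = b2 - t_crit * se_b2, b2 + t_crit * se_b2
print(f"95% CI for beta2: ({lower:.4f}, {upper:.4f})")
```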
Confidence Intervals for σ2
It can be shown that under the normality assumption, (n − 2)σ̂²/σ² follows the χ² distribution with n − 2 df.
Interpretation of this interval: if we establish 95% confidence limits on σ², and if we maintain a priori that these limits will include the true σ², we shall be right in the long run 95 percent of the time.
44/60
Hypothesis Testing
Is a given observation compatible with some stated hypothesis?
In statistics, the stated hypothesis is known as the null hypothesis, H0 (versus an alternative hypothesis, H1).
Hypothesis testing develops rules for rejecting or accepting the null hypothesis: the confidence interval approach and the test of significance approach.
Most of the statistical hypotheses of our interest make statements about one or more values of the parameters of some assumed probability distribution such as the normal, F, t, or χ2.
45/60
Confidence Interval Approach
Decision Rule: construct a 100(1 − α)% confidence interval for β2. If the β2 under H0 falls within this interval, do not reject H0; if it falls outside this interval, reject H0.
Note: There is a 100α percent chance of committing a Type I error. If α = 0.05, there is a 5 percent chance that we could reject the null hypothesis even though it is true.
When we reject the null hypothesis, we say that our finding is statistically significant.
One-tail or two-tail test: sometimes we have a strong expectation that the alternative hypothesis is one-sided rather than two-sided.
46/60
Test of Significance Approach
In the confidence-interval procedure we try to establish a range that has a probability of including the true but unknown β2.
In the test-of-significance approach we hypothesize some value for β2 and try to see whether the estimated β2 lies within confidence limits around the hypothesized value.
A large t value will be evidence against the null hypothesis.
47/60
Practical Aspects
Accepting the null hypothesis:
All we can say is that, based on the sample evidence, we have no reason to reject it; another null hypothesis may be equally compatible with the data.
The 2-t rule of thumb:
If df > 20 and α = 0.05, the null hypothesis β2 = 0 can be rejected if |t| > 2. In these cases we do not even have to refer to the t table to assess the significance of the estimated slope coefficient.
Forming the null hypotheses:
Theoretical expectations or prior empirical work can be relied upon to formulate hypotheses.
The p value:
The lowest significance level at which a null hypothesis can be rejected.
48/60
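The test-of-significance recipe for H0: β2 = 0 in code, on the hypothetical illustration data. Since df = 8 < 20 here, the exact table value 2.306 is used instead of the 2-t rule of thumb:

```python
import math

# Test of significance for H0: beta2 = 0, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
se_b2 = math.sqrt((RSS / (n - 2)) / Sxx)

beta2_H0 = 0.0                    # hypothesized value under the null
t_stat = (b2 - beta2_H0) / se_b2  # large |t| is evidence against H0
t_crit = 2.306                    # exact two-tailed 5% value for 8 df, from a t table
reject_H0 = abs(t_stat) > t_crit
```

A p value could be reported instead of the fixed-α decision; computing it requires the t CDF (e.g. from a statistics library), which is omitted here to keep the sketch dependency-free.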
Analysis of Variance
A study of the two components of TSS (= ESS + RSS) is known as analysis of variance (ANOVA) from the regression viewpoint.
49/60
Analysis of Variance
If we assume that the disturbances ui are normally distributed, which we do under the CNLRM, and if the null hypothesis (H0) is that β2 = 0, then it can be shown that F = ESS/(RSS/(n − 2)) follows the F distribution with 1 df in the numerator and (n − 2) df in the denominator.
What use can be made of the preceding F ratio?
50/60
F-ratio
It can be shown that E(β̂2²Σxi²) = σ² + β2²Σxi² and E(Σûi²/(n − 2)) = σ².
Note that β2 and σ² are the true parameters.
If β2 is zero, both equations provide us with identical estimates of the true σ²; in that case, X has no linear influence on Y.
The F ratio therefore provides a test of the null hypothesis H0: β2 = 0.
51/60
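With one numerator df, the ANOVA F ratio equals the square of the t statistic for H0: β2 = 0, which the sketch below confirms on the hypothetical illustration data:

```python
import math

# ANOVA F ratio and its relation to t^2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * Xi for Xi in X]

ESS = sum((Yh - ybar) ** 2 for Yh in yhat)
RSS = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, yhat))
sigma2_hat = RSS / (n - 2)

F = (ESS / 1) / sigma2_hat                 # F with 1 and n-2 df
t_stat = b2 / math.sqrt(sigma2_hat / Sxx)  # t statistic for H0: beta2 = 0
# In the bivariate model, F = t^2, so the two tests are equivalent.
```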
Example
Compute the F ratio and obtain the p value of the computed F statistic.
p value of this F statistic with 1 and 8 df is 0.0000001. Therefore, if we reject the null hypothesis, the probability of
committing a Type I error is very small.
52/60
Application Of Regression Analysis
One use is to "predict" or "forecast" the future consumption expenditure Y corresponding to some given level of income X.
Now there are two kinds of predictions:
Prediction of the conditional mean value of Y corresponding to a chosen X.
Prediction of an individual Y value corresponding to a chosen X.
53/60
Mean Prediction
Estimator of E(Y | X0): Ŷ0 = β̂1 + β̂2X0.
It can be shown that var(Ŷ0) = σ²[1/n + (X0 − X̄)²/Σxi²].
The statistic t = (Ŷ0 − E(Y | X0))/se(Ŷ0) follows the t distribution with n − 2 df and may be used to derive confidence intervals.
54/60
Individual Prediction
Estimator of an individual Y0 corresponding to X0: Ŷ0 = β̂1 + β̂2X0.
It can be shown that var(Y0 − Ŷ0) = σ²[1 + 1/n + (X0 − X̄)²/Σxi²].
The statistic t = (Y0 − Ŷ0)/se(Y0 − Ŷ0) follows the t distribution with n − 2 df and may be used to derive confidence intervals.
55/60
Confidence Bands
56/60
Individual Versus Mean Prediction
The confidence interval for an individual Y0 is wider than that for the mean value of Y0.
The width of the confidence bands is smallest when X0 = X̄.
57/60
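The two prediction variances differ only by the extra σ² term, which is exactly why the individual band is wider. A sketch at a hypothetical income level X0 = 100, on the illustration data:

```python
import math

# Mean vs individual prediction at X0, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
sigma2_hat = RSS / (n - 2)

X0 = 100                             # hypothetical prediction point
h = 1 / n + (X0 - xbar) ** 2 / Sxx   # grows as X0 moves away from Xbar
y0_hat = b1 + b2 * X0                # point prediction, same in both cases
se_mean = math.sqrt(sigma2_hat * h)          # se of the estimated E(Y | X0)
se_indiv = math.sqrt(sigma2_hat * (1 + h))   # se of the individual prediction error
```

Because h is smallest at X0 = X̄, both bands are narrowest there and flare out toward the extremes of X.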
Reporting the Results
58/60
Evaluating the Results
How "good" is the fitted model? Any standard?
Are the signs of the estimated coefficients in accordance with theoretical or prior expectations?
How well does the model explain variation in Y? One can use R².
Does the model satisfy the assumptions of CNLRM? For now, we would like to check the normality of the disturbance term. Recall that the t and F tests require that the error term follow the normal distribution.
59/60
Normality Tests
Several tests exist in the literature. We look at:
Histogram of residuals: a simple graphic device to learn about the shape of the PDF. Horizontal axis: the values of the OLS residuals, divided into suitable intervals; vertical axis: rectangles with heights equal to the frequency in each interval. From a normal population we will get a bell-shaped PDF.
Normal probability plot (NPP): a simple graphic device. Horizontal axis: values of the OLS residuals; vertical axis: the expected value of the variable if it were normally distributed. From a normal population we will get a straight line.
The Jarque–Bera test: an asymptotic test, with a chi-squared distribution and 2 df: JB = n[S²/6 + (K − 3)²/24], where S is the skewness and K the kurtosis of the residuals.
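A sketch of the Jarque–Bera statistic computed from the OLS residuals of the hypothetical illustration data; with only n = 10 observations the test is indicative at best, since it is an asymptotic test:

```python
# Jarque-Bera normality check on OLS residuals, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
u = [Yi - b1 - b2 * Xi for Xi, Yi in zip(X, Y)]  # residuals

m2 = sum(ui ** 2 for ui in u) / n
m3 = sum(ui ** 3 for ui in u) / n
m4 = sum(ui ** 4 for ui in u) / n
S = m3 / m2 ** 1.5   # sample skewness (0 under normality)
K = m4 / m2 ** 2     # sample kurtosis (3 under normality)

JB = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)
normal_not_rejected = JB < 5.99  # chi-squared(2) critical value at the 5% level
```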
60/60
Homework 2
Basic Econometrics (Gujarati, 2003)
1. Chapter 3, Problem 21 [10 points]
2. Chapter 3, Problem 23 [30 points]
3. Chapter 5, Problem 9 [30 points]
4. Chapter 5, Problem 19 [30 points]
Assignment weight factor = 1