TRANSCRIPT
Basic Econometrics in Transportation
Bivariate Regression Analysis
Amir Samimi
Civil Engineering Department, Sharif University of Technology
Primary Source: Basic Econometrics (Gujarati)
1/60
Problem of Estimation
Two methods: Ordinary Least Squares (OLS) and Maximum Likelihood (ML).
Generally, OLS is used extensively in regression analysis:
It is intuitively appealing.
It is mathematically much simpler than MLE.
Both methods generally give similar results in the linear regression context.
2/60
Ordinary Least Squares Method
To determine the PRF (Population Regression Function): Yi = β1 + β2Xi + ui
We estimate it from the SRF (Sample Regression Function): Ŷi = β̂1 + β̂2Xi, that is, Yi = β̂1 + β̂2Xi + ûi
We would like to determine the SRF in a way that it is as close as possible to the actual Y, i.e., the sum of the squared residuals Σûi² is as small as possible.
Why square the residuals ûi? More weight is given to large residuals, and the signs of the residuals do not cancel out.
3/60
Ordinary Least Squares Method
Σûi² = f(β̂1, β̂2)
OLS finds unique estimates of β1 and β2 that give the smallest possible value of the function above.
β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σxiyi / Σxi²
β̂1 = Ȳ − β̂2X̄
Deviation form: xi = Xi − X̄ and yi = Yi − Ȳ
4/60
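The estimator formulas above translate directly into code. A minimal Python sketch follows; the income (X) and consumption (Y) numbers are hypothetical illustration data (n = 10) in the spirit of Gujarati's examples, not figures from these slides:

```python
# OLS estimates for the bivariate model Yi = b1 + b2*Xi + ui.
# X (income) and Y (consumption) are hypothetical illustration data, n = 10.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

# Deviation form: xi = Xi - Xbar, yi = Yi - Ybar.
x = [Xi - xbar for Xi in X]
y = [Yi - ybar for Yi in Y]

# b2 = sum(xi*yi) / sum(xi^2);  b1 = Ybar - b2*Xbar.
b2 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The slope is just a ratio of two sample moments in deviation form; the intercept then follows from the sample means.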
OLS Properties: Sample Mean
The regression line passes through the sample means (X̄, Ȳ).
5/60
OLS Properties: Linearity
β̂2 is a linear function (a weighted average) of the Yi: β̂2 = ΣkiYi, where ki = xi/Σxi².
Xi, and thus ki, are nonstochastic.
Note: Σki = 0 and ΣkiXi = 1.
6/60
OLS Properties: Unbiasedness
E(β̂1) = β1 and E(β̂2) = β2: on average, the OLS estimators equal the true parameters.
7/60
OLS Properties: Mean of Estimated Y
The mean value of the estimated Ŷi is equal to the mean value of the actual Yi.
Proof sketch: sum Ŷi = β̂1 + β̂2Xi over the sample values and divide by the sample size.
8/60
OLS Properties: Mean of Residuals
The mean value of the residuals is zero: Σûi = 0.
9/60
OLS Properties: Uncorrelated Residuals Y
The residuals are uncorrelated with the predicted Yi: ΣûiŶi = 0.
10/60
OLS Properties: Uncorrelated Residuals X
The residuals are uncorrelated with Xi: ΣûiXi = 0.
11/60
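The numerical properties above (line through the sample means, mean of Ŷ equal to Ȳ, zero-mean residuals, residuals uncorrelated with Ŷ and with X) can all be verified in a few lines. The data are again hypothetical illustration numbers:

```python
# Verify the numerical properties of OLS on hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar

yhat = [b1 + b2 * Xi for Xi in X]          # fitted values
u = [Yi - Yh for Yi, Yh in zip(Y, yhat)]   # residuals

through_means = b1 + b2 * xbar - ybar      # 0: line passes through (Xbar, Ybar)
mean_yhat_gap = sum(yhat) / n - ybar       # 0: mean of fitted Y equals Ybar
mean_resid = sum(u) / n                    # 0: residuals average to zero
u_dot_yhat = sum(ui * Yh for ui, Yh in zip(u, yhat))  # 0: residuals vs fitted
u_dot_X = sum(ui * Xi for ui, Xi in zip(u, X))        # 0: residuals vs regressor
```

All five quantities are zero up to floating-point rounding; they hold by construction for any data set, since they follow from the OLS first-order conditions.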
OLS Assumptions
Our objective is not only to estimate some coefficients but also to draw inferences about the true coefficients.
Thus certain assumptions are made about the manner in which Yi are generated.
Yi = β1 + β2Xi + ui
Unless we are specific about how Xi and ui are created, there is no way we can make any statistical inference about the Yi, β1, and β2.
The Gaussian standard, or classical linear regression model (CLRM), makes 10 assumptions that are extremely critical to the valid interpretation of the regression estimates.
12/60
Assumption 1
The regression model is linear in the parameters.
E(Y | Xi) = β1 + β2Xi² is a linear (in the parameters) model.
E(Y | Xi) = β1 + β2²Xi is not a linear (in the parameters) model.
13/60
Assumption 2
X values are fixed in repeated sampling.
X is assumed to be non-stochastic.
Our regression analysis is conditional regression analysis, that is, conditional on the given values of the regressor(s) X.
14/60
Assumption 3
Zero mean value of disturbance ui
E(ui | Xi) = 0
Factors not explicitly included in the model, and therefore subsumed in ui, do not systematically affect the mean value of Y.
15/60
Assumption 4
Homoscedasticity or equal variance of ui
var(ui | Xi) = σ², the same for all observations.
16/60
Assumption 4
Heteroscedasticity
Under heteroscedasticity, var(ui | Xi) = σi² varies across observations.
17/60
Assumption 5
No autocorrelation (serial correlation) between the disturbances.
cov(ui, uj | Xi, Xj) = 0 for i ≠ j.
18/60
Assumption 6
Disturbance u and explanatory variable X are uncorrelated.
If X and u are correlated, their individual effects on Y cannot be assessed separately.
19/60
Assumption 7
n must be greater than the number of explanatory variables.
Obviously, we need at least two pairs of observations to estimate the two unknowns!
20/60
Assumption 8
Variability in X values.
Mathematically, if all the X values are identical, it is impossible to estimate β2 (the denominator Σxi² will be zero) and therefore β1.
Intuitively, it is obvious as well.
21/60
Assumption 9
The regression model is correctly specified.
Important questions that arise in the specification of a model:
What variables should be included in the model?
What is the functional form of the model? Is it linear in the parameters, the variables, or both?
What are the probabilistic assumptions made about the Yi, the Xi, and the ui entering the model?
22/60
Assumption 10
There is no perfect multicollinearity.
No perfect linear relationships among the explanatory variables.
Will be further discussed in multiple regression models.
23/60
Assumptions
How realistic are all these assumptions?
We make certain assumptions because they facilitate the study, not because they are realistic.
Consequences of violating the CLRM assumptions will be examined later.
We will look into: Precision of OLS estimates, and Statistical properties of OLS.
24/60
Precision of OLS Estimates
Precision of an estimate is measured by its standard error.
25/60
Standard Errors of the Estimators
var(β̂2) = σ²/Σxi² and se(β̂2) = σ/√Σxi²
var(β̂1) = σ²ΣXi²/(nΣxi²) and se(β̂1) = √var(β̂1)
26/60
Homoscedastic Variance of ui
How to estimate σ²? σ̂² = Σûi²/(n − 2), where (n − 2) is the number of degrees of freedom.
Note: σ̂ is known as the standard error of estimate (the standard error of the regression).
27/60
Features of the Variances
The variance of β̂2 is directly proportional to σ² but inversely proportional to Σxi².
As n increases, the precision with which β2 can be estimated also increases.
If there is substantial variation in X, β2 can be measured more accurately.
The variance of β̂1 is directly proportional to σ² and ΣXi², but inversely proportional to Σxi² and the sample size n.
cov(β̂1, β̂2) = −X̄ var(β̂2):
If the slope coefficient is overestimated, the intercept will be underestimated.
28/60
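A minimal sketch of these precision formulas, assuming the estimator σ̂² = Σûi²/(n − 2); the data are the same hypothetical consumption-income numbers used earlier, not figures from the slides:

```python
import math

# Standard errors of the OLS estimators, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar

RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
sigma2_hat = RSS / (n - 2)    # unbiased estimator of sigma^2

var_b2 = sigma2_hat / Sxx                                   # sigma^2 / sum(x^2)
var_b1 = sigma2_hat * sum(Xi * Xi for Xi in X) / (n * Sxx)  # sigma^2*sum(X^2)/(n*sum(x^2))
se_b2, se_b1 = math.sqrt(var_b2), math.sqrt(var_b1)
cov_b1_b2 = -xbar * var_b2  # negative: an overestimated slope implies an underestimated intercept
```

Note how var(β̂2) shrinks as Σxi² grows: more spread in X means a more precisely estimated slope.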
Gauss–Markov Theorem
Given the assumptions of the classical linear regression model, the OLS estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE (Best Linear Unbiased Estimators).
The theorem makes no assumptions about the probability distribution of ui, and therefore of Yi.
29/60
Goodness of Fit
The coefficient of determination R² is a summary measure that tells how well the sample regression line fits the data.
Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS), so R² = ESS/TSS = 1 − RSS/TSS.
30/60
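The TSS = ESS + RSS decomposition can be checked numerically; as before, the data are hypothetical illustration numbers:

```python
# TSS = ESS + RSS decomposition and R^2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * Xi for Xi in X]

TSS = sum((Yi - ybar) ** 2 for Yi in Y)               # total variation in Y
ESS = sum((Yh - ybar) ** 2 for Yh in yhat)            # variation explained by the line
RSS = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, yhat))  # unexplained variation
r2 = ESS / TSS                                        # = 1 - RSS/TSS
```

The identity TSS = ESS + RSS holds because the cross term Σ(Ŷi − Ȳ)ûi vanishes (residuals are uncorrelated with the fitted values).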
Consistency
An asymptotic property.
An estimator is consistent if it is unbiased and its variance tends to zero as the sample size n tends to infinity.
Unbiasedness is already proved.
31/60
Before Hypothesis Testing
Using the method of OLS we can estimate β1, β2, and σ².
The estimators (β̂1, β̂2, σ̂²) are random variables.
To draw inferences about the PRF, we must find out how close β̂2 is to the true β2.
We need to find out the PDF of the estimators.
32/60
Probability Distribution of Disturbances
β̂2 is ultimately a linear function of the random variable ui, which is random by assumption.
The nature of the probability distribution of ui plays an extremely important role in hypothesis testing.
It is usually assumed that ui ∼ NID(0, σ²), where NID means Normally and Independently Distributed.
33/60
Why the Normality Assumption?
We hope that the influence of these omitted or neglected variables is small and at best random.
By the central limit theorem (CLT), if there are a large number of IID random variables, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely.
Central limit theorem (CLT): let X1, X2, ..., Xn denote n independent random variables, all of which have the same PDF with mean μ and variance σ². Then the sample mean tends to a normal distribution with mean μ and variance σ²/n as n increases indefinitely.
34/60
Why the Normality Assumption?
A variant of the CLT states that, even if the number of variables is not very large or if these variables are not strictly independent, their sum may still be normally distributed.
With this assumption, PDF of OLS estimators can be easily derived, as any linear function of normally distributed variables is itself normally distributed.
The normal distribution is a comparatively simple distribution involving only two parameters.
It enables us to use the t, F, and χ² tests for regression models. The normality assumption plays a critical role for small samples; in reasonably large samples, we may relax it.
35/60
Normality Test
Since we are "imposing" the normality assumption, it behooves us to find out in practical applications involving small samples whether it is appropriate.
Later, we will introduce some tests to do just that.
We will come across situations where the normality assumption may be inappropriate.
Until then we will continue with the normality assumption.
36/60
Estimators’ Properties with Normality Assumption
They are unbiased, have minimum variance (are efficient), and are consistent.
β̂1 and β̂2 are normally distributed, and (n − 2)σ̂²/σ² is distributed as χ² with (n − 2) df.
This will help us to draw inferences about the true σ² from the estimated σ².
(β̂1, β̂2) are distributed independently of σ̂². The importance of this will be explained later.
The estimators have minimum variance in the entire class of unbiased estimators, whether linear or not: they are Best Unbiased Estimators (BUE).
37/60
Method of Maximum Likelihood
If the ui are assumed to be normally distributed, the ML and OLS estimators of the regression coefficients are identical.
The ML estimator of σ² is biased in small samples; asymptotically, it is unbiased.
The ML method can also be applied to regression models that are nonlinear in the parameters, for which OLS is generally not used.
38/60
ML Estimation
Assume the two-variable model Yi = β1 + β2Xi + ui in which ui ∼ N(0, σ²).
Since the Yi are independent, the joint PDF of Y1, ..., Yn can be written as the product f(Y1)f(Y2)···f(Yn), where each Yi is normal with mean β1 + β2Xi and variance σ².
β1, β2, and σ² are the unknowns in the likelihood function.
39/60
ML Estimation
The method of maximum likelihood consists in estimating the unknown parameters in such a manner that the probability of observing the given Y's is as high as possible.
40/60
ML Estimation
From the first-order conditions for maximizing the likelihood: the ML estimators of β1 and β2 coincide with the OLS estimators, while the ML estimator of σ² is Σûi²/n.
Note how ML underestimates the true σ² in small samples: it divides by n rather than by n − 2.
41/60
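The small-sample bias of the ML variance estimator is easy to see numerically: dividing the residual sum of squares by n instead of n − 2 always shrinks the estimate (hypothetical illustration data again):

```python
# OLS vs ML estimators of sigma^2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))

sigma2_ols = RSS / (n - 2)  # unbiased estimator
sigma2_ml = RSS / n         # biased downward in small samples
```

The ratio of the two estimators is (n − 2)/n, which tends to 1 as n grows, so the bias disappears asymptotically.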
Interval Estimation
How reliable are the point estimates?
We try to find two positive numbers δ and α (0 < α < 1) such that Pr(β̂2 − δ ≤ β2 ≤ β̂2 + δ) = 1 − α.
The probability of constructing an interval that contains β2 is 1 − α; such an interval is known as a confidence interval, and α is known as the level of significance.
How are the confidence intervals constructed?
If the probability distributions of the estimators are known, the task of constructing confidence intervals is a simple one.
42/60
Confidence Intervals for β2
It can be shown that t = (β̂2 − β2)/se(β̂2) follows the t distribution with n − 2 df, giving the interval β̂2 ± t(α/2) se(β̂2).
The width of the confidence interval is proportional to the standard error of the estimator.
The same holds for β1.
43/60
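A sketch of a 95% interval for β2 on the hypothetical illustration data; the critical value 2.306 is the two-tailed 5% point of the t distribution with 8 df, taken from a standard t table:

```python
import math

# 95% confidence interval for beta2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
se_b2 = math.sqrt((RSS / (n - 2)) / Sxx)

t_crit = 2.306  # two-tailed 5% critical value of t with n-2 = 8 df (from a t table)
lower, upper = b2 - t_crit * se_b2, b2 + t_crit * se_b2
print(f"95% CI for beta2: ({lower:.4f}, {upper:.4f})")
```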
Confidence Intervals for σ2
It can be shown that under the normality assumption, (n − 2)σ̂²/σ² follows the χ² distribution with n − 2 df.
Interpretation of this interval: if we establish 95% confidence limits on σ², and if we maintain a priori that these limits will include the true σ², we shall be right in the long run 95 percent of the time.
44/60
Hypothesis Testing
Is a given observation compatible with some stated hypothesis?
In statistics, the stated hypothesis is known as the null hypothesis, H0 (versus an alternative hypothesis, H1).
Hypothesis testing develops rules for rejecting or accepting the null hypothesis: the confidence interval approach and the test of significance approach.
Most of the statistical hypotheses of our interest make statements about one or more values of the parameters of some assumed probability distribution such as the normal, F, t, or χ2.
45/60
Confidence Interval Approach
Decision Rule: construct a 100(1 − α)% confidence interval for β2. If the β2 under H0 falls within this interval, do not reject H0; if it falls outside this interval, reject H0.
Note: There is a 100α percent chance of committing a Type I error. If α = 0.05, there is a 5 percent chance that we could reject the null hypothesis even though it is true.
When we reject the null hypothesis, we say that our finding is statistically significant.
One-tail or two-tail test: sometimes we have a strong expectation that the alternative hypothesis is one-sided rather than two-sided.
46/60
Test of Significance Approach
In the confidence-interval procedure we try to establish a range that has a probability of including the true but unknown β2.
In the test-of-significance approach we hypothesize some value for β2 and try to see whether the estimated β2 lies within confidence limits around the hypothesized value.
A large t value will be evidence against the null hypothesis.
47/60
Practical Aspects
Accepting the null hypothesis:
All we can say is that, based on the sample evidence, we have no reason to reject it; another null hypothesis may be equally compatible with the data.
The 2-t rule of thumb:
If df > 20 and α = 0.05, the null hypothesis β2 = 0 can be rejected if |t| > 2. In these cases we do not even have to refer to the t table to assess the significance of the estimated slope coefficient.
Forming the null hypotheses:
Theoretical expectations or prior empirical work can be relied upon to formulate hypotheses.
The p value:
The lowest significance level at which a null hypothesis can be rejected.
48/60
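The test-of-significance recipe for H0: β2 = 0 in code, on the hypothetical illustration data. Since df = 8 < 20 here, the exact table value 2.306 is used instead of the 2-t rule of thumb:

```python
import math

# Test of significance for H0: beta2 = 0, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
se_b2 = math.sqrt((RSS / (n - 2)) / Sxx)

beta2_H0 = 0.0                    # hypothesized value under the null
t_stat = (b2 - beta2_H0) / se_b2  # large |t| is evidence against H0
t_crit = 2.306                    # exact two-tailed 5% value for 8 df, from a t table
reject_H0 = abs(t_stat) > t_crit
```

A p value could be reported instead of the fixed-α decision; computing it requires the t CDF (e.g. from a statistics library), which is omitted here to keep the sketch dependency-free.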
Analysis of Variance
A study of the two components of TSS (= ESS + RSS) is known as analysis of variance (ANOVA) from the regression viewpoint.
49/60
Analysis of Variance
If we assume that the disturbances ui are normally distributed, which we do under the CNLRM, and if the null hypothesis (H0) is that β2 = 0, then it can be shown that F = ESS/(RSS/(n − 2)) follows the F distribution with 1 df in the numerator and (n − 2) df in the denominator.
What use can be made of the preceding F ratio?
50/60
F-ratio
It can be shown that E(β̂2²Σxi²) = σ² + β2²Σxi² and E(Σûi²/(n − 2)) = σ².
Note that β2 and σ² are the true parameters.
If β2 is zero, both equations provide us with identical estimates of the true σ²; in that case, X has no linear influence on Y.
The F ratio therefore provides a test of the null hypothesis H0: β2 = 0.
51/60
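With one numerator df, the ANOVA F ratio equals the square of the t statistic for H0: β2 = 0, which the sketch below confirms on the hypothetical illustration data:

```python
import math

# ANOVA F ratio and its relation to t^2, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * Xi for Xi in X]

ESS = sum((Yh - ybar) ** 2 for Yh in yhat)
RSS = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, yhat))
sigma2_hat = RSS / (n - 2)

F = (ESS / 1) / sigma2_hat                 # F with 1 and n-2 df
t_stat = b2 / math.sqrt(sigma2_hat / Sxx)  # t statistic for H0: beta2 = 0
# In the bivariate model, F = t^2, so the two tests are equivalent.
```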
Example
Compute the F ratio and obtain the p value of the computed F statistic.
p value of this F statistic with 1 and 8 df is 0.0000001. Therefore, if we reject the null hypothesis, the probability of
committing a Type I error is very small.
52/60
Application Of Regression Analysis
One use is to "predict" or "forecast" the future consumption expenditure Y corresponding to some given level of income X.
Now there are two kinds of predictions:
Prediction of the conditional mean value of Y corresponding to a chosen X.
Prediction of an individual Y value corresponding to a chosen X.
53/60
Mean Prediction
Estimator of E(Y | X0): Ŷ0 = β̂1 + β̂2X0.
It can be shown that var(Ŷ0) = σ²[1/n + (X0 − X̄)²/Σxi²].
The statistic t = (Ŷ0 − E(Y | X0))/se(Ŷ0) follows the t distribution with n − 2 df and may be used to derive confidence intervals.
54/60
Individual Prediction
Estimator of an individual Y0 corresponding to X0: Ŷ0 = β̂1 + β̂2X0.
It can be shown that var(Y0 − Ŷ0) = σ²[1 + 1/n + (X0 − X̄)²/Σxi²].
The statistic t = (Y0 − Ŷ0)/se(Y0 − Ŷ0) follows the t distribution with n − 2 df and may be used to derive confidence intervals.
55/60
Confidence Bands
56/60
Individual Versus Mean Prediction
The confidence interval for an individual Y0 is wider than that for the mean value of Y0.
The width of the confidence bands is smallest when X0 = X̄.
57/60
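The two prediction variances differ only by the extra σ² term, which is exactly why the individual band is wider. A sketch at a hypothetical income level X0 = 100, on the illustration data:

```python
import math

# Mean vs individual prediction at X0, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
RSS = sum((Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y))
sigma2_hat = RSS / (n - 2)

X0 = 100                             # hypothetical prediction point
h = 1 / n + (X0 - xbar) ** 2 / Sxx   # grows as X0 moves away from Xbar
y0_hat = b1 + b2 * X0                # point prediction, same in both cases
se_mean = math.sqrt(sigma2_hat * h)          # se of the estimated E(Y | X0)
se_indiv = math.sqrt(sigma2_hat * (1 + h))   # se of the individual prediction error
```

Because h is smallest at X0 = X̄, both bands are narrowest there and flare out toward the extremes of X.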
Reporting the Results
58/60
Evaluating the Results
How "good" is the fitted model? Any standard?
Are the signs of the estimated coefficients in accordance with theoretical or prior expectations?
How well does the model explain variation in Y? One can use R².
Does the model satisfy the assumptions of CNLRM? For now, we would like to check the normality of the disturbance term. Recall that the t and F tests require that the error term follow the normal distribution.
59/60
Normality Tests
Several tests exist in the literature. We look at:
Histogram of residuals: a simple graphic device to learn about the shape of the PDF. Horizontal axis: the values of the OLS residuals, divided into suitable intervals; vertical axis: rectangles with heights equal to the frequency in each interval. From a normal population we will get a bell-shaped PDF.
Normal probability plot (NPP): a simple graphic device. Horizontal axis: values of the OLS residuals; vertical axis: the expected value of the variable if it were normally distributed. From a normal population we will get a straight line.
The Jarque–Bera test: an asymptotic test, with a chi-squared distribution and 2 df: JB = n[S²/6 + (K − 3)²/24], where S is the skewness and K the kurtosis of the residuals.
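A sketch of the Jarque–Bera statistic computed from the OLS residuals of the hypothetical illustration data; with only n = 10 observations the test is indicative at best, since it is an asymptotic test:

```python
# Jarque-Bera normality check on OLS residuals, hypothetical illustration data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
b1 = ybar - b2 * xbar
u = [Yi - b1 - b2 * Xi for Xi, Yi in zip(X, Y)]  # residuals

m2 = sum(ui ** 2 for ui in u) / n
m3 = sum(ui ** 3 for ui in u) / n
m4 = sum(ui ** 4 for ui in u) / n
S = m3 / m2 ** 1.5   # sample skewness (0 under normality)
K = m4 / m2 ** 2     # sample kurtosis (3 under normality)

JB = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)
normal_not_rejected = JB < 5.99  # chi-squared(2) critical value at the 5% level
```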
60/60
Homework 2
Basic Econometrics (Gujarati, 2003)
1. Chapter 3, Problem 21 [10 points]
2. Chapter 3, Problem 23 [30 points]
3. Chapter 5, Problem 9 [30 points]
4. Chapter 5, Problem 19 [30 points]
Assignment weight factor = 1