Notes on Econometrics
James B. McDonald
Brigham Young University 5/2010
I. Introduction to Econometrics
Objective: Make this one of the most interesting and useful courses you take in your
undergraduate program.
Outline: A. Models and Basic Concepts, B. Data, C. Econometric Projects, D.
Problem set
Econometrics deals with the problem of estimating relationships between variables. These
techniques are widely used in the public and private sectors as well as in academic settings. They
help provide an understanding of relationships between variables, which can also be useful in
policy analysis and in quantifying expectations about future events.
Some applications of econometric procedures include:
• Economics and Business
o Estimation of demand relationships
- impact of advertising on demand
- pricing decisions
- determinants of market share
- estimation of income elasticities
o Estimation of cost relationships
o International trade and the balance of payments
o Macro models
o Rational expectations
o Predicting corporate bankruptcy or individual default on loans
o Identifying takeover targets
• Education
o Production functions
- tradeoffs between different education techniques
o Estimation of supply and demand for teachers
o Predicting acceptance into graduate and professional programs
o Estimating the impact of different types of schools on graduates' salaries
• Political Science
o Analysis of voting behavior
• Public Sector
o Forecasting tax receipts
o Public sector production functions
• Legal Profession
o Models of jury selection
o Discrimination
In each application there is the question of (1) MODEL FORMULATION (functional form,
variable classification as well as the theoretical foundation), (2) ESTIMATION of unknown
parameters, (3) TESTING hypotheses, and (4) PREDICTION.
A. Models and Basic Concepts
1. The formulation of the model is generally based upon economic considerations.
Example 1. Consumer Demand Theory
Maximize U(X1, X2)
Subject to P1X1 + P2X2 = Y
where Y denotes income and the Pi and Xi, respectively, denote the price and quantity of the
ith good.
The solution of this problem yields demand equations for X1 and X2
Xi = Di(P1, P2, Y) i = 1, 2
where the functional form is unknown unless the utility function U( ) is specified. If
advertising (A) affects preferences (U(X1, X2, A)), then demand will also depend upon
advertising expenditure, Xi = Di(P1, P2, Y, A). Statistical data for Xi, Pi and Y and
econometric procedures are then used to estimate the demand equations and any unknown
parameters.
Example 2. A Simple Macro Model
Ct = β1 + β2 (Yt - Tt)
Yt = Ct + It + Gt + Xt
where Ct, Yt, It, Gt, Tt, and Xt respectively denote consumption, total production, investment,
government expenditure, taxes and net exports. β1 and β2 are unknown parameters.
It is important to remember that models are not complete descriptions of a situation, but
rather attempt to summarize the main relationships between the variables.
a. Classification of Variables
(1) Endogenous variables (dependent)--those variables determined by the model, e.g.,
X1 and X2 in example 1 and Yt and Ct in example 2.
(2) Exogenous variables (independent)--those variables not determined by the model,
but which are assumed to be given. P1, P2 and Y would be exogenous in example
1. It, Gt, Tt and Xt would be the exogenous variables in example 2.
(3) Predetermined variables--
(a) lagged endogenous variables--endogenous variables from a previous time
period;
(b) exogenous variables as defined above.
b. Representation of Models
(1) Structural representation--a mathematical representation of a hypothesized
model (based on economic theory) that jointly determines the values of the endogenous
variables explained by the model. The structural equations may
include more than one endogenous or dependent variable per equation.
Examples:
(a) A simple macro model
Ct = β1 + β2 (Yt - Tt) + εt
Yt = Ct + It + Gt + Xt
Dependent variables: C,Y
Independent variables: T, I, G, X
Unknown parameters: β1, β2
(b) Demand: Qt = β1 + β2Pt + γ1Yt + ε1t
Supply: Qt = β3 + β4Pt + γ2wt + ε2t
Dependent variables: Q, P
Independent variables: Y, w
Unknown parameters: β1, β2, β3, β4, γ1, γ2
The ε's in these equations represent the "errors" not explained in the model. The
errors can represent the impact of other explanatory factors or measurement errors.
In each case we will want to use data to estimate the unknown parameters.
(2) Reduced form representation--expresses the current level of each of the
endogenous variables as a function of predetermined variables (exogenous and/or
lagged dependent).
Examples: The reduced form representation corresponding to the two previous
structural models can be shown to be as follows:
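(a)
$$Y_t = \frac{\beta_1}{1-\beta_2} + \frac{1}{1-\beta_2}\left(I_t + G_t - \beta_2 T_t + X_t\right) + \frac{\varepsilon_t}{1-\beta_2}$$
$$C_t = \frac{\beta_1}{1-\beta_2} + \frac{\beta_2}{1-\beta_2}\left(I_t + G_t - T_t + X_t\right) + \frac{\varepsilon_t}{1-\beta_2}$$
(b)
$$P_t = \frac{\beta_1 - \beta_3}{\beta_4 - \beta_2} + \frac{\gamma_1}{\beta_4 - \beta_2} Y_t - \frac{\gamma_2}{\beta_4 - \beta_2} w_t + \frac{\varepsilon_{1t} - \varepsilon_{2t}}{\beta_4 - \beta_2}$$
$$Q_t = \frac{\beta_1 \beta_4 - \beta_2 \beta_3}{\beta_4 - \beta_2} + \frac{\beta_4 \gamma_1}{\beta_4 - \beta_2} Y_t - \frac{\beta_2 \gamma_2}{\beta_4 - \beta_2} w_t + \frac{\beta_4 \varepsilon_{1t} - \beta_2 \varepsilon_{2t}}{\beta_4 - \beta_2}$$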
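As a numerical check on the reduced form in (b), the sketch below solves the two structural equations directly (equate demand and supply, solve for P, then recover Q) and compares the answer with the reduced-form formulas. The parameter values and the values of Y, w and the errors are hypothetical, chosen only so that β4 ≠ β2; the helper names are illustrative.

```python
# Hypothetical structural parameters for model (b); any values with b4 != b2 work.
b1, b2, b3, b4 = 100.0, -2.0, 10.0, 3.0   # beta_1 .. beta_4
g1, g2 = 0.5, -1.5                         # gamma_1, gamma_2


def solve_structural(Y, w, e1, e2):
    """Equate demand and supply, solve for P, then recover Q from the demand equation."""
    P = (b3 - b1 + g2 * w - g1 * Y + e2 - e1) / (b2 - b4)
    Q = b1 + b2 * P + g1 * Y + e1
    return P, Q


def reduced_form(Y, w, e1, e2):
    """P and Q written only in terms of the exogenous variables and the errors."""
    d = b4 - b2
    P = (b1 - b3) / d + g1 / d * Y - g2 / d * w + (e1 - e2) / d
    Q = ((b1 * b4 - b2 * b3) / d + b4 * g1 / d * Y
         - b2 * g2 / d * w + (b4 * e1 - b2 * e2) / d)
    return P, Q


args = dict(Y=40.0, w=12.0, e1=0.3, e2=-0.7)
P_s, Q_s = solve_structural(**args)
P_r, Q_r = reduced_form(**args)
print(P_s, P_r)   # the two prices agree up to rounding
print(Q_s, Q_r)   # so do the two quantities
```

The same (P, Q) pair also satisfies the supply equation, which is what "solving the structural system" means.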
Economics 388 will introduce the analysis of structural economic models, but will primarily
focus on models written in the reduced form representation, i.e., with the dependent variable
on the left and predetermined variables on the right hand side. However, including
endogenous variables (endogenous regressors) on the right hand side of an equation raises
some very important problems.
2. Estimation of Unknown Parameters
The coefficients of the variables in the reduced form and structural representations are
referred to as parameters and are generally unknown. The notation β̂ will be used to denote
the estimator of the unknown population parameter β. In order to obtain any quantitative (as
opposed to qualitative) estimates of the impact of changes in exogenous variables upon the
dependent variables, the unknown parameters must be estimated. As an example, we
note that, based upon the macro model just considered,
$$\frac{\partial Y_t}{\partial G_t} = \frac{1}{1-\beta_2}.$$
Recalling that $\partial C_t / \partial Y_t = \beta_2$ (the marginal propensity to consume) is generally assumed to be
between zero and one, we can deduce that in this model an increase in government
expenditure will result in an increase in the equilibrium level of income. However, in order
to estimate the magnitude of the increase in Yt associated with the increase in Gt, β2 must be
estimated. Sometimes it may be easier to estimate the reduced form coefficient
$1/(1-\beta_2)$ directly.
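A small numerical illustration of the multiplier, assuming hypothetical values β1 = 50 and β2 = 0.8 and setting the error term to zero: the sketch finds equilibrium income by repeated substitution between the consumption function and the income identity (this converges because 0 < β2 < 1), then raises Gt by one unit and compares the change in Yt with 1/(1 − β2).

```python
beta1, beta2 = 50.0, 0.8   # hypothetical intercept and marginal propensity to consume


def equilibrium_income(I, G, T, X, tol=1e-9):
    """Find Y_t satisfying C = beta1 + beta2*(Y - T) and Y = C + I + G + X (error set to 0)
    by repeated substitution; this converges because 0 < beta2 < 1."""
    Y = 0.0
    while True:
        C = beta1 + beta2 * (Y - T)   # consumption function
        Y_new = C + I + G + X         # income identity
        if abs(Y_new - Y) < tol:
            return Y_new
        Y = Y_new


base = equilibrium_income(I=200.0, G=300.0, T=250.0, X=-20.0)
shocked = equilibrium_income(I=200.0, G=301.0, T=250.0, X=-20.0)
print(shocked - base)    # change in Y from a one-unit increase in G
print(1 / (1 - beta2))   # reduced form coefficient 1/(1 - beta2)
```

Both printed numbers are (up to rounding) 5: with an MPC of 0.8, each extra unit of government expenditure raises equilibrium income by five units.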
3. Tests of Hypotheses
Many times we are faced with the problem of determining whether a particular variable
is an important explanatory factor: does wealth or advertising have a significant impact on
consumption; what is the direction of influence of a change in a variable; or how can we test
hypotheses about the magnitude of an elasticity under consideration. All of these problems
involve hypothesis testing and require knowledge of the distribution (density) of the estimator
under consideration or of a related test statistic.
For example, assume that the density of β̂2, f(β̂2), under the null hypothesis Ho: β2 = 0
appears as follows:
[Figure: the density f(β̂2) under Ho: β2 = 0]
Let β̂2 denote the value of β2 estimated from the sample. If β̂2 is far out in the tail, which is
unlikely under the null hypothesis, we will agree to reject the null hypothesis that β2 = 0.
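The density f(β̂2) can be approximated by simulation. The minimal sketch below assumes a data-generating process of our own choosing (normal errors with variance one, β1 = 1, β2 = 0, fixed X values 1, ..., 20), draws many samples under Ho, computes the least squares slope for each, and reports how often a slope at least as extreme as a hypothetical observed value of 0.08 occurs. None of these numbers come from the notes.

```python
import random
import statistics


def ols_slope(x, y):
    """Least squares slope: sum of cross-deviations over sum of squared x-deviations."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def slopes_under_null(n_reps=5000, seed=388):
    """Simulate the sampling distribution of the slope estimate when H0: beta2 = 0 is true."""
    rng = random.Random(seed)
    beta1, beta2 = 1.0, 0.0                 # assumed true values (beta2 = 0 under H0)
    x = [float(t) for t in range(1, 21)]    # assumed fixed regressor values
    slopes = []
    for _ in range(n_reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = slopes_under_null()
observed = 0.08   # hypothetical slope estimate from "our" sample
tail_share = sum(abs(b) >= observed for b in slopes) / len(slopes)
print(f"mean of null slopes: {statistics.mean(slopes):.4f}")
print(f"share at least as far out in the tails as {observed}: {tail_share:.3f}")
```

The simulated slopes center on zero, and only a few percent of them are as far from zero as 0.08; an estimate that rare under Ho is the kind of "far out in the tail" value that leads us to reject Ho.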
4. Prediction
A frequent application of econometrics is to obtain predictions for the dependent
variables corresponding to a certain value for the independent variable(s) [X]. In order to
obtain a prediction for the dependent variable (Y) in some future period, we need to obtain a
prediction for the independent variables (X) (say X*) in that period and also assume that the
relationship between X and Y observed in the sample period continues to be valid in the
future. Substituting the predicted value of X (X*) into the estimated relationship yields
the estimated value of Y (Y* = β̂1 + β̂2X*). We know that Y* will probably not be exactly
correct, and so we will also discuss methods of obtaining confidence intervals for the actual
value of Y.
[Figure: estimated relationship between x and y, Y* = β̂1 + β̂2X*, with confidence intervals]
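A sketch of the point prediction Y* = β̂1 + β̂2X*, assuming the estimates are already in hand (the values 1.5 and 0.04 below are hypothetical placeholders, not estimates from real data). Evaluating it at two alternative forecasts of X shows that Y* inherits any error in the forecast X*.

```python
def predict(b1_hat, b2_hat, x_star):
    """Point prediction Y* = b1_hat + b2_hat * X* from an estimated line."""
    return b1_hat + b2_hat * x_star


b1_hat, b2_hat = 1.5, 0.04          # hypothetical estimates
for x_star in (40.0, 60.0):         # two alternative forecasts of X for the future period
    print(f"X* = {x_star:.0f}: Y* = {predict(b1_hat, b2_hat, x_star):.2f}")
```

The two predictions (3.10 and 3.90) differ by β̂2 times the difference in the X* forecasts, which is one reason prediction intervals widen when X* itself is uncertain.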
The first exercise set attempts to clarify the notion of reduced form and structural
representations of economic models. The importance of the structural parameters is also
illustrated in these exercises. We now turn to some important issues related to the data used
in estimating economic models.
B. Data
Applied econometrics involves the four steps just discussed: (1) model formulation and
interpretation of variables, (2) estimation of unknown parameters, (3) hypothesis testing, and
(4) prediction. The process summarized in these four steps is an integral part of empirical
research in the physical and social sciences. However, the results of this research may be
sensitive to the formulation of the model AND the data used. Frequently the desired data are
not available or are not in the desired form. Some data types and issues include:
quantity and price indices: Paasche, Laspeyres
real or nominal values
total or per capita levels
stocks vs. flows
deseasonalized (seasonally adjusted) vs. unadjusted
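The two price indices listed above differ only in which quantities serve as weights: a Laspeyres index weights prices by base-period quantities, and a Paasche index weights them by current-period quantities. The sketch below uses a hypothetical two-good basket; the prices and quantities are invented for illustration.

```python
def laspeyres(p0, p1, q0):
    """Price index weighting by base-period quantities q0."""
    return sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))


def paasche(p0, p1, q1):
    """Price index weighting by current-period quantities q1."""
    return sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))


# Hypothetical two-good example: base-period (0) and current-period (1) prices and quantities
p0, q0 = [2.0, 5.0], [10.0, 4.0]
p1, q1 = [3.0, 5.5], [8.0, 5.0]

print(f"Laspeyres: {100 * laspeyres(p0, p1, q0):.1f}")   # 100 * 52 / 40
print(f"Paasche:   {100 * paasche(p0, p1, q1):.1f}")     # 100 * 51.5 / 41
```

Here the Laspeyres index (130.0) exceeds the Paasche index (125.6) because buyers shifted away from the good whose price rose most, a shift the base-period weights ignore.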
An important question is whether the data we are using measure what we really want [story:
museum]. A useful reference on the importance of data and data limitations is O.
Morgenstern, On the Accuracy of Economic Observations.
1. Data Characteristics:
a. Quantitative--Qualitative
Quantitative variables measure "quantities" such as
price, sales volume, weight or income.
Qualitative variables are used to model "either/or" situations and might be used to
model membership in one of several groups such as:
⋅homeowner or non-homeowner
⋅employed/unemployed
⋅male/female
⋅accurate or inaccurate income tax returns
Dependent and independent variables can be quantitative or qualitative variables.
Example: Consider a possible relationship between salary, years of employment
and gender. This model might be formulated as:
Salary = β1 + β2 years employed + β3 Gender
where we will discuss ways in which “Gender” can be included in the econometric
model in another section dealing with binary or qualitative variables.
b. Time Series, Cross Sectional, Pooled Data
Time Series Data--measures a particular variable over successive time periods (annual,
quarterly, monthly, weekly; e.g., income, consumer price index (CPI)).
Cross Sectional Data--measures a particular variable at a given point in time for
different entities. An example of cross sectional data would be the retail price of
unleaded gas at 2:30 p.m. on January 2, 2009 across different gas stations.
Pooled or Merged Cross Sectional/Time Series Data
Per Capita Income, by State and Year
States      1980   1985   1990   1995   2000   2005
Alabama
Alaska
…
Utah
...
A single column of this table (one year, all states) alone would be cross-sectional; a single row (one state, all years) alone would be time-series.
Panel Data--pooled cross sectional data in which the same cross section is sampled over
time. A well-known panel data set is the National Longitudinal Study. This study
surveys family expenditures of approximately 20,000 people.
c. Non-experimental--Experimental Data
Non-experimental data--typical in the social sciences.
Observations drawn from a system not subject to experimental control.
Experimental data--common in the natural sciences, but experimental data are becoming
more commonly used in economics.
examples: Physics/chemistry
Negative income tax (different tax rates, direct subsidies)
Health insurance
Influence of housing allowance
Split cable--different commercials
2. Data problems
a. Degrees of freedom
Too few observations to estimate the model (the number of observations must be greater
than the number of parameters).
b. Multicollinearity--multicollinearity refers to the tendency of economic variables to
move together making it difficult to accurately estimate the impact of changes in
individual variables. This is often encountered in non-experimental data available in
the social sciences.
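One common way to quantify how strongly two regressors move together is the variance inflation factor, VIF = 1/(1 − r²), where r is their sample correlation; in a model with just these two explanatory variables, it is the factor by which the variance of each slope estimate is inflated relative to uncorrelated regressors. The series below are simulated, not real data, and the names "income" and "wealth" are illustrative.

```python
import math
import random


def correlation(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    sab = sum((x - abar) * (y - bbar) for x, y in zip(a, b))
    saa = sum((x - abar) ** 2 for x in a)
    sbb = sum((y - bbar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(1)
income = [rng.gauss(50.0, 10.0) for _ in range(200)]            # simulated regressor
wealth = [5.0 * x + rng.gauss(0.0, 5.0) for x in income]         # moves closely with income
unrelated = [rng.gauss(0.0, 10.0) for _ in range(200)]           # independent of income

vifs = {}
for name, other in [("wealth", wealth), ("unrelated", unrelated)]:
    r = correlation(income, other)
    vifs[name] = 1.0 / (1.0 - r * r)   # VIF when the model has these two regressors
    print(f"income vs {name}: r = {r:.3f}, VIF = {vifs[name]:.1f}")
```

With income and wealth nearly collinear, the VIF is on the order of 100, so their separate effects are estimated far less precisely than if they were uncorrelated; the unrelated pair has a VIF close to one.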
c. Measurement error and accuracy.
o Changing definitions of variables--government statistics: money, automobiles
(include station wagons?)
o Measurement Error--error boxes
o More accuracy reported than justified--[Story: Weigh hogs in Texas]
o Combining data with different accuracies—[Story: Age of river]
o Accuracy isn't necessarily symmetric--hence the errors need not "cancel" out:
income tax reports: individual and corporate profits
women's age in surveys: not many report ages between forty and forty-five
3. Some data sources
Excellent websites include
http://www.ciser.cornell.edu/ASPs/datasource.asp and
http://www.econdata.net/.
Both of these websites provide access to a wide variety of data sources. Included in the
description of econdata.net is a list of the ten best sites based on user feedback. Some are
copied below for your convenience:
• Bureau of the Census
The Census Bureau site will lead you to the full range of popular and obscure Census
data series. The site has a comprehensive A-to-Z listing of data subjects, as well as
American FactFinder and CenStats, query-based means for accessing data for
your area from a variety of Census series.
• Bureau of Labor Statistics
Bureau of Labor Statistics (BLS) has a wealth of information available through its
Web site. BLS jobs, wages, unemployment, occupation, and prices data series are
available through a much improved query-based system. Also see Economy at a
Glance for an integrated set of BLS data for states and metro areas.
• Bureau of Economic Analysis
The Bureau of Economic Analysis (BEA) makes its Gross State Product, Regional
Economic Information System (REIS), and foreign direct investment data available
on its Web site. You can also use this site to access BEA's national income account
data and its publication of record, the Survey of Current Business.
• http://www.econdata.net/
This website includes links to many different types of data, including some of the
following sites.
• http://www.Census.Gov
This site includes all data for the Census of Population and Housing and U.S. and
World Population data.
• http://www.census.gov. United Nations Statistical Division
• http://www.stls.frb.org [St. Louis Federal Reserve Economic Data Base]
Price indices, interest rates, balance of payments, employment, and monetary data.
• [Resources for Economists on the Internet]
U.S. macro and regional data, other U.S. data, international data, financial data, and
academic journal archive data.
• http://rfe.org (Resources for Economists)
• http://www.bea.doc.gov
The Bureau of Economic Analysis provides time-series data on a
variety of U.S. macroeconomic variables.
• http://www.psidonline.org
The Panel Study of Income Dynamics (PSID) is a nationally representative
longitudinal study of families and individuals begun in 1968. The initial focus
was to examine employment, earnings, and income over the life cycle for 5000
families. Interviews for many of these families and their descendants have
continued.
• http://www.icpsr.umich.edu
• http://www.icpsr.umich.edu/icpsrweb/ICPSR/
The Interuniversity Consortium for Political and Social Research (ICPSR)
provides access to an extensive collection of downloadable data. Try it, you may
like it.
• http://www.ipums.umn.edu
Integrated Public Use Microdata Series (IPUMS). Registration is free, and registered users
can select "Create Extract" to choose the variables to include in their data set.
• IPUMS-International is an integrated series of census microdata samples from 1960 to
the present. At this time, the series includes eighty samples drawn from twenty-six
countries, with more scheduled for release in the future.
• IPUMS-USA is an integrated series of representative samples drawn from the U.S.
censuses of the period from 1850 to 2000. IPUMS-USA also includes American
Community Survey (ACS) data from 2000 to 2005.
• IPUMS-CPS provides integrated data and documentation from the March Current
Population Survey (CPS) from 1962 to 2006. The harmonized CPS data are also
compatible with the data from IPUMS-USA.
Some other internet resources
• National Bureau of Economic Research
o http://www.nber.org/data/
• Another excellent data site which has data to explore the impact of religious
practices on the family is
http://www.people.cornell.edu/pages/jpp34/religion_datasets.htm
• For those interested in sports data, try espn.com, pgatour.com, nba.com,
basketball-reference.com, and hoopdata.com
• For those considering purchasing a diamond, you might try www.diamonds.net
DataFerrett is a popular data mining tool that accesses data stored in TheDataWeb through
the internet. DataFerrett can be installed as an application on your desktop or run as a Java applet
within an internet browser. DataFerrett is compatible with Windows operating systems.
http://dataferrett.census.gov/
• National Center for Health Statistics
• National Retirement Survey
Google is also an excellent resource to assist in locating data and studies related to your area
of interest.
C. Econometric Projects
The purpose of the project is to provide an opportunity to formulate a model of interest,
collect relevant data, estimate the model and interpret the results. This experience will
facilitate an integration of the statistical and econometric methodologies discussed in class
with other economics courses which may focus more on institutional descriptions of events
and organizations or an analysis of theoretical models. These models are merely
hypothesized explanations of observed economic data and should be estimated and tested.
Econometrics provides a method of testing the validity of the hypotheses underlying
economic models.
1. Model Selection and Data
The selection of a model and data to be used are the first steps in an econometric
project. Other economics courses or related journal articles may provide a source of
interesting models. The determination of an econometric project should be based on both an
interesting model and available data. A common problem encountered with econometric
projects is the unavailability of relevant data. Some helpful data sources are listed in
section I.B.3 of the notes. A growing number of journals provide the data used in published
articles. Replicating and updating the research in a published paper can be a productive
exercise. Alternatively, you might consider selecting a project related to your future career
aspirations, a unique data source to which you have special connections, or a passion you
have long held. A pre-med student used epidemiology data he was already working on with
a professor from the Microbiology Department. A pre-law student studied the determinants
of law school rankings. A BYU basketball player studied the impact of various statistics on
total BYU points scored. A student working for a direct-sales company used Census data to
predict which counties would be most successful for his company. Another student had a job
in the energy industry and built a model predicting natural gas prices. One approach is to
think about topics that would be good talking points in future job interviews. Previous
topics have truly been very diverse in terms of both topic and scope. Some more examples:
• Determination of factors related to admission to medical school (one student wrote
the admissions committee and requested anonymous data, one student’s father was
the president of a college)
• The relationship between the value of diamonds and cut, color, and clarity (one
student found an online database of diamond prices and characteristics)
• Factors best determining the probability of divorce (one student used IPUMS.org,
one student obtained the data from a BYU MFHD professor he had)
• Interplay between state hunting licenses and state deer population (student requested
data from Minnesota State Hunting Department)
• Financial applications such as estimating betas of stocks (students have used
Marriott School resources, such as Bloomberg and Compustat)
• Production functions
• Phillips Curve (students have used publicly available unemployment and inflation
data)
• Prediction of consumer default on loans
• Estimating the likelihood that medical doctors commit suicide (student used
DataFerrett to access National Center for Health Statistics microdata)
• Impact of foreign aid on national stability and economic development (one student
had done research with a Political Science professor that provided him with the
development data, one student’s sister was working for an international aid NGO)
• Determinants of profit in used car sales (student used his roommate’s dad’s
dealership’s proprietary data)
• Relationship between consumer debt, credit ratings, and demographics (student used
American FactFinder for demographic data and used credit ratings from the small
business he worked for)
• Impact of weather, daylight savings time, advertising and local events on retail sales
(one student requested sales data from his boss at a local store, another asked his
brother for sales and advertising data from his startup restaurant)
Once a topic has been selected you should review the previous literature on the topic. A
computer literature search will be helpful. Google Scholar is a useful starting point. Once
you find some good papers that deal with your topic, it is often useful to follow their
citations to identify other relevant literature. In specifying your model, you should clearly
identify the endogenous (dependent) variables to be explained as well as the exogenous
(independent) variables in your model. If you are replicating a previously published
empirical study, it would also be interesting to update the analysis. For economics 388 you
may want to restrict the model to explain one or two endogenous variables. For economics
588, four endogenous variables is a reasonable upper limit with at least six or eight
exogenous variables. If you are working with a simultaneous equations model, both the
structure and reduced form parameters should be estimated.
2. Model Estimation
For single equation models or reduced form representations, ordinary least squares can
be used if neither autocorrelation nor heteroskedasticity is present. Multicollinearity makes
it difficult to obtain accurate estimates of the effects of individual variables. Improved
estimation procedures are available if either autocorrelation or heteroskedasticity is present.
Simultaneous structural equation models are better treated with estimation techniques
specifically developed for these models. The most widely used of these techniques is
probably two stage least squares or instrumental variables estimation. Alternative methods
are also available for structural models and will be discussed in economics 588.
Ordinary least squares, two stage least squares, instrumental variables, and many other
estimators are available in such computer packages as SAS, Stata, SHAZAM, SPSS,
EVIEWS, RATS, TSP, Matlab, Gretl, and R, to mention only a few. Gretl and R are free.
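To see why structural equations call for special estimators, the sketch below simulates a simplified, hypothetical version of the demand and supply model from Section A (the income term is dropped and all parameter values are invented). Because price is determined jointly with quantity, it is correlated with the demand error, so regressing Q on P by least squares does not recover the demand slope. The wage w shifts supply only, so it can serve as an instrument; with one instrument for one endogenous regressor, the simple instrumental variables estimator cov(w, Q)/cov(w, P) shown here coincides with two stage least squares.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (n - 1)


# Hypothetical simplified model (income term dropped):
#   Demand: Q = b1 + b2*P + e1        Supply: Q = b3 + b4*P + g2*w + e2
b1, b2, b3, b4, g2 = 20.0, -1.0, 2.0, 1.0, 1.0
rng = random.Random(588)
n = 20000
w = [rng.gauss(0.0, 1.0) for _ in range(n)]
e1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
e2 = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Reduced form for P (equate demand and supply), then Q from the demand equation
P = [(b1 - b3 - g2 * wi + u1 - u2) / (b4 - b2) for wi, u1, u2 in zip(w, e1, e2)]
Q = [b1 + b2 * p + u1 for p, u1 in zip(P, e1)]

ols = cov(P, Q) / cov(P, P)   # least squares slope: biased, since P is correlated with e1
iv = cov(w, Q) / cov(w, P)    # instrumental variables slope using w as the instrument
print(f"true demand slope = {b2}, OLS = {ols:.3f}, IV = {iv:.3f}")
```

In this setup the least squares slope settles near −0.33 rather than −1, while the instrumental variables estimate is close to the true value.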
3. Organization of the write-up
The format for your paper should be modeled after that required by scholarly refereed
journals and would include:
(a) Title page
(b) Abstract. This should be less than one page in length and summarize the topic,
methodology and findings.
(c) Introduction. This section should state the nature and objectives of the project along
with a review of the relevant literature.
(d) Description of the model. The model should be defined and each equation carefully
explained. The variables should be clearly defined. The expected impact of each
exogenous variable on the dependent variable should be stated and the reasons explained,
i.e., discuss the comparative statics of the model.
(e) Interpretation of the variables and estimated model. The interpretation of the variables
and data references should be included in the paper. Also include a copy of the data or
references to the data. Basic statistical descriptions for the variables, such as the mean,
variance, minimum, and maximum should be summarized in a table. The results of
estimating the model should be reported and discussed in this section and would
include: parameter estimates, standard errors, t-statistics, F-statistics, R2, tests for
normality, autocorrelation, heteroskedasticity and possibly the degree of
multicollinearity.
(f) Economic analysis of the estimated model and implications. This section would include
a comparison of the estimated results with the comparative static implications of the
economic model. Policy implications, if any, and the predictive capability of the model
could also be included in this section.
(g) Summary and conclusions. Review the major findings as well as possible future work.
(h) Bibliography. Include complete citations for all references in the paper including data
sources.
(i) Include copies of your data in an appendix or give a complete citation to the data
sources. This facilitates a replication of your work which is an important component of
scientific research.
D. Problem set
Intro Problem Set
Introduction and Stata
Theory
1. Consider the labor model
Demand: w = 100 - 5N
Supply: w = 50 + 5N
where w denotes the wage rate and N denotes the number of individuals.
a. Graph these schedules and solve for the equilibrium wage and employment level.
b. Graphically depict the effect of imposing a minimum wage of w = 80. What is the
associated level of unemployment?
(JM)
2. Now consider the demand and supply schedules:
Demand: w = β1 - β2N
Supply: w = γ1 + γ2N
a. Demonstrate that the equilibrium wage rate ($\bar{w}$) is given by
$$\bar{w} = \frac{\beta_1 \gamma_2 + \beta_2 \gamma_1}{\beta_2 + \gamma_2}$$
b. Demonstrate that the level of unemployment associated with the imposition of a minimum
wage rate of $\bar{w} + 10$ is given by
$$10\left(\frac{1}{\beta_2} + \frac{1}{\gamma_2}\right).$$
(Hint: What is the level of unemployment at $\bar{w}$?)
c. What is the importance of knowing the values of the structural parameters for policy
implications?
(JM)
3. Assume the demand for gasoline is given by Qd = β1 - β2Pg and the supply of gasoline is
given by Qs = 100 + 10Pg - 2Pc, where Q, Pg, and Pc denote the quantity of gasoline, the price of
gasoline, and the price of crude oil.
a. Obtain an expression for the equilibrium price of gasoline ($\bar{P}_g$) in terms of β1, β2, and
Pc.
b. Evaluate the effect that an increase in Pc of 10 units will have upon the equilibrium
price of gasoline. Do the values of β1 and β2 have any effect on the magnitude of the
effect?
(JM)
4. Application in Stata
There are two ways to execute commands in Stata: writing a simple program file of commands
(called do-files) or entering in each command one at a time into Stata’s command line prompt.
We will use the latter method here, but you are encouraged to learn how to use do-files. They
are especially useful when you want to be able to replicate results several times, such as for
your projects.
First, enter the data. Open Stata, type “edit” and press Enter.
Stata’s Data Editor should appear. Starting with the top left cell, enter the data below in
two columns:
3.9 75
4.0 63
3.0 45
3.5 45
2.0 27
3.0 36
3.5 54
2.5 18
2.5 24
These are students’ GPAs along with the corresponding level of
parental income in thousands of dollars. The first student, for example, has a
3.9 GPA and comes from a family with an annual income of $75,000.
Close the Data Editor by clicking on the X in the top right corner. Stata keeps
the data in memory and automatically names the two columns “var1” and “var2,”
respectively. You can see them in the Variables window in the top left. Let’s
make sure that the data are as we want them.
Type “list” and press Enter. You should see a small table listing the data you have just entered.
Since “var1” and “var2” are vague variable names, let’s rename them.
Type “rename var1 gpa” and press Enter. Then type “rename var2 income.” Now when
you type “list” you will see the new variable names.
To see summary statistics for the two variables, use the summarize command: “summarize gpa
income.” (You can also just type “summarize” and Stata will summarize all of the variables
in memory.)
To see a scatter plot of the two variables with gpa on the y-axis and income on the x-axis, use
the plot command: “plot gpa income” (in Stata the dependent variable always goes first in a
list). For a graphical scatter plot, “scatter gpa income” can be used instead.
To run a simple linear regression showing the estimated effect of parental income on GPA,
use the regress command: “regress gpa income.”
To generate a new variable equal to the square of income, use the generate command:
“generate incomesq = income^2”. Use the list command again to look at a table of all three
variables.
Print the Stata output to turn in with this assignment (either using File… Print, or by copying
the output to a text editor like Notepad).
*For most Stata commands, you don’t have to type out the entire command word. For
example, for generate instead of typing out “generate” you can use “g” “ge” or “gen”.
*You may have Stata keep a log of your results for you using the log command. At the
beginning of your Stata session, type “log using mynewlog” where “mynewlog” is the name of
your log file. Stata will open a new log in the “working directory.” To find out where the
working directory is, use the cd (change directory) command by simply typing “cd” and pressing
Enter. When you are done using the log and before exiting the program, close the log by
typing in “log close.”
5. Select a data website such as http://www.oswego.edu/~kane/econometrics/data.htm, select
two variables, calculate the means and variances, and plot the observations on the two
variables.
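For readers who want to check their Stata output by hand, here is a minimal Python sketch that reproduces the arithmetic behind “summarize” and “regress” for the GPA and parental income data of Problem 4 (means, sample variances, and the least squares line). It is a cross-check only, not a substitute for the Stata output the assignment asks for; plotting is omitted because it would require a plotting library.

```python
gpa = [3.9, 4.0, 3.0, 3.5, 2.0, 3.0, 3.5, 2.5, 2.5]
income = [75, 63, 45, 45, 27, 36, 54, 18, 24]   # parental income, thousands of dollars


def mean(v):
    return sum(v) / len(v)


def sample_var(v):
    """Sample variance with the n - 1 divisor."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


xbar, ybar = mean(income), mean(gpa)
b2_hat = (sum((x - xbar) * (y - ybar) for x, y in zip(income, gpa))
          / sum((x - xbar) ** 2 for x in income))      # least squares slope
b1_hat = ybar - b2_hat * xbar                           # least squares intercept
incomesq = [x ** 2 for x in income]                     # like: generate incomesq = income^2

print(f"means:     gpa {ybar:.3f}, income {xbar:.3f}")
print(f"variances: gpa {sample_var(gpa):.3f}, income {sample_var(income):.1f}")
print(f"fitted line: gpa = {b1_hat:.4f} + {b2_hat:.5f} * income")
```

The means are 3.1 and 43, the sample variances are 0.465 and 355.5, and the fitted line is roughly gpa = 1.6893 + 0.03281 × income; the regression coefficients should agree with those reported by “regress gpa income.”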
II. TWO VARIABLE LINEAR REGRESSION MODEL
Several applications illustrating the importance of information about the relationships
between economic variables were presented in the introduction. This section provides some essential building blocks used in estimating and analyzing "appropriate" functional relationships between two variables. We first consider estimation problems associated with linear relationships. The properties and distribution of the least squares estimators are then considered. Diagnostic and test statistics that are important in evaluating the adequacy of the specified model are then discussed. A methodology for forecasting and the determination of confidence intervals associated with the linear model is presented. Finally, some alternative (nonlinear) functional forms that can be estimated using ordinary least squares techniques are presented.
A. INTRODUCTION
Consider the model
Yt = β1 + β2Xt + εt
with n observations (X1,Y1), . . ., (Xn,Yn), which are graphically depicted as
[Figure: observations scattered about the population regression line β1 + β2Xt]
where
εt: true random disturbance or error term (the vertical distance from the observation to the line), which may reflect
• random behavior
• measurement error (in Y)
• omitted variables
β1 + β2Xt: population regression line
• β1 and β2 are unknown
Population Regression Function:
The observations don't have to lie on the population regression line, but it is usually
assumed that
E(Yt | Xt) = β1 + β2Xt, i.e.,
the expected value or the "average" value of Y corresponding to any given value of X lies on the population regression line.
An important objective of econometrics is to estimate the unknown parameters (β1, β2),
and thereby estimate the unknown population regression line. This estimated regression line is referred to as the sample regression line. Again, the sample regression line is an estimator of the population regression line.
Sample Regression Function:
et (the residual) is the vertical distance from Yt to the sample regression line, so
$$e_t = Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_t = Y_t - \hat{Y}_t, \qquad \text{whereas} \qquad \varepsilon_t = Y_t - \beta_1 - \beta_2 X_t.$$
It is important to recognize that the residual (et) is an estimate of the equation error or
random disturbance (εt) and may have different properties.
Sample:
$$\underbrace{Y_t}_{\text{observed}} = \underbrace{\hat{\beta}_1 + \hat{\beta}_2 X_t}_{\text{estimated regression line}} + \underbrace{e_t}_{\text{residual}} = \underbrace{\hat{Y}_t}_{\text{estimated } Y \text{ for a given } X} + e_t$$
Population:
$$\underbrace{Y_t}_{\text{observed}} = \underbrace{\beta_1 + \beta_2 X_t}_{\text{population regression line}} + \underbrace{\varepsilon_t}_{\text{error or random disturbance}}$$
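A short simulation makes the distinction between et and εt concrete. Because we generate the data ourselves (with hypothetical values β1 = 2, β2 = 0.5 and standard normal errors), the true disturbances are known and can be compared with the least squares residuals: the two are close but not equal, and the residuals from a least squares fit with an intercept sum to zero, while the true errors generally do not.

```python
import random

rng = random.Random(42)
beta1, beta2 = 2.0, 0.5                      # "true" parameters, known only because we simulate
x = [float(t) for t in range(1, 31)]
eps = [rng.gauss(0.0, 1.0) for _ in x]       # true disturbances epsilon_t
y = [beta1 + beta2 * xi + ei for xi, ei in zip(x, eps)]

# Least squares estimates of the sample regression line
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
b2_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
b1_hat = ybar - b2_hat * xbar

y_hat = [b1_hat + b2_hat * xi for xi in x]   # points on the sample regression line
e = [yi - yhi for yi, yhi in zip(y, y_hat)]  # residuals e_t = Y_t - Yhat_t

print("sum of true errors:", round(sum(eps), 4))
print("sum of residuals:  ", round(sum(e), 10))
print("max |e_t - eps_t|: ", round(max(abs(a - b) for a, b in zip(e, eps)), 4))
```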
B. THE ESTIMATION PROBLEM
(1) Given a sample of (Xt,Yt): (X1,Y1), . . ., (Xn,Yn),
[Figure: scatter plot of Yt against Xt]
(2) estimate β1 and β2, i.e., obtain estimators $(\hat\beta_1, \hat\beta_2)$.
Note that each different guess of β1 and β2, i.e., $\hat\beta_1$ and $\hat\beta_2$, gives a different sample regression line. How should $\hat\beta_1$ and $\hat\beta_2$ be selected? There are many possible approaches to this problem. We now review five possible alternatives and then carefully develop a method known as least squares.
Criteria (five of many):
(1) Minimize "vertical" distances:
$\min_{\hat\beta_1, \hat\beta_2} \sum_t e_t$: no unique solution
$\min_{\hat\beta_1, \hat\beta_2} \sum_t e_t^2$: least squares or ordinary least squares (OLS)
(2) $\min_{\hat\beta_1, \hat\beta_2} \sum_t |e_t|^p$: robust estimators; p = 2 gives least squares, p = 1 gives least absolute deviations (LAD)
(3) $\min_{\hat\beta_1, \hat\beta_2} \sum_t (\text{horizontal distances})^2$
(4) $\min_{\hat\beta_1, \hat\beta_2} \sum_t (\text{perpendicular distances from the regression line})^2$
(5) Method of moments (MM) estimators:
Sample average of the estimated residuals = E(εt) = 0: $\sum_{t=1}^{n} e_t = 0$
Sample covariance between the residuals and X = E(εtXt) = 0: $\sum_t e_t X_t = 0$
The solution of these equations yields the OLS estimators.
Many techniques are available and each may have different properties. We will want to use the best estimators. One of the most popular procedures is least squares.
Derivation of Least Squares Estimators (OLS)*
The sum of squares of the vertical distances between Yt and the sample regression line is called, by many authors, the sum of squared errors and is denoted SSE. The SSE can be written as
$SSE = \sum_t e_t^2 = \sum_t (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2$
Different $\hat\beta$'s (sample regression lines) are associated with different values of SSE. This can be visualized as in the next figure. Least squares amounts to selecting the estimators with the smallest SSE.
____________ *Since the SSE involves squaring the residuals, least squares estimators may be very sensitive to "outlying" observations. This will be discussed in more detail later.
[Figure: SSE plotted against candidate values of $\hat\beta_1$ and $\hat\beta_2$]
Minimizing SSE with respect to $\hat\beta_1$ and $\hat\beta_2$ yields
$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$ (the sample regression line goes through $(\bar X, \bar Y)$)
$\hat\beta_2 = \dfrac{\sum_t X_tY_t - n\bar X\bar Y}{\sum_t X_t^2 - n\bar X^2} = \dfrac{\sum_t (X_t - \bar X)(Y_t - \bar Y)}{\sum_t (X_t - \bar X)^2} = \dfrac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}$
Proof: In order to minimize the SSE with respect to $\hat\beta_1$ and $\hat\beta_2$, we differentiate SSE with respect to $\hat\beta_1$ and $\hat\beta_2$, yielding:
$\dfrac{\partial SSE}{\partial \hat\beta_1} = \sum_t 2(Y_t - \hat\beta_1 - \hat\beta_2 X_t)(-1) = -2\sum_t e_t$
$\dfrac{\partial SSE}{\partial \hat\beta_2} = \sum_t 2(Y_t - \hat\beta_1 - \hat\beta_2 X_t)(-X_t) = -2\sum_t (Y_tX_t - \hat\beta_1 X_t - \hat\beta_2 X_t^2) = -2\sum_t e_tX_t.$
We see that setting these derivatives equal to zero, $\partial SSE/\partial\hat\beta_1 = 0$ and $\partial SSE/\partial\hat\beta_2 = 0$, implies
$\sum_{t=1}^{n} e_t = 0$
$\sum_{t=1}^{n} e_tX_t = 0.$
These two equations are often referred to as the normal equations. Note that the normal equations imply that the sample mean of the residuals is equal to zero and that the sample covariance between the residuals and X is zero; these were also the conditions used in method of moments estimation.
Solving the first normal equation for $\hat\beta_1$ yields
$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X,$
which implies that the regression line goes through the point $(\bar X, \bar Y)$. The slope of the sample regression line is obtained by substituting $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$ into the second normal equation, $\partial SSE/\partial\hat\beta_2 = 0$ or $\sum_t e_tX_t = 0$, and solving for $\hat\beta_2$. This yields
$\hat\beta_2 = \dfrac{\sum_t Y_tX_t - n\bar X\bar Y}{\sum_t X_t^2 - n\bar X^2} = \dfrac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}.$
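As a check on these formulas, here is a minimal sketch in Python (our choice of language; the notes themselves use Stata). It computes $\hat\beta_1$ and $\hat\beta_2$ with both versions of the slope formula for the Anscombe_A data listed with the Stata output later in these notes, and confirms that the residuals satisfy both normal equations up to rounding error. The helper name `ols_simple` is ours.

```python
# Least squares for Y_t = b1 + b2*X_t + e_t, written directly from the formulas above.

def ols_simple(x, y):
    """Return (b1_hat, b2_hat) using the deviation-from-means form of the slope."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar  # the line passes through (Xbar, Ybar)
    return b1, b2


# Anscombe_A data, as listed with the Stata output later in these notes
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

b1, b2 = ols_simple(x, y)

# The "raw sums" version of the slope formula should give the same number
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2_raw = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / (
    sum(xi ** 2 for xi in x) - n * xbar ** 2
)

# The residuals satisfy the two normal equations (zero up to rounding error)
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
print("b1 =", b1, " b2 =", b2, " b2 (raw sums) =", b2_raw)
print("sum e_t =", sum(resid), " sum e_t X_t =", sum(e * xi for e, xi in zip(resid, x)))
```

The estimates should agree with the coefficients Stata reports for the same data (3.000091 and .5000909).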
C. PROPERTIES OF LEAST SQUARES ESTIMATORS
The properties of the estimators $\hat\beta_1$ and $\hat\beta_2$ derived in the previous section depend critically on which of the following five assumptions are satisfied:
(A.1) εt are normally distributed
(A.2) E(εt | Xt) = 0
(A.3) Homoskedasticity:
Var(εt | Xt) = σt² = σ² for every t
[Figure: two panels, "Homoskedasticity" and "Heteroskedasticity"]
(A.4) No Autocorrelation:
Cov(εt, εs) = 0, t ≠ s
(A.5) The X's are nonstochastic (fixed in repeated sampling) and Var(X) is finite, or in other words:
$0 < \lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} (X_t - \bar X)^2 < \infty.$
(This assumption can be relaxed, but the X's need to be uncorrelated with the errors in order for the OLS estimators to be unbiased and consistent.)
A linear model satisfying (A.2)-(A.5) is referred to as the classical linear regression model. If (A.1)-(A.5) are satisfied, then we have the classical normal linear regression model. We will now summarize the properties of the least squares estimators in each of these two cases.
1. The Classical Linear Regression Model (A.2 – A.5)
If Yt = β1 + β2Xt + εt
where (A.2)-(A.5) are satisfied, then the $\hat\beta_i$'s are
⋅unbiased: $E(\hat\beta_i) = \beta_i$
⋅consistent: $\mathrm{Var}(\hat\beta_i) \to 0$ as $n \to \infty$
⋅the minimum variance of all linear unbiased estimators.
⋅These estimators are referred to as BLUE--best linear unbiased estimators.
⋅ (A.2)-(A.5) are known as the Gauss-Markov Assumptions.
2. The Classical Normal Linear Regression Model (A.1 – A.5)
If Yt = β1 + β2Xt + εt
where (A.1)-(A.5) are satisfied, then the least squares estimators are:
⋅unbiased
⋅consistent
⋅minimum variance of all unbiased estimators (not just linear estimators)
⋅normally distributed; this result facilitates the t and F tests, which will be discussed in another section.
⋅least squares estimators will also be maximum likelihood estimators.
Since these desirable properties are conditional on the assumptions, it is important to test for their validity. These tests will be outlined in another section of the notes.
We now attempt to give some intuitive motivation for the concept of maximum likelihood estimation; we then prove that the least squares estimators are maximum likelihood estimators if (A.1)-(A.5) are valid.
a. Pedagogical examples of maximum likelihood estimation
(1) Estimation of µ (population mean)
The observed values (Yt's) of a normally distributed random variable are marked on the horizontal axis. Assume that we know these data were generated by one of two populations (#1, #2). Is it possible that the data were generated from #1? From #2? Which is the "most likely" population to have generated the sample?
(2) Regression models
In this example, which of the two population regression lines is most likely* to have generated the random sample?
*It might be useful to think about these “pdf’s” as “coming out” of the page in a third dimension with the “points” being thought of as being normally distributed around the population regression line.
b. Maximum likelihood estimation--Derivation
How can we quantify the ideas illustrated by these two examples and obtain the "most likely" sample regression line? We now formally derive the maximum likelihood estimators of β1 and β2 under the assumptions (A.1)-(A.5).
For the model
Yt = β1 + β2Xt + εt
(1) E(Yt) = β1 + β2Xt
(2) Var(Yt | Xt) = Var(β1 + β2Xt + εt | Xt) = σ²;
hence, we can write Yt ~ N[β1 + β2Xt; σ²], which means that the density of Yt, given Xt, is
$f(Y_t \mid X_t) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(Y_t - \beta_1 - \beta_2 X_t)^2/(2\sigma^2)}.$
These results can be visually depicted as in the following figure:
[Figure: normal densities of Yt centered on the population regression line]
The Likelihood Function for a random sample is defined by the product of the density functions. Since each density function gives the likelihood or relative frequency of an individual observation being realized, when we multiply these values, we obtain the likelihood of observing the entire sample, given the current parameters:
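$L(Y; \beta_1, \beta_2, \sigma^2) = f(Y_1)\cdots f(Y_n) = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\, e^{-\sum_t (Y_t - \beta_1 - \beta_2 X_t)^2/(2\sigma^2)}$
and the Log Likelihood Function is given by:
$\ell(Y; \beta_1, \beta_2, \sigma^2) = \ln L(Y; \beta_1, \beta_2, \sigma^2) = \sum_t \ln f(Y_t)$
$= -\sum_t (Y_t - \beta_1 - \beta_2 X_t)^2/(2\sigma^2) - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2$
$= -SSE/(2\sigma^2) - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2$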
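Maximum Likelihood Estimators (MLE) are obtained by maximizing ℓ(Y; β1, β2, σ²) over β1, β2, and σ². This maximization requires that we solve the following equations:
(1) $\dfrac{\partial \ell}{\partial \beta_1} = -\dfrac{1}{2\sigma^2}\dfrac{\partial SSE}{\partial \beta_1} = 0$
(2) $\dfrac{\partial \ell}{\partial \beta_2} = -\dfrac{1}{2\sigma^2}\dfrac{\partial SSE}{\partial \beta_2} = 0$
(3) $\dfrac{\partial \ell}{\partial \sigma^2} = \dfrac{SSE}{2(\hat\sigma^2)^2} - \dfrac{n}{2\hat\sigma^2} = 0$
[Figure: the log likelihood (LogL) plotted over β1 and β2]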
Results:
• $\hat\beta_1$ and $\hat\beta_2$ (the MLE) are the same as the OLS estimators when (A.1)-(A.5) hold, because equations (1) and (2) are just the least squares conditions $\partial SSE/\partial\beta_i = 0$.
• $\hat\sigma^2 = \dfrac{\sum_t e_t^2}{n} = \dfrac{\sum_t (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{n}$ = the average of the squared vertical deviations, is the MLE of σ².
• $\hat\sigma^2$ is biased.
s² = Σet²/(n - 2) is an unbiased estimator of σ². The reason $\hat\sigma^2$ is biased is that not all of the et's are independent. Recall that there are two constraints on the et's:
Σet = 0
ΣetXt = 0;
hence, only (n - 2) of the residuals (estimated errors) are independent. In other words, if we had (n - 2) of the et's, we could solve for the remaining two using the two constraints above.
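The following Python sketch (an illustration of ours, reusing the Anscombe_A data from the Stata example later in these notes) evaluates the log likelihood ℓ at the OLS coefficients for several values of σ². It shows that ℓ is larger at $\hat\sigma^2$ = SSE/n than at nearby values, including the unbiased s² = SSE/(n - 2).

```python
import math

# Anscombe_A data (the same data as in the Stata example later in these notes)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

# OLS (= ML) coefficients and the sum of squared errors
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def loglik(sigma2):
    """l = -SSE/(2 sigma^2) - (n/2) ln(2 pi) - (n/2) ln(sigma^2), at the OLS coefficients."""
    return -sse / (2 * sigma2) - n / 2 * math.log(2 * math.pi) - n / 2 * math.log(sigma2)


sigma2_mle = sse / n  # MLE of sigma^2 (biased)
s2 = sse / (n - 2)    # unbiased estimator of sigma^2

print("sigma2_hat (MLE) =", sigma2_mle, " s2 =", s2)
for value in (0.9 * sigma2_mle, sigma2_mle, s2, 1.1 * sigma2_mle):
    print(round(value, 4), round(loglik(value), 5))
```

The value of ℓ at $\hat\sigma^2$ (about -16.8407) should match the ll(model) that Stata reports for these data.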
3. Important observation:
If the assumptions (A.1)-(A.5) are not satisfied, we may be able to "do better" than least squares. It is therefore important to test the validity of (A.1)-(A.5).
D. DISTRIBUTION OF $\hat\beta_1$ AND $\hat\beta_2$
1. Distribution
In this section we give, without proof, the distribution of the least squares estimators if (A.2)-(A.5) hold. We also consider factors affecting estimator precision and finally provide some simulation results to give intuition about the distributional results. The main results are then summarized. The proofs will be given in the next chapter using matrix algebra.
$\hat\beta_1$ and $\hat\beta_2$ are linear functions of the $Y_t$'s, which are random variables; hence, $\hat\beta_1$ and $\hat\beta_2$ are random variables.
Expected Value (unbiased estimators):
$E(\hat\beta_1) = \beta_1$
$E(\hat\beta_2) = \beta_2$
Variance (Population):
$\sigma^2_{\hat\beta_2} = \dfrac{\sigma^2}{\sum_t (X_t - \bar X)^2} = \dfrac{\sigma^2}{n\,\mathrm{Var}(X)}$
$\sigma^2_{\hat\beta_1} = \sigma^2\left[\dfrac{1}{n} + \dfrac{\bar X^2}{\sum_t (X_t - \bar X)^2}\right] = \dfrac{\sigma^2}{n} + \bar X^2\sigma^2_{\hat\beta_2}$
$\hat\beta_1$ and $\hat\beta_2$ are consistent because they are unbiased and their variances approach zero as the sample size increases. Furthermore, if (A.1) holds ($\varepsilon_t \sim N(0, \sigma^2)$), then $Y_t \sim N[\beta_1 + \beta_2X_t; \sigma^2]$, which implies that the $\hat\beta_i$'s will be normally distributed, since they are linear combinations of normally distributed variables.
These results can be summarized by stating that if (A.1)-(A.5) are valid, then
$\hat\beta_i \sim N(\beta_i;\ \sigma^2_{\hat\beta_i})$
where the equations for the variances are given above.
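2. What factors contribute to increased precision (reduced variance) of parameter estimators?
Consider the density of $\hat\beta_1$ and recall that
$\sigma^2_{\hat\beta_1} = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar X^2}{\sum_t (X_t - \bar X)^2}\right) = \dfrac{\sigma^2}{n}\left(1 + \dfrac{\bar X^2}{\mathrm{Var}(X)}\right).$
[Figure: a "precise" (tightly concentrated) and a "less precise" (more dispersed) density of $\hat\beta_1$]
Precision therefore increases (the variance falls) as n increases, as Var(X) increases, or as σ² decreases.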
3. Interpretation of $\hat\beta_i \sim N[\beta_i; \sigma^2_{\hat\beta_i}]$ Using Monte Carlo Simulations
In this section we report the results of some Monte Carlo simulations that provide additional intuition about the distribution of $\hat\beta_i$. We first specify the model used to generate the data and then generate the data. Parameter estimates are then obtained, another sample is generated, and the process is repeated until we can examine the histograms of the estimators. Most Monte Carlo studies are similar in structure.
Consider the simple model which is referred to as the data generating process (DGP)
Yt = β1 + β2Xt + εt
= 4 + 1.5Xt + εt
where εt ~ N(0, σ² = 4). We will let the X's be given by
Xt = 1, 2, . . ., 20. The values chosen for β1, β2, σ², and the X's are arbitrary.
We then generate 20 random disturbances (ε) using a random number generator for
N(0, σ² = 4).
The X's and ε's are then substituted into
Yt = 4 + 1.5Xt + εt
to determine corresponding Y's. We now have 20 observations on Xt and Yt.
Pretend that we don't know what β1, β2, and σ² are. The only things we observe are the (Xt, Yt). This might be visualized as
X → β1, β2, σ², ε → Y
We now estimate the unknown parameters (β1, β2, σ²) using the previously discussed formulas. This could yield, for example:
$(\hat\beta_1, \hat\beta_2, s^2) = (3.618, 1.615, 2.499)$.
If 14 more samples were generated, we would have a total of 15 estimates of β1, β2, and σ².
The results of these random simulations are given by:
Trial    β̂1     s²(β̂1)    β̂2     s²(β̂2)    s²       R²     D.W.*
---------------------------------------------------------------------
  1     3.618    .539     1.615    .00372    2.499    .974   2.14
  2     3.794    .992     1.494    .00689    4.599    .947   2.32
  3     5.770    .826     1.346    .00578    3.838    .946   2.10
  4     3.491    .646     1.516    .00449    2.997    .966   2.41
  5     4.443    .566     1.438    .00397    2.623    .967   2.20
  6     4.697    .968     1.491    .00672    4.486    .948   2.83
  7     5.428    .504     1.363    .00348    2.333    .967   2.40
  8     4.685    .923     1.394    .00672    4.278    .944   1.73
  9     6.122    .653     1.337    .00449    3.025    .956   2.21
 10     2.589    .885     1.624    .00624    4.100    .960   1.63
 11     4.046   1.447     1.514    .01000    6.707    .927   3.35
 12     4.384   1.362     1.488    .00941    6.314    .928   1.32
 13     3.452    .797     1.594    .00563    3.693    .962   2.06
 14     4.301    .598     1.495    .00423    2.770    .968   1.51
 15     3.196    .910     1.566    .00640    4.221    .955   2.17
---------------------------------------------------------------------
Average  4.27    .8411    1.485    .0059     3.8989   .954   2.16
*D.W. denotes the Durbin-Watson statistic, which can be used to test the validity of (A.4).
Given that $\sum_{t=1}^{n} (X_t - \bar X)^2 = 665$.
Questions:
(1) Evaluate the population variances of $\hat\beta_1$ and $\hat\beta_2$, i.e., $\sigma^2_{\hat\beta_1}$ and $\sigma^2_{\hat\beta_2}$.
(2) Compare the averages of $s^2_{\hat\beta_1}$ and $s^2_{\hat\beta_2}$ with their population counterparts obtained in (1).
(3) Evaluate the sample variance of the fifteen estimates of β1 and β2 and compare them with their population counterparts.
(4) Use a chi-square test to determine whether the average of the s²'s is consistent with σ² = 4. Hint:
$\sum \dfrac{(n-2)s^2}{\sigma^2} \sim \chi^2(15(18) = 270)$.
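The simulation described above can be sketched in Python as follows. This is a sketch under our own choices, not the program used to produce the table: it draws 5000 samples instead of 15 so that the averages settle down, and the seed (388) and the replication count are arbitrary. It also evaluates the population variances from Section D so that the simulated variances of $\hat\beta_1$ and $\hat\beta_2$ can be compared with them.

```python
import math
import random
import statistics


def ols(x, y):
    """Simple-regression OLS: returns (b1_hat, b2_hat, s2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, sse / (n - 2)


# Data generating process from the notes: Y_t = 4 + 1.5 X_t + e_t, e_t ~ N(0, 4)
BETA1, BETA2, SIGMA2 = 4.0, 1.5, 4.0
X = list(range(1, 21))  # X_t = 1, 2, ..., 20
n = len(X)
xbar = sum(X) / n
sxx = sum((xi - xbar) ** 2 for xi in X)  # equals 665

# Population variances of the estimators (Section D formulas)
var_b2 = SIGMA2 / sxx
var_b1 = SIGMA2 * (1 / n + xbar ** 2 / sxx)

rng = random.Random(388)  # arbitrary seed, for reproducibility
b1_draws, b2_draws, s2_draws = [], [], []
for _ in range(5000):  # number of simulated samples (our choice)
    y = [BETA1 + BETA2 * xi + rng.gauss(0.0, math.sqrt(SIGMA2)) for xi in X]
    b1, b2, s2 = ols(X, y)
    b1_draws.append(b1)
    b2_draws.append(b2)
    s2_draws.append(s2)

print("sum of squared deviations of X:", sxx)
print("population Var(b1), Var(b2):", var_b1, var_b2)
print("average b1, b2, s2:", statistics.mean(b1_draws),
      statistics.mean(b2_draws), statistics.mean(s2_draws))
print("simulated Var(b1), Var(b2):", statistics.variance(b1_draws),
      statistics.variance(b2_draws))
```

With many replications, the averages of the estimates should be close to 4, 1.5, and 4, and the simulated variances should be close to the population values.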
A histogram of the estimated $\hat\beta_1$'s might yield a result similar to the following:
[Figure: histogram of the $\hat\beta_1$ estimates]
Note the relationship between the histogram and the normal density $N(\beta_1, \sigma^2_{\hat\beta_1})$.
In practice we only have one sample of X's and Y's; hence, we only have one observation of $\hat\beta_1$, $\hat\beta_2$, and $s_{\hat\beta_i}$, and these distributional results must be interpreted accordingly.
4. Review:
Model: Yt = β1 + β2Xt + εt
A.1 εt is distributed normally
A.2 E(εt | Xt) = 0
A.3 Var(εt) = σ² ∀t
A.4 Cov(εt, εs) = 0, t ≠ s
A.5 The X's are nonstochastic and $0 < \lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n}(X_t - \bar X)^2 < \infty$.
Unknown parameters: β1, β2, σ²
Problem: Given a sample of size n: (X1,Y1), . . ., (Xn,Yn), obtain estimators of the unknown parameters.
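Estimators of the unknown parameters are given by:
Parameter    Estimator
β1:   $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$
β2:   $\hat\beta_2 = \dfrac{\sum_t (X_t - \bar X)(Y_t - \bar Y)}{\sum_t (X_t - \bar X)^2} = \dfrac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \dfrac{\sum_t X_tY_t - n\bar X\bar Y}{\sum_t X_t^2 - n\bar X^2}$
σ²:   $s^2 = \sum_t e_t^2/(n-2) = \dfrac{\sum_t (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{n-2}$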
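Distributions:
$\hat\beta_1 \sim N\left[\beta_1,\ \sigma^2_{\hat\beta_1} = \dfrac{\sigma^2}{n} + \dfrac{\bar X^2\sigma^2}{\sum_t (X_t - \bar X)^2}\right]$
$\hat\beta_2 \sim N\left[\beta_2,\ \sigma^2_{\hat\beta_2} = \dfrac{\sigma^2}{\sum_t (X_t - \bar X)^2}\right]$
The covariance between $\hat\beta_1$ and $\hat\beta_2$ is given by
$\sigma_{\hat\beta_1\hat\beta_2} = -\bar X\sigma^2_{\hat\beta_2} = -\bar X\,\mathrm{Var}(\hat\beta_2) = -\bar X\sigma^2\Big/\sum_t (X_t - \bar X)^2$
and will be proven later.
The $\sigma^2_{\hat\beta_i}$ are estimated by
$s^2_{\hat\beta_1} = \dfrac{s^2}{n} + \dfrac{\bar X^2 s^2}{\sum_t (X_t - \bar X)^2}$
$s^2_{\hat\beta_2} = \dfrac{s^2}{\sum_t (X_t - \bar X)^2}.$
It should be mentioned that
$\dfrac{(n-2)s^2_{\hat\beta_1}}{\sigma^2_{\hat\beta_1}} = \dfrac{(n-2)s^2_{\hat\beta_2}}{\sigma^2_{\hat\beta_2}} = \dfrac{(n-2)s^2}{\sigma^2} = \dfrac{\sum_t (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{\sigma^2} \sim \chi^2(n-2).$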
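To connect these formulas with regression output, here is a short Python sketch (ours, using the Anscombe_A data from the Stata example later in these notes) that computes s², the estimated standard errors $s_{\hat\beta_1}$ and $s_{\hat\beta_2}$, and the estimated covariance. For the covariance we simply replace σ² by s² in the formula above; that plug-in step is our assumption, since the notes state only the population expression.

```python
import math

# Anscombe_A data (the same data as in the Stata example later in these notes)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar

s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
s2_b2 = s2 / sxx                       # estimate of sigma^2_{b2}
s2_b1 = s2 / n + xbar ** 2 * s2 / sxx  # estimate of sigma^2_{b1}
cov_b1_b2 = -xbar * s2_b2              # plug-in estimate of the covariance

print("b1 =", round(b1, 6), " s_b1 =", round(math.sqrt(s2_b1), 6))
print("b2 =", round(b2, 7), " s_b2 =", round(math.sqrt(s2_b2), 7))
print("estimated Cov(b1, b2) =", round(cov_b1_b2, 6))
```

The standard errors should match the "Std. Err." column that Stata reports for these data (1.124747 and .1179055).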
E. DESCRIPTIVE STATISTICS AND HYPOTHESIS TESTS
In this section we assume that (A.1)-(A.5) are valid and consider test statistics which can
be used to test whether the model has any explanatory power. Z and t statistics and R2 (the
coefficient of determination) are important tools in this analysis. An important hypothesis is
whether the exogenous variable X helps explain Y. Normally, we would hope to reject the hypothesis H0: β2 = 0 (Yt = β1 + εt). We also consider how to test more general hypotheses of the form H0: $\beta_i = \beta_i^0$.
1. $H_0: \beta_i = \beta_i^0$, where $\sigma^2_{\hat\beta_i}$ is known
$Z = \dfrac{\hat\beta_i - \beta_i^0}{\sigma_{\hat\beta_i}} \sim N(0,1)$
The test statistic measures the number of standard deviations by which $\hat\beta_i$ differs from the hypothesized value. Large values (in absolute value) provide the basis for rejecting the null hypothesis. The critical value is 1.96 for a two-tailed test at the 5% level.
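2. $H_0: \beta_i = \beta_i^0$, where $\sigma^2_{\hat\beta_i}$ is unknown
$t = \dfrac{\hat\beta_i - \beta_i^0}{s_{\hat\beta_i}} = \dfrac{\hat\beta_i - \beta_i^0}{\sqrt{s^2_{\hat\beta_i}}} \sim t(n-2)$
Note that the t-statistic has the same structure as the Z-statistic, except that the standard deviation $\sigma_{\hat\beta_i}$ in the Z-statistic is replaced by its estimate $s_{\hat\beta_i}$ ($s^2_{\hat\beta_i}$ is an unbiased estimator of $\sigma^2_{\hat\beta_i}$). $s_{\hat\beta_i}$ would, in some sense, get closer to $\sigma_{\hat\beta_i}$ as the sample size increases. We see this as we compare critical values for the t- and Z-statistics.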
Relationship between t-statistics and the standard normal
(critical values with upper-tail areas .05, .025, and .01, i.e., the 95th, 97.5th, and 99th percentiles)
Distribution    .05      .025      .01
N(0,1)         1.645    1.960     2.326
t(1)           6.314   12.706    31.821
t(2)           2.920    4.303     6.965
t(3)           2.353    3.182     4.541
t(4)           2.132    2.776     3.747
t(10)          1.812    2.228     2.764
t(25)          1.708    2.060     2.485
t(∞)           1.645    1.960     2.326   = N(0,1)
Note that the critical values for a t-statistic are larger than for a standard normal, because the t density has thicker tails.
Confidence Intervals and t-statistics:
The following shows the close relationship between the t-statistic just discussed and confidence intervals:
$\Pr\left(-t_{\alpha/2} < \dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} < t_{\alpha/2}\right) = \Pr\left(\hat\beta_i - t_{\alpha/2}s_{\hat\beta_i} < \beta_i < \hat\beta_i + t_{\alpha/2}s_{\hat\beta_i}\right) = 1 - \alpha$
Thus, confidence intervals and test statistics are just two different ways of looking at the same problem.
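A small Python sketch of this equivalence, using the slope estimate and standard error from the Anscombe_A Stata output later in these notes (n - 2 = 9 degrees of freedom). The critical value t.025(9) = 2.262157 is taken from a standard t table (9 d.f. is not listed in the table above); the function name is ours.

```python
def t_test_and_ci(b_hat, se, b_null, t_crit):
    """t statistic for H0: beta_i = b_null and the interval b_hat +/- t_crit * se."""
    t_stat = (b_hat - b_null) / se
    return t_stat, (b_hat - t_crit * se, b_hat + t_crit * se)


# Slope estimate and standard error from the Anscombe_A Stata output
b2_hat, se_b2 = 0.5000909, 0.1179055
T_CRIT = 2.262157  # t_{.025}(9), from a standard t table

t_stat, (lo, hi) = t_test_and_ci(b2_hat, se_b2, 0.0, T_CRIT)
reject = abs(t_stat) > T_CRIT
print(round(t_stat, 2), round(lo, 4), round(hi, 4), reject)

# The null value 0 lies outside [lo, hi] exactly when |t| exceeds the critical value
print((lo <= 0.0 <= hi) == (not reject))
```

The t statistic (4.24) and the interval (.2334, .7668) should reproduce the "t" and "[95% Conf. Interval]" columns of the Stata output for x.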
3. Coefficient of Determination (R2)
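The coefficient of determination measures the fraction of the total sum of squares "explained" by the model. The following figure provides motivation and defines the important terms.
[Figure: the deviation $Y_t - \bar Y$ split into the residual $e_t = Y_t - \hat Y_t$ and $\hat Y_t - \bar Y$, where $\hat Y_t = \hat\beta_1 + \hat\beta_2 X_t$]
Define the total sum of squares (SST) to be
$SST = \sum_t (Y_t - \bar Y)^2 = \sum_t (Y_t - \hat Y_t + \hat Y_t - \bar Y)^2$
$= \sum_t (Y_t - \hat Y_t)^2 + \sum_t (\hat Y_t - \bar Y)^2 + \text{cross products}$ (the cross products = 0 if least squares is used)
$= \sum_t e_t^2 + \sum_t (\hat Y_t - \bar Y)^2$
$= SSE + SSR,$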
where SSE and SSR, respectively, denote the sum of squared errors and sum of squares
explained by the regression model.
• total sum of squares = sum of squared errors + sum of squares "explained"
by regression model.
• SST = SSE + SSR
The coefficient of determination (R²) is defined by
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} = 1 - \dfrac{\sum_t e_t^2}{\sum_t (Y_t - \bar Y)^2}$
= fraction of the total sum of squares "explained" by the model.
Note that increasing the number of independent variables in the model will not change SST, but will decrease SSE as long as the estimated coefficient of the new variable(s) is not equal to zero, and hence will increase R². This is true even if the additional variables are not statistically significant. This has provided the motivation for considering the adjusted R² ($\bar R^2$) instead of R². The adjusted $\bar R^2$ is defined by
$\bar R^2 = 1 - \dfrac{\sum_t e_t^2/(n-K)}{\sum_t (Y_t - \bar Y)^2/(n-1)}$
where K = the number of β's (coefficients) in the model. $\bar R^2$ will only increase with the addition of a new variable if the associated t-statistic is greater than 1 in absolute value. This result follows from the equation
$\bar R^2_{New} - \bar R^2_{Old} = \dfrac{(n-1)\,SSE_{New}}{(n-K-1)(n-K)\,SST}\left[\left(\dfrac{\hat\beta_{New\ var} - 0}{s_{\hat\beta_{New\ var}}}\right)^2 - 1\right]$
where the last term in the product is $(t^2 - 1)$, K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K + 1 coefficients.
4. Analysis of Variance (ANOV)
We have just decomposed the total sum of squares (SST) into two components:
• sum of squares error (SSE)
• sum of squares explained by regression (SSR).
This decomposition is commonly summarized in the form of an analysis of variance (ANOV) table.
Source of Variation    SS     d.f.     MSE
Model                  SSR    K - 1    SSR/(K - 1)
Error                  SSE    n - K    SSE/(n - K)
Total                  SST    n - 1
K = number of coefficients in model
where SS denotes the sum of squares and the degrees of freedom (d.f.) is the number of independent terms in SS. The mean squared error (MSE) is the corresponding sum of squares (SS) divided by its degrees of freedom.
Dividing the MSE for the model by the MSE for the error (s²) gives an F-statistic:
$F = \dfrac{SSR/(K-1)}{SSE/(n-K)} = \dfrac{n-K}{K-1}\cdot\dfrac{R^2}{1-R^2} \sim F(K-1,\ n-K)$
The F-statistic can be used to test the hypothesis that all non-intercept (slope) coefficients are equal to zero.
In the case of a single exogenous variable,
$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
the F statistic
$F = \dfrac{n-2}{1}\cdot\dfrac{R^2}{1-R^2} \sim F(1,\ n-2)$
tests the hypothesis H0: β2 = 0 (all non-intercept coefficients = 0).
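The following Python sketch (ours) computes R², the adjusted R², and the F statistic from the ANOV sums of squares, and checks that both forms of F agree. The inputs are the Model and Residual sums of squares from the Anscombe_A Stata output later in these notes (n = 11, K = 2).

```python
def anova_stats(ssr, sse, n, k):
    """R^2, adjusted R^2, and the overall F statistic from the ANOV sums of squares.

    k is the number of coefficients, including the intercept.
    """
    sst = ssr + sse
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))
    f_stat = (ssr / (k - 1)) / (sse / (n - k))
    f_from_r2 = (n - k) / (k - 1) * r2 / (1 - r2)  # same value, written in terms of R^2
    return r2, adj_r2, f_stat, f_from_r2


# Model and Residual SS from the Anscombe_A Stata output
r2, adj_r2, f_stat, f_from_r2 = anova_stats(27.5100011, 13.7626904, n=11, k=2)
print(round(r2, 4), round(adj_r2, 4), round(f_stat, 2), round(f_from_r2, 2))
```

The printed values should match the R-squared (.6665), Adj R-squared (.6295), and F(1, 9) (17.99) in that output.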
5. Sample Stata regression output (general format and a numerical example)
General format:
. sum lwage educ
    Variable |   Obs        Mean       Std. Dev.        Min              Max
-------------+---------------------------------------------------------------------
       lwage |     N   sample mean          s    smallest value   largest value
        educ |     N   sample mean          s    smallest value   largest value

. reg lwage educ
ANOVA (Analysis of Variance Table)
      Source |      SS       df         MS            Number of obs       = N
-------------+---------------------------------       F(#coef-1, N-#coef) = [SSR/(#coef-1)]/[SSE/(N-#coef)]
       Model |     SSR    #coef-1   SSR/(#coef-1)      Prob > F            = p-value of the F statistic
    Residual |     SSE    N-#coef   SSE/(N-#coef)      R-squared           = SSR/SST = 1 - SSE/SST
-------------+---------------------------------       Adj R-squared       = 1 - [SSE/(N-#coef)]/[SST/(N-1)]
       Total |     SST    N-1       SST/(N-1)          Root MSE            = s
Note: s² = MSE = SSE/(N-#coef), so Root MSE = s.

Regression results
--------------------------------------------------------------------------------------------
       lwage |  Coef.   Std. Err.      t            P>|t|                [95% Conf. Interval]
-------------+------------------------------------------------------------------------------
        educ |   β̂2     s(β̂2)     β̂2/s(β̂2)   probability of a larger |t|   β̂2 ± t(α/2) s(β̂2)
       _cons |   β̂1     s(β̂1)     β̂1/s(β̂1)   same as above                 β̂1 ± t(α/2) s(β̂1)
--------------------------------------------------------------------------------------------

Numerical example:
. sum lwage educ
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       lwage |       428    1.190173    .7231978  -2.054164   3.218876
        educ |       753    12.28685    2.280246          5         17

. reg lwage educ
      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  1,   426) =   56.93
       Model |  26.3264193     1  26.3264193           Prob > F      =  0.0000
    Residual |  197.001022   426  .462443713           R-squared     =  0.1179
-------------+------------------------------           Adj R-squared =  0.1158
       Total |  223.327441   427  .523015084           Root MSE      =  .68003
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086487   .0143998     7.55   0.000     .0803451    .1369523
       _cons |  -.1851968   .1852259    -1.00   0.318    -.5492673    .1788736
------------------------------------------------------------------------------
F. FORECASTS
If we have determined that our model has significant explanatory power, we may want to use it
to obtain predictions. We turn to constructing predictions or forecasts and confidence intervals
for the (1) regression line (or mean Y corresponding to a given X) and (2) individual value of
Y corresponding to an arbitrary value of X.
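Sample: (Xt, Yt), t = 1, 2, . . ., n
Estimators: $\hat\beta_1, \hat\beta_2$
Sample Regression Line: $\hat Y_t = \hat\beta_1 + \hat\beta_2 X_t$
Uncertainty about $\hat\beta_1, \hat\beta_2$ implies uncertainty about $\hat Y_t$:
$E(\hat Y_t) = \beta_1 + \beta_2 X_t$
$\mathrm{Var}(\hat Y_t) = \sigma^2_{\hat\beta_1} + X_t^2\sigma^2_{\hat\beta_2} + 2X_t\sigma_{\hat\beta_1\hat\beta_2}$
$= \dfrac{\sigma^2}{n} + \bar X^2\sigma^2_{\hat\beta_2} + X_t^2\sigma^2_{\hat\beta_2} + 2X_t(-\bar X\sigma^2_{\hat\beta_2})$
$= \dfrac{\sigma^2}{n} + (X_t - \bar X)^2\sigma^2_{\hat\beta_2} = \sigma^2_{\hat Y_t}$
Therefore,
$\hat Y_t \sim N(\beta_1 + \beta_2 X_t;\ \sigma^2_{\hat Y_t}).$
$\sigma^2_{\hat Y}$ can be estimated by
$s^2_{\hat Y} = \dfrac{s^2}{n} + (X_t - \bar X)^2 s^2_{\hat\beta_2}.$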
From these results we can construct confidence intervals.
Confidence Intervals for β1 + β2Xt (the regression line or E(Y|X)):
$\hat Y_t \pm t_c s_{\hat Y} = \hat\beta_1 + \hat\beta_2 X_t \pm t_c s_{\hat Y}$, where $t_c = t_{\alpha/2}(n-2)$.
The forecasting problem is more often concerned with finding confidence intervals for the actual value of Yt (not E(Yt|Xt)) rather than for the "mean" or expected value of Yt corresponding to an arbitrary value of Xt. To do this we consider an analysis of the forecast error (FE):
$FE = Y_t - \hat Y_t$
$E(FE) = 0$
$\sigma^2_{FE} = \mathrm{Var}(FE \mid X) = \mathrm{Var}(Y_t) + \mathrm{Var}(\hat Y_t) = \sigma^2 + \sigma^2_{\hat Y}$
where σ² is due to the error term and $\sigma^2_{\hat Y}$ is due to uncertainty about the population regression line, with $\sigma^2_{FE}$ being estimated by $s^2_{FE} = s^2_{\hat Y} + s^2$.
Note that $s_{\hat Y}$ and $s_{FE}$ are functions of $(X - \bar X)^2$, i.e., the further X is from its mean value, the larger $s_{\hat Y}$ and $s_{FE}$. This can also be seen in the following figure.
[Figure: confidence bands around the sample regression line, narrowest at $\bar X$, with the sample period marked]
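Confidence Intervals (CI) for actual Yt (not β1 + β2Xt):
$\hat Y_t \pm t_c s_{FE}$
where $s_{FE} = \sqrt{s^2_{FE}}$ and
$s^2_{FE} = s^2_{\hat Y} + s^2 = s^2 + \dfrac{s^2}{n} + (X_t - \bar X)^2 s^2_{\hat\beta_2}.$
[Figure: sample regression line with inner intervals (C.I. for β1 + β2Xt) and outer intervals (C.I. for Yt)]
The two curved lines closest to the sample regression line correspond to the CIs for the population regression line, and the two curved lines furthest from the sample regression line are the CIs for the actual value of Y corresponding to different values of X.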
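A minimal Python sketch of both intervals (the function name is ours, and the data are the Anscombe_A observations from the Stata example later in these notes). The critical value t.025(9) = 2.262157 comes from a standard t table. Evaluating the intervals at X = 9 (the sample mean) and at X = 14 shows both bands widening as X moves away from $\bar X$.

```python
import math


def prediction_intervals(x, y, x0, t_crit):
    """Return (Y_hat, s_Yhat, s_FE, CI for E(Y | X = x0), CI for an actual Y at x0)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b1 + b2 * x0
    s_yhat = math.sqrt(s2 / n + (x0 - xbar) ** 2 * s2 / sxx)  # s^2_Yhat
    s_fe = math.sqrt(s_yhat ** 2 + s2)                         # s^2_FE = s^2_Yhat + s^2
    line_ci = (y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat)
    actual_ci = (y_hat - t_crit * s_fe, y_hat + t_crit * s_fe)
    return y_hat, s_yhat, s_fe, line_ci, actual_ci


# Anscombe_A data
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
T_CRIT = 2.262157  # t_{.025}(9), from a standard t table

for x0 in (9, 14):
    y_hat, s_yhat, s_fe, line_ci, actual_ci = prediction_intervals(x, y, x0, T_CRIT)
    print(x0, round(y_hat, 3), round(s_yhat, 4), round(s_fe, 4), line_ci, actual_ci)
```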
G. ESTIMATION USING Stata
These calculations can be very tedious for even moderate sample sizes. Fortunately, calculators and many computer programs make this part of econometrics relatively painless, even exciting. Thus, we will be able to focus on understanding the statistical procedures, the validity of the assumptions, and the interpretation of the statistical output. We will outline the commands used in least squares estimation with the program Stata. Extensive manuals and abbreviated guides describing additional procedures and options are available for Stata and for other programs such as SAS, EViews, Gretl, R, SHAZAM, and TSP. Gretl is quite user friendly, and it is free.
Stata
The data files can be created with Microsoft Excel (saving the file as a CSV file). Stata will automatically read in any column headings the data have. With a file named FUN388.CSV, we can easily perform least squares estimation of the relationship
Yt = β1 + β2Xt + εt
using the commands:
. insheet using "C:\FUN388.CSV", clear     This reads the data into Stata.
(This can also be done by opening the Data Editor and manually pasting the data.)
. sum Y X      Gives summary statistics of Y and X.
. plot Y X     Plots Y on the vertical axis and X on the horizontal axis.
. reg Y X      Uses OLS to estimate the given model.
To view additional residual diagnostics, type the following after the ". reg Y X" command:
. predict error, resid     The variable "error" now contains the estimated residuals.
1. To test for normality of the errors, type
. sktest error      Tests for normality using a skewness/kurtosis test. OR
. swilk error       Tests for normality using a Shapiro-Wilk test. OR
. sfrancia error    Tests for normality using a Shapiro-Francia test. OR
. qnorm error       Plots the quantiles of error against the quantiles of the normal distribution. OR
. findit jb         The "findit" command is useful in Stata for finding commands that are not yet installed; "findit jb" will find the command for a Jarque-Bera test for normality. After installing the command, type "jb error" to run a Jarque-Bera test.
2. To test for heteroskedasticity, the following post-regression commands are useful:
. whitetst                        Tests for heteroskedasticity using White's test.
. estat hettest varnames          Tests for heteroskedasticity using the Breusch-Pagan/Cook-Weisberg test.
. estat hettest, rhs iid          Uses all right-hand-side variables and a chi-square test
  (or . estat hettest, rhs fstat  for an F test).
. estat imtest, preserve white    Tests for heteroskedasticity (using White's test) and for skewness and kurtosis.
More postestimation commands are explained in the Stata help file titled "regress postestimation."
3. To test for autocorrelation (serial independence or randomness) of the error terms, you must first declare your data to be time series with the command
. tsset timevar      timevar is the name of the time variable in your dataset.
You can then test for autocorrelation in your time series data with the commands
. estat dwatson      Tests for first-order autocorrelation (Durbin-Watson statistic).
. estat bgodfrey     Breusch-Godfrey test for higher-order serial correlation.
. estat archlm       Tests for ARCH effects in the residuals.
. runtest varname    varname is the name of the variable being tested for random order.
4. Some other options:
a. To calculate the sum of absolute errors (SAE), type
. egen SAE = sum(abs(error))     "SAE" will appear as a constant column in the Data Editor.
b. To view information criteria, including the log-likelihood value and the Akaike and Schwarz Bayesian information criteria, type
. estat ic
c. To display the variance-covariance matrix, type
. estat vce
d. To display the correlation matrix, type
. estat vce, corr
e. Help files: use the Help menu or type help keyword
Sample Stata output corresponding to the Anscombe_A data set in problem 1.2 (#4)
. infile x y using "C:\anscombe_a.txt", clear
(11 observations read)
. list y x
+------------+
| y x |
|------------|
1. | 8.04 10 |
2. | 6.95 8 |
3. | 7.58 13 |
4. | 8.81 9 |
5. | 8.33 11 |
|------------|
6. | 9.96 14 |
7. | 7.24 6 |
8. | 4.26 4 |
9. | 10.84 12 |
10. | 4.82 7 |
|------------|
11. | 5.68 5 |
+------------+
. plot y x
10.84 +
| *
|
|
| *
|
|
| *
|
| *
y | *
| *
| *
| *
|
|
| *
|
|
| *
4.26 + *
+----------------------------------------------------------------+
4 x 14
. sum y x
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
y | 11 7.500909 2.031568 4.26 10.84
x | 11 9 3.316625 4 14
. reg y x
Source | SS df MS Number of obs = 11
-------------+------------------------------ F( 1, 9) = 17.99
Model | 27.5100011 1 27.5100011 Prob > F = 0.0022
Residual | 13.7626904 9 1.52918783 R-squared = 0.6665
-------------+------------------------------ Adj R-squared = 0.6295
Total | 41.2726916 10 4.12726916 Root MSE = 1.2366
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .5000909 .1179055 4.24 0.002 .2333701 .7668117
_cons | 3.000091 1.124747 2.67 0.026 .4557369 5.544445
------------------------------------------------------------------------------
. whitetst
White's general test statistic : .6998421 Chi-sq( 2) P-value = .7047
. estat hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of y
chi2(1) = 0.41
Prob > chi2 = 0.5232
. estat ic
------------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+----------------------------------------------------------------
. | 11 -22.88101 -16.84069 2 37.68137 38.47717
------------------------------------------------------------------------------
*ll(model) is the maximized log-likelihood value for the specified model, whereas ll(null) is obtained by estimating the model without any explanatory variables (intercept only). Twice the difference of the log-likelihood values is (asymptotically) distributed as a chi-square with df equal to the number of explanatory variables.
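The numbers in the estat ic table can be reproduced with the following Python sketch (ours). It substitutes the MLE $\hat\sigma^2$ = SSE/n into the log likelihood, which gives ℓ = -(n/2)[1 + ln(2π) + ln(SSE/n)]; the null model uses SST in place of SSE. We count k = 2 parameters, matching the "df" column in the output above; AIC = -2ℓ + 2k and BIC = -2ℓ + k ln n reproduce the reported values.

```python
import math

# Anscombe_A data
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - ybar) ** 2 for yi in y)


def max_loglik(sum_sq, n):
    """Normal log likelihood maximized over sigma^2 (sigma^2_hat = sum_sq / n)."""
    return -n / 2 * (1 + math.log(2 * math.pi) + math.log(sum_sq / n))


ll_model = max_loglik(sse, n)  # intercept and X
ll_null = max_loglik(sst, n)   # intercept only: residuals are Y - Ybar
k = 2                          # parameters counted, as in the "df" column above
aic = -2 * ll_model + 2 * k
bic = -2 * ll_model + k * math.log(n)
lr = 2 * (ll_model - ll_null)  # compare with a chi-square(1) critical value
print(round(ll_model, 5), round(ll_null, 5), round(aic, 5), round(bic, 5), round(lr, 4))
```

The likelihood ratio statistic (about 12.08) far exceeds the 5% chi-square(1) critical value of 3.84, consistent with the significant F test for these data.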
H. FUNCTIONAL FORMS
In many applications the relationships between variables are not linear. A simple test for the presence of nonlinear relationships is the Regression Specification Error Test (RESET; Ramsey, 1969). This test can be performed as follows:
H0: $y_t = X_t\beta + \varepsilon_t$ (estimate a linear model)
Ha: $y_t = X_t\beta + \delta_1\hat y_t^2 + \delta_2\hat y_t^3 + \varepsilon_t$ (the $\hat y$'s denote OLS predicted values)
An F test of the hypothesis that both delta coefficients are simultaneously equal to zero is approximately distributed as F(2, N - K - 2), where K is the number of coefficients in the linear model. Alternatively, nonlinear functions of x can be added to the linear terms and the collective explanatory power of the nonlinear terms tested. Box-Cox transformations provide another approach.
The linear regression model just considered is more general than might first appear.
Many nonlinear models can be transformed so that "linear techniques" can be used.
We can consider two types of nonlinear models:
o transformable types--estimable by least squares
o nontransformable--use nonlinear optimization algorithms
1. Transformable Models
a. Log-Log or Double Log Model
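$Y_t = A X_t^{\beta}\varepsilon_t$
The slope and elasticity are given by:
$\dfrac{dY}{dX} = A\beta X^{\beta-1}$
$\eta_{Y\cdot X} = \dfrac{dY}{dX}\cdot\dfrac{X}{Y} = \beta$
[Figure: shapes of Y = AX^β for β = 0, 0 < β < 1, β = 1, β > 1, and β < 0]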
This model can be estimated using least squares by taking the logarithm of the model to yield
ln Yt = ln A + β ln Xt + ln εt
= β1 + β2 ln Xt + ln εt
where β1 = ln A and β2 = β. Regressing ln(Yt) on ln(Xt) gives estimates for β1 and β2; hence $\hat A = e^{\hat\beta_1}$ and $\hat\beta = \hat\beta_2$.
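A minimal Python sketch of this transformation (ours). To make the back-transformation easy to check, it uses hypothetical noise-free data generated from Y = 2X^0.5 (so εt = 1), regresses ln Y on ln X, and recovers A and β.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b2 * xbar, b2


# Hypothetical, noise-free data from Y = A * X**beta with A = 2 and beta = 0.5
A_TRUE, BETA_TRUE = 2.0, 0.5
X = [1, 2, 4, 8, 16, 32]
Y = [A_TRUE * xi ** BETA_TRUE for xi in X]

# Regress ln(Y) on ln(X), then undo the transformation
b1, b2 = ols_simple([math.log(xi) for xi in X], [math.log(yi) for yi in Y])
A_hat = math.exp(b1)
beta_hat = b2  # slope in logs = elasticity of Y with respect to X
print("A_hat =", A_hat, " beta_hat =", beta_hat)
```

With real data the multiplicative error makes ln εt the regression disturbance, so the estimates will no longer be exact.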
b. Semi Log Models
(1) $Y_t = A B^{X_t}\varepsilon_t$
The slope and elasticity are given by
$\dfrac{dY}{dX} = Y\ln B; \qquad \eta_{Y\cdot X} = X\ln B$
Estimation: Least squares can again be applied to the logarithmic transformation of the original model:
ln Yt = ln A + (ln B)Xt + ln εt = β1 + β2Xt + ln εt.
Hence $\hat A = e^{\hat\beta_1}$ and $\hat B = e^{\hat\beta_2}$.
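The same idea for the semi-log model, sketched in Python with hypothetical noise-free data from Y = 3(1.2)^X: regress ln Y on X (not ln X), then exponentiate both coefficients. The elasticity X ln B is evaluated at X = 3 purely as an illustration.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b2 * xbar, b2


# Hypothetical, noise-free data from Y = A * B**X with A = 3 and B = 1.2
A_TRUE, B_TRUE = 3.0, 1.2
X = [0, 1, 2, 3, 4, 5]
Y = [A_TRUE * B_TRUE ** xi for xi in X]

# ln Y = ln A + (ln B) X: regress ln(Y) on X itself
b1, b2 = ols_simple(X, [math.log(yi) for yi in Y])
A_hat, B_hat = math.exp(b1), math.exp(b2)
elasticity_at_x3 = 3 * math.log(B_hat)  # eta = X ln B, evaluated at X = 3
print("A_hat =", A_hat, " B_hat =", B_hat, " elasticity at X = 3:", elasticity_at_x3)
```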
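(2) $X_t = A B^{Y_t}\varepsilon_t$
The slope and elasticity are given by $\dfrac{dY}{dX} = \dfrac{1}{X\ln B}$ and $\eta_{Y\cdot X} = \dfrac{1}{Y\ln B}$.
[Figure: shapes for 0 < B < 1, B = 1, and B > 1]
Estimation: Applying least squares to
$Y_t = -\dfrac{\ln A}{\ln B} + \dfrac{1}{\ln B}\ln X_t - \dfrac{\ln\varepsilon_t}{\ln B} = \beta_1 + \beta_2\ln X_t + \eta_t$
yields $\hat B = e^{1/\hat\beta_2}$ and $\hat A = e^{-\hat\beta_1/\hat\beta_2}$.
c. Reciprocal Transformations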
Yt = A + B/Xt + εt
The slope and elasticity are:
$\dfrac{dY}{dX} = -\dfrac{B}{X^2}$ and $\eta_{Y\cdot X} = -\dfrac{B}{XY}$.
[Figure: shapes for B > 0 and B < 0]
Estimation: Let Z = 1/X; then estimate Yt = A + BZt + εt = β1 + β2Zt + εt, and $\hat A = \hat\beta_1$ and $\hat B = \hat\beta_2$.
d. Logarithmic Reciprocal Transformations
$Y_t = e^{A - B/X_t + \varepsilon_t}$
$\dfrac{dY}{dX} = \dfrac{BY}{X^2}; \qquad \eta_{Y\cdot X} = \dfrac{B}{X}$
Estimation: This model can be estimated using least squares on
ln Yt = A - B/Xt + εt
= β1 + β2(1/Xt) + εt, where $\hat A = \hat\beta_1$ and $\hat B = -\hat\beta_2$.
Application:
[Figure: market share application, with the curve approaching an asymptotic level]
e. Polynomials
y = β1 + β2x + β3x² + β4x³
[Figure: shapes for β3 = β4 = 0, β4 = 0, and β4 ≠ 0]
Cost Function:
TC(q) = β1 + β2q + β3q² + β4q³
MC(q) = β2 + 2β3q + 3β4q²
• the desired shape requires β4 > 0
• a minimum for positive q requires
MC'(q) = 2β3 + 6β4q = 0 ⇒ q = -2β3/(6β4) > 0 ⇒ β3 < 0
• minimum MC > 0 requires that MC(q) = 0 have no real roots:
4β3² - 4β2(3β4) < 0 ⇒ β3² < 3β2β4
β2 > 0
Restrictions (Summary):
β1 ≥ 0, β2 > 0, β3 < 0, β4 > 0
β3² < 3β2β4
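A small Python helper (ours, hypothetical) that checks these restrictions for a set of estimated coefficients and evaluates marginal cost at its minimum, q* = -β3/(3β4). It illustrates that, given β4 > 0, the condition β3² < 3β2β4 is the same as requiring a positive minimum MC, since MC(q*) = β2 - β3²/(3β4).

```python
def check_cubic_cost(b1, b2, b3, b4):
    """Check the restrictions for TC(q) = b1 + b2 q + b3 q^2 + b4 q^3.

    Returns (restrictions_hold, q_at_min_mc, min_mc).
    """
    ok = b1 >= 0 and b2 > 0 and b3 < 0 and b4 > 0 and b3 ** 2 < 3 * b2 * b4
    q_star = -b3 / (3 * b4) if b4 != 0 else float("nan")  # solves MC'(q) = 2 b3 + 6 b4 q = 0
    mc_min = b2 + 2 * b3 * q_star + 3 * b4 * q_star ** 2  # equals b2 - b3**2 / (3 b4)
    return ok, q_star, mc_min


# Hypothetical coefficient values, for illustration only
print(check_cubic_cost(10, 5, -1, 0.1))  # restrictions hold, minimum MC > 0
print(check_cubic_cost(10, 1, -1, 0.1))  # b3^2 >= 3 b2 b4, so minimum MC < 0
```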
f. Production Functions
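(1) Cobb Douglas (CD)
$$Y_t = e^{\beta_1 + \beta_2 t}L_t^{\beta_3}K_t^{\beta_4}\varepsilon_t$$
$$\ln Y_t = \beta_1 + \beta_2 t + \beta_3\ln L_t + \beta_4\ln K_t + \ln\varepsilon_t$$
Production Characteristics:
β3 + β4 = 1: constant returns to scale
β3 = percent of total revenue paid to labor
$\sigma = \dfrac{\%\Delta(K/L)}{\%\Delta(W_L/W_K)} = 1$: elasticity of substitution, where $W_L/W_K$ is the ratio of the prices of labor and capital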
(2) Translog Transformation
$$\ln Y_t = \beta_1 + \beta_2\ln L_t + \beta_3\ln K_t + \beta_4(\ln L_t)^2 + \beta_5(\ln K_t)^2 + \beta_6(\ln L_t)(\ln K_t)$$
Note that this model includes the Cobb Douglas as a special case ($\beta_4 = \beta_5 = \beta_6 = 0$).
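(3) Constant Elasticity of Substitution (CES)
$$Y_t = e^{\beta_1 + \beta_2 t}\left[\delta L_t^{\rho} + (1 - \delta)K_t^{\rho}\right]^{M/\rho}\varepsilon_t, \qquad \sigma = \frac{1}{1 - \rho},$$
where M = returns to scale. Cost functions can be estimated from estimated production functions.
Estimation: (?)
$$\ln Y_t = \beta_1 + \beta_2 t + \frac{M}{\rho}\ln\left[\delta L_t^{\rho} + (1 - \delta)K_t^{\rho}\right] + \ln\varepsilon_t$$
This function is a "nontransformable" type: no transformation makes it linear in the parameters δ, ρ, and M.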
2. "Nontransformable" Models
Problem: Estimate the parameters in
$$Y_t = F(\beta_1, \beta_2, \ldots, \beta_s; X_{1t}, \ldots, X_{Kt}) + \varepsilon_t.$$
Two possible approaches include using nonlinear optimization programs or approximations.
(a) Nonlinear Optimization Approach
(1) Define the objective function: minimize SSE or maximize the likelihood function.
(2) Specify an initial guess for the parameters.
(3) "Press go": start at the initial value and iterate to a solution.
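As one concrete instance of steps (1) to (3), the Python sketch below applies Gauss-Newton iterations to minimize SSE for the hypothetical one-parameter model $Y_t = e^{\beta X_t} + \varepsilon_t$. Gauss-Newton is just one of several possible algorithms (the notes do not prescribe one), and the data are made up and noise-free with β = 0.3, so the iterations should converge to that value.

```python
import math

def gauss_newton_exp(x, y, beta0, tol=1e-12, max_iter=100):
    """Step (1): the objective is SSE(beta) = sum (y_t - exp(beta*x_t))**2.
    Step (2): start from beta0.  Step (3): iterate Gauss-Newton updates."""
    beta = beta0
    for _ in range(max_iter):
        fitted = [math.exp(beta * xi) for xi in x]
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        deriv = [xi * fi for xi, fi in zip(x, fitted)]    # dF/dbeta for each observation
        step = sum(d * r for d, r in zip(deriv, resid)) / sum(d * d for d in deriv)
        beta += step
        if abs(step) < tol:
            break
    sse = sum((yi - math.exp(beta * xi)) ** 2 for xi, yi in zip(x, y))
    return beta, sse

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(0.3 * xi) for xi in x]     # hypothetical noise-free data, true beta = 0.3
beta_hat, sse = gauss_newton_exp(x, y, beta0=0.1)
print(beta_hat, sse)
```

Starting from 0.1, the first step overshoots (to roughly 0.42) before the iterations settle on 0.3, which illustrates why a reasonable initial guess matters in step (2).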
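(b) Examples:
(1) Logistic Model
The logistic curve relates $Y_t$ to $X_t$ through the parameters $\alpha$, $\beta$, $\gamma$, and $\delta$ with a multiplicative error, $Y_t = F(X_t; \alpha, \beta, \gamma, \delta)\,\varepsilon_t$.
Estimation: minimize
$$SSE = \sum_t\left[\ln Y_t - \ln F(X_t; \alpha, \beta, \gamma, \delta)\right]^2 = \sum_t(\ln\varepsilon_t)^2.$$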
(2) Constant elasticity of substitution (CES) production function
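(3) Box Cox. Define
$$Y^{(\lambda)} = \frac{Y^{\lambda} - 1}{\lambda}.$$
Consider $Y_t^{(\lambda)} = \beta_1 + \beta_2 X_t^{(\lambda)} + \varepsilon_t$.
λ = 0 (taken as the limit λ → 0): $\ln Y_t = \beta_1 + \beta_2\ln X_t + \varepsilon_t$
λ = 1: $Y_t - 1 = \beta_1 + \beta_2(X_t - 1) + \varepsilon_t$, or $Y_t = 1 + \beta_1 - \beta_2 + \beta_2 X_t + \varepsilon_t$.
Stata will estimate "Box-Cox" models with the command format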
boxcox depvar [indepvars] [, options]
Options (list from help file “boxcox” in Stata).
model(lhsonly) applies the Box-Cox transform to depvar only.
model(lhsonly) is the default.
model(rhsonly) applies the transform to the indepvars only.
model(lambda) applies the transform to both depvar and indepvars, and they are transformed by the same parameter.
model(theta) applies the transform to both depvar and indepvars, but this
time, each side is transformed by a separate parameter.
notrans(varlist) specifies that the variables in varlist be included as nontransformed independent variables.
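The transformation itself is easy to compute outside Stata. The Python sketch below (`box_cox` is an illustrative helper, not Stata's implementation) uses the λ = 0 limit ln Y and shows that λ = 1 gives Y - 1, matching the special cases listed above.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1)/lam for y > 0; ln(y) in the limit lam = 0."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

print(box_cox(5.0, 1.0))    # lambda = 1 gives y - 1
print(box_cox(5.0, 1e-8))   # close to ln(5), illustrating the lambda -> 0 limit
print(box_cox(4.0, 0.5))    # (2 - 1)/0.5
```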
I. PROBLEM SETS
Problem Set 2.1
Simple Linear Regression
Theory
1. Let kids denote the number of children ever born to a woman, and let educ denote the years of
education for the woman. A simple model relating fertility to years of education is
kids = β0 + β1educ + u where u is the unobserved error.
a. All of the factors besides a woman’s education that affect fertility are lumped into the error term, u. What kinds of factors are contained in u? Which of these are likely to be correlated with level of education, which are not?
b. Will a simple regression analysis uncover the ceteris paribus effect of education on fertility? Explain.
(Wooldridge 2.1)
2. Demonstrate that
$$\hat\beta_2 = \frac{\sum_t(X_t - \bar X)(Y_t - \bar Y)}{\sum_t(X_t - \bar X)^2} = \frac{\text{Covariance}(X, Y)}{\text{Variance}(X)}$$
is equivalent to
a. $\dfrac{\sum_t X_t Y_t - n\bar X\bar Y}{\sum_t X_t^2 - n\bar X^2}$
b. $\dfrac{\sum_t(X_t - \bar X)Y_t}{\sum_t(X_t - \bar X)^2}$
(Hints: Expand the numerator and denominator and remember that $\sum_t X_t = n\bar X$.)
c. If you only have two observations (n = 2), $(X_1, Y_1), (X_2, Y_2)$, demonstrate that the equation for $\hat\beta_2$ can be simplified to
$$\hat\beta_2 = \frac{Y_2 - Y_1}{X_2 - X_1} = \frac{\text{rise}}{\text{run}}.$$
(JM II-B, JM Math)
3. Demonstrate that the sample regression line obtained from least squares with an estimated intercept passes through $(\bar X, \bar Y)$. (Hint: $\hat Y = \hat\beta_1 + \hat\beta_2 X$; substitute $X = \bar X$ and simplify.)
(JM II-B)
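4. Consider the model $Y_t = \beta X_t + \varepsilon_t$, where
A.1 $\varepsilon_t$ distributed normally
A.2 $E(\varepsilon_t) = 0\ \forall t$
A.3 $\mathrm{Var}(\varepsilon_t) = \sigma^2\ \forall t$
A.4 $\mathrm{Cov}(\varepsilon_t, \varepsilon_s) = 0\ \forall t, s\ (t \neq s)$
A.5 $X_t$ nonstochastic.
a) Find the least squares estimator of β. Hint: $SSE = \sum_t\varepsilon_t^2 = \sum_t(Y_t - \beta X_t)^2$.
b) Find the MLE of β and σ². Hint:
$$\ell(Y; \beta, \sigma^2) = \sum_t\ln f(Y_t; \beta, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{\sum_t(Y_t - \beta X_t)^2}{2\sigma^2}$$
c) Will the sample regression line $\hat Y_t = \hat\beta X_t$ obtained in (a) or (b) pass through $(\bar X, \bar Y)$? Explain. (JM II-B)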
Applied
5. The data set in CEOSAL2.RAW contains information on chief executive officers for U.S. corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten is prior number of years as company CEO.
i) Find the average salary and average tenure in the sample.
ii) How many CEOs are in their first year as CEO (that is, ceoten = 0)?
iii) Estimate the simple regression model
$$\log(salary) = \beta_0 + \beta_1 ceoten + \varepsilon$$
and report your results in the usual form*. What is the predicted percentage increase in salary given one more year as CEO?
(Wooldridge C.2.2)
*The usual form is to write out the equation with the estimated betas and their standard errors underneath in parentheses. For example, if I were estimating $Y_t = \alpha + \beta X_t + \varepsilon_t$ and estimated α to be .543 with a standard error of .001 and β to be 1.43 with a standard error of 1.01, then I would report my results in the "usual form" as follows:
Yt = .543 + 1.43*Xt        R2 = .955
     (.001)  (1.01)        N = 123
** We will review the required Stata commands in class/TA sessions.
Problem Set 2.2
Simple Linear Regression
Theory
Consider the model
$$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t.$$
1. BACKGROUND: The purpose of this problem is to show that, using OLS, the total sum of squares can be partitioned into two parts as follows:
$$\sum_{t=1}^n(Y_t - \bar Y)^2 = \sum_{t=1}^n(Y_t - \hat Y_t + \hat Y_t - \bar Y)^2$$
$$= \sum_{t=1}^n(Y_t - \hat Y_t)^2 + 2\sum_{t=1}^n(Y_t - \hat Y_t)(\hat Y_t - \bar Y) + \sum_{t=1}^n(\hat Y_t - \bar Y)^2$$
where the terms $\sum_{t=1}^n(Y_t - \bar Y)^2$, $\sum_{t=1}^n(Y_t - \hat Y_t)^2$, and $\sum_{t=1}^n(\hat Y_t - \bar Y)^2$ are referred to as the total sum of squares (SST), sum of squares error (SSE), and sum of squares "explained by the regression" (SSR), respectively. This notation differs from that used by Wooldridge, but conforms with notation used in a number of other econometrics texts.
QUESTION: Explain why the cross product term
$$\sum_{t=1}^n(Y_t - \hat Y_t)(\hat Y_t - \bar Y) = \sum_{t=1}^n e_t(\hat Y_t - \bar Y) = \sum_{t=1}^n e_t(\hat\beta_1 + \hat\beta_2 X_t - \bar Y) = 0$$
when least squares estimators are used. (Remember the first order conditions or normal equations.)
(JM II-B)
Applied
2. For the population of firms in the chemical industry, let rd denote annual expenditures on
research and development, and let sales denote annual sales (both are in millions of dollars).
a. Write down a model (not an estimated equation) that implies a constant elasticity between rd and sales. Which parameter is the elasticity? (Hint: what functional form should be used?)
b. Now estimate the model using the data in RDCHEM.RAW. Write out the estimated equation in the usual form*. What is the estimated elasticity of rd with respect to sales? Explain in words what this elasticity means.
(Wooldridge C 2.5)
* report the estimated parameters, standard errors, and R2
3. Consider the following four sets of data¹
Data Set A B C D
Variable X Y X Y X Y X Y
Obs. No. 1 10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
2 8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
3 13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
4 9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
5 11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
6 14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
7 6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
8 4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
9 12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
10 7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
11 5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
a. For each of the data sets estimate the relationship $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$ using least squares.
b. Compare and explain the four sets of results. (Hint: plot the data.)
c. In each of the four cases obtain a prediction of the value of $Y_t$ corresponding to a value of X = 20. Which of the forecasts would you feel most comfortable with? Explain.
d. Based upon these examples comment on the following widely held notions.
i) "Numerical calculations are exact, but graphs are rough."
ii) "For any particular kind of statistical data there is just one set of calculations constituting a correct statistical analysis."
iii) "Performing intricate calculations is rigorous, whereas actually looking at the data is cheating." (JM II)
¹ Reference: Anscombe, F. J., "Graphs in Statistical Analysis," The American Statistician, Vol. 27 (1973), pp. 17-21.
4. The following Stata printout corresponds to the first Anscombe data set.
a. From the printout, determine the values of the following:
$\bar X$ =
$s^2$ =
$s^2_{\hat\beta_2}$ =
b. Calculate the predicted value of Y and the variance of the forecast error corresponding to x = 20.
(1) $\hat Y$ =
(2) $s^2 + s^2_{\hat Y}$ =
(3) $s^2_{\hat Y}$ =
Hint: Recall that
$$s^2_{\hat Y} = \frac{s^2}{n} + (20 - \bar X)^2 s^2_{\hat\beta_2} \quad\text{and}\quad s^2_{FE} = s^2 + s^2_{\hat Y}.$$
c. Calculate 95% confidence intervals for the actual value of Y corresponding to X = 20.
d. Calculate 95% confidence intervals for the population regression line corresponding to X = 20. Yet another hint: the sample and population regression lines, respectively, are defined by $\hat Y_t = \hat\beta_1 + \hat\beta_2 X_t$ and $\beta_1 + \beta_2 X_t$, so use $s_{\hat Y}$ for part (d) and $s_{FE}$ for part (c).
Check your work: Recall that the confidence interval for the population regression line is narrower than the confidence interval for the actual value of Y corresponding to a given X.
5. Consider the attached data file (functional forms 2.dta). X denotes the independent variable, x = 1, 2, 3, ..., 100. Corresponding to this independent variable, various dependent variables were generated. Plot and estimate an appropriate functional form between
a. the dependent variable denoted loglog and x;
b. semilog1 and x;
c. reciptrans and x;
d. polya and x;
e. polyb and x; and
f. polyc and x.
STATA Output (for problem #4)
. infile x y using "anscombe_a.txt", clear
(11 observations read)
. summ y x
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
y | 11 7.500909 2.031568 4.26 10.84
x | 11 9 3.316625 4 14
. reg y x
Source | SS df MS Number of obs = 11
-------------+------------------------------ F( 1, 9) = 17.99
Model | 27.5100011 1 27.5100011 Prob > F = 0.0022
Residual | 13.7626904 9 1.52918783 R-squared = 0.6665
-------------+------------------------------ Adj R-squared = 0.6295
Total | 41.2726916 10 4.12726916 Root MSE = 1.2366
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .5000909 .1179055 4.24 0.002 .2333701 .7668117
_cons | 3.000091 1.124747 2.67 0.026 .4557369 5.544445
------------------------------------------------------------------------------
. set obs 12
obs was 11, now 12
. replace x=20 in 12
(1 real change made)
. predict yhat
(option xb assumed; fitted values)
. predict sfe, stdf
. list in 11/12 //only lists observations 11 and 12
+---------------------------------+
| x y yhat sfe |
|---------------------------------|
11. | 5 5.68 5.500546 1.375003 |
12. | 20 . 13.00191 1.830386 |
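As a check on the hint formulas in problem 4, the short Python sketch below recomputes the prediction and the forecast standard error at X = 20 from numbers copied out of the printout above (the coefficients, s² = residual MS, and the slope's standard error). The results should agree with the `yhat` and `sfe` values Stata lists for observation 12, up to rounding.

```python
import math

# Values copied from the Stata printout for the first Anscombe data set.
n = 11
xbar = 9.0                 # mean of x
b1, b2 = 3.000091, 0.5000909
s2 = 1.52918783            # residual MS = SSE/(n - 2)
se_b2 = 0.1179055          # standard error of the slope
x0 = 20.0

y_hat = b1 + b2 * x0                                 # predicted value at X = 20
s2_yhat = s2 / n + (x0 - xbar) ** 2 * se_b2 ** 2     # variance of the fitted line at X = 20
s_fe = math.sqrt(s2 + s2_yhat)                       # standard error of the forecast error
print(round(y_hat, 5), round(s2_yhat, 5), round(s_fe, 5))
```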
James B. McDonald Brigham Young University 1/7/2010
III. Classical Normal Linear Regression Model Extended to the Case of k Explanatory Variables
A. Basic Concepts
Let y denote an n × 1 vector of random variables, i.e., y = (y1, y2, . . ., yn)'.
1. The expected value of y is defined by
$$E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{pmatrix}.$$
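2. The variance of the vector y is defined by
$$\mathrm{Var}(y) = \begin{pmatrix} \mathrm{Var}(y_1) & \mathrm{Cov}(y_1, y_2) & \cdots & \mathrm{Cov}(y_1, y_n) \\ \mathrm{Cov}(y_2, y_1) & \mathrm{Var}(y_2) & \cdots & \mathrm{Cov}(y_2, y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(y_n, y_1) & \mathrm{Cov}(y_n, y_2) & \cdots & \mathrm{Var}(y_n) \end{pmatrix}.$$
NOTE: Let µ = E(y); then
$$\mathrm{Var}(y) = E[(y - \mu)(y - \mu)'] = E\left[\begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_n - \mu_n \end{pmatrix}(y_1 - \mu_1, \ldots, y_n - \mu_n)\right]$$
$$= \begin{pmatrix} E(y_1 - \mu_1)^2 & E(y_1 - \mu_1)(y_2 - \mu_2) & \cdots & E(y_1 - \mu_1)(y_n - \mu_n) \\ E(y_2 - \mu_2)(y_1 - \mu_1) & E(y_2 - \mu_2)^2 & \cdots & E(y_2 - \mu_2)(y_n - \mu_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(y_n - \mu_n)(y_1 - \mu_1) & E(y_n - \mu_n)(y_2 - \mu_2) & \cdots & E(y_n - \mu_n)^2 \end{pmatrix}$$
$$= \begin{pmatrix} \mathrm{Var}(y_1) & \mathrm{Cov}(y_1, y_2) & \cdots & \mathrm{Cov}(y_1, y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(y_n, y_1) & \mathrm{Cov}(y_n, y_2) & \cdots & \mathrm{Var}(y_n) \end{pmatrix}.$$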
3. The n × 1 vector of random variables, y, is said to be distributed as a multivariate normal with mean vector µ and variance covariance matrix Σ (denoted y ~ N(µ, Σ)) if the probability density function of y is given by
$$f(y; \mu, \Sigma) = \frac{e^{-\frac{1}{2}(y - \mu)'\Sigma^{-1}(y - \mu)}}{(2\pi)^{n/2}|\Sigma|^{1/2}}.$$
Special case (n = 1): y = (y1), µ = (µ1), Σ = (σ²).
$$f(y_1; \mu_1, \sigma^2) = \frac{e^{-\frac{1}{2}(y_1 - \mu_1)(\sigma^2)^{-1}(y_1 - \mu_1)}}{(2\pi)^{1/2}(\sigma^2)^{1/2}} = \frac{e^{-(y_1 - \mu_1)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}.$$
4. Some Useful Theorems
a. If $y \sim N(\mu_y, \Sigma_y)$, then $z = Ay \sim N(\mu_z = A\mu_y;\ \Sigma_z = A\Sigma_y A')$, where A is a matrix of constants.
b. If $y \sim N(0, I)$ and A is a symmetric idempotent matrix, then $y'Ay \sim \chi^2(m)$, where m = Rank(A) = trace(A).
c. If $y \sim N(0, I)$ and L is a k × n matrix of rank k, then Ly and y'Ay are independently distributed if LA = 0.
d. If $y \sim N(0, I)$, then the idempotent quadratic forms y'Ay and y'By are independently distributed χ² variables if AB = 0.
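NOTE:
(1) Proof of (a):
$$E(z) = E(Ay) = AE(y) = A\mu_y$$
$$\mathrm{Var}(z) = E[(z - E(z))(z - E(z))'] = E[(Ay - A\mu_y)(Ay - A\mu_y)']$$
$$= E[A(y - \mu_y)(y - \mu_y)'A'] = AE[(y - \mu_y)(y - \mu_y)']A' = A\Sigma_y A' = \Sigma_z$$
(2) Example: Let y1, . . ., yn denote a random sample drawn from N(µ, σ²), so that
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \sim N\left(\begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix}, \sigma^2 I\right).$$
The "Useful" Theorem 4.a implies that:
$$\bar y = \frac{1}{n}y_1 + \cdots + \frac{1}{n}y_n = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)y \sim N(\mu, \sigma^2/n).$$
Verify that
(a) $\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)(\mu, \ldots, \mu)' = \mu$
(b) $\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)\sigma^2 I\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)' = \sigma^2/n$.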
B. The Basic Model
Consider the model defined by
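(1) $y_t = \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t$ (t = 1, . . ., n).
If we want to include an intercept, define $x_{t1} = 1$ for all t and we obtain
(2) $y_t = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t$.
Note that $\beta_i$ can be interpreted as the marginal impact of a unit increase in $x_i$ on the expected value of y.
The error terms ($\varepsilon_t$) in (1) will be assumed to satisfy:
(A.1) $\varepsilon_t$ distributed normally
(A.2) $E(\varepsilon_t) = 0$ for all t
(A.3) $\mathrm{Var}(\varepsilon_t) = \sigma^2$ for all t
(A.4) $\mathrm{Cov}(\varepsilon_t, \varepsilon_s) = 0$, t ≠ s.
Rewriting (1) for each t (t = 1, 2, . . ., n) we obtain
(3)
$$\begin{aligned} y_1 &= \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1 \\ y_2 &= \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2 \\ &\ \ \vdots \\ y_n &= \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n. \end{aligned}$$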
The system of equations (3) is equivalent to the matrix representation
y = Xβ + ε
where the matrices y, X, β and ε are defined as follows:
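$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{(n\times 1)}, \quad X = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ x_{21} & \cdots & x_{2k} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nk} \end{pmatrix}_{(n\times k)}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$
The columns of X contain n observations on k individual variables; the rows may represent observations at a given point in time.
NOTE: (1) Assumptions (A.1)-(A.4) can be written much more compactly as
(A.1)' $\varepsilon \sim N(0;\ \Sigma = \sigma^2 I)$.
(2) The model to be discussed can then be summarized as
$y = X\beta + \varepsilon$
(A.1)' $\varepsilon \sim N(0;\ \Sigma = \sigma^2 I)$
(A.5)' The $x_{tj}$'s are nonstochastic and $\lim_{n\to\infty} X'X/n = \Sigma_x$ is nonsingular.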
C. Estimation
We will derive the least squares, MLE, BLUE and instrumental variables estimators in
this section.
1. Least Squares:
The basic model can be written as
$$y = X\beta + \varepsilon = X\hat\beta + e = \hat Y + e$$
where $\hat Y = X\hat\beta$ is an n × 1 vector of predicted values for the dependent variable and e denotes a vector of residuals or estimated errors.
The sum of squared errors is defined by
$$SSE(\hat\beta) = \sum_{t=1}^n e_t^2 = (e_1, e_2, \ldots, e_n)\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = e'e = (y - X\hat\beta)'(y - X\hat\beta)$$
$$= y'y - \hat\beta'X'y - y'X\hat\beta + \hat\beta'X'X\hat\beta = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta,$$
where the last step uses the fact that the scalar $y'X\hat\beta$ equals its transpose $\hat\beta'X'y$.
The least squares estimator of β is defined as the $\hat\beta$ which minimizes $SSE(\hat\beta)$. A necessary condition for $SSE(\hat\beta)$ to be a minimum is that
$$\frac{dSSE(\hat\beta)}{d\hat\beta} = -2X'y + 2X'X\hat\beta = 0$$
(see Appendix A for how to differentiate a real valued function with respect to a vector), or
$$X'X\hat\beta = X'y \qquad \text{(Normal Equations)}$$
$$\hat\beta = (X'X)^{-1}X'y$$
is the least squares estimator.
Note that $\hat\beta$ is a vector of least squares estimators of β1, β2, ..., βk.
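To make the normal equations concrete, the Python sketch below builds X'X and X'y and solves $X'X\hat\beta = X'y$ by Gauss-Jordan elimination (chosen here only to keep the example free of third-party libraries; `solve` and `ols` are illustrative helpers). It reuses the first Anscombe data set from Problem Set 2.2, so the result can be compared with the Stata printout there (_cons = 3.000091, x = .5000909).

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]        # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least squares estimates from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# First Anscombe data set; the first column of X is the intercept (x_t1 = 1).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
X = [[1.0, float(xi)] for xi in x]
b = ols(X, y)
print(b)
```

In practice one would use a numerically stable routine (for example a QR decomposition) rather than forming X'X explicitly; the explicit form is used here because it mirrors the derivation.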
2. Maximum Likelihood Estimation (MLE)
Likelihood Function (recall $y \sim N(X\beta;\ \Sigma = \sigma^2 I)$):
$$L(y; \beta, \Sigma = \sigma^2 I) = \frac{e^{-\frac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)}}{(2\pi)^{n/2}|\Sigma|^{1/2}} = \frac{e^{-\frac{1}{2}(y - X\beta)'(\sigma^2 I)^{-1}(y - X\beta)}}{(2\pi)^{n/2}|\sigma^2 I|^{1/2}} = \frac{e^{-(y - X\beta)'(y - X\beta)/(2\sigma^2)}}{(2\pi)^{n/2}(\sigma^2)^{n/2}}.$$
The natural log of the likelihood function,
$$\ell = \ln L = -\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2} - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2,$$
is known as the log likelihood function; ℓ is a function of β and σ².
The MLEs of β and σ² (denoted $\check\beta$ and $\check\sigma^2$) are defined by the two equations (necessary conditions for a maximum):
$$\frac{\partial\ell}{\partial\beta} = -\frac{1}{2\check\sigma^2}\left(-2X'y + 2X'X\check\beta\right) = 0$$
$$\frac{\partial\ell}{\partial\sigma^2} = \frac{(y - X\check\beta)'(y - X\check\beta)}{2(\check\sigma^2)^2} - \frac{n}{2\check\sigma^2} = 0$$
i.e.,
$$\check\beta = (X'X)^{-1}X'y, \qquad \check\sigma^2 = \frac{(y - X\check\beta)'(y - X\check\beta)}{n} = \frac{e'e}{n} = \frac{\sum_t e_t^2}{n}.$$
NOTE: (1) $\check\beta = \hat\beta$.
(2) $\check\sigma^2$ is a biased estimator of σ²; whereas,
$$s^2 = \frac{1}{n - k}e'e = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{n - k} = \frac{SSE}{n - k}$$
is an unbiased estimator of σ². A proof of the unbiasedness of s² is given in Appendix B. Only n - k of the estimated residuals are independent: the necessary conditions for least squares estimates impose k restrictions on the estimated residuals (e). The restrictions are summarized by the normal equations $X'X\hat\beta = X'y$, or equivalently $X'e = 0$.
(3) Substituting $\sigma^2 = SSE/n$ into the log likelihood function yields what is known as the concentrated log likelihood function
$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\left(\frac{SSE}{n}\right)\right],$$
which expresses the log likelihood value as a function of β only. This equation also clearly demonstrates the equivalence of maximizing ℓ and minimizing SSE.
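A short numerical illustration of NOTES (2) and (3), using the first Anscombe data set from Problem Set 2.2 (k = 2): the Python sketch below computes the biased MLE $\check\sigma^2 = SSE/n$, the unbiased $s^2 = SSE/(n-k)$ (which should match the residual MS of 1.52918783 in that printout), and checks that the full log likelihood evaluated at $\check\sigma^2$ equals the concentrated form. `loglik` is an illustrative helper.

```python
import math

# First Anscombe data set (intercept + one regressor, so k = 2).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n, k = len(x), 2
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b1 = ybar - b2 * xbar
sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

sigma2_mle = sse / n          # biased MLE of sigma^2
s2 = sse / (n - k)            # unbiased estimator

def loglik(sse, sigma2, n):
    """Log likelihood at the LS/ML coefficients for a given value of sigma^2."""
    return -sse / (2 * sigma2) - n / 2 * math.log(2 * math.pi) - n / 2 * math.log(sigma2)

l_full = loglik(sse, sigma2_mle, n)
l_conc = -n / 2 * (1 + math.log(2 * math.pi) + math.log(sse / n))   # concentrated form
print(sigma2_mle, s2, l_full, l_conc)
```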
3. BLUE ESTIMATORS OF β, $\tilde\beta$
We will demonstrate that assumptions (A.2)-(A.5) imply that the best (least variance) linear unbiased estimator (BLUE) of β is the least squares estimator. We first consider the desired properties and then derive the associated estimator.
Linear: $\tilde\beta = Ay$, where A is a k × n matrix of constants.
Unbiased: $E(\tilde\beta) = AE(y) = AX\beta$. We note that $E(\tilde\beta) = AX\beta = \beta$ requires AX = I.
Minimum Variance:
$$\mathrm{Var}(\tilde\beta_i) = A_i\,\mathrm{Var}(y)\,A_i' = \sigma^2 A_i A_i'$$
where $A_i$ = the ith row of A and $\tilde\beta_i = A_i y$.
Thus, the construction of BLUE is equivalent to selecting the matrix A so that the rows of A solve
$$\min A_i A_i' \quad (i = 1, 2, \ldots, k) \quad \text{s.t. } AX = I,$$
or equivalently, $\min \mathrm{Var}(\tilde\beta_i)$ s.t. AX = I (unbiased).
The solution to this problem is given by $A = (X'X)^{-1}X'$; hence, the BLUE of β is given by
$$\tilde\beta = Ay = (X'X)^{-1}X'y.$$
The details of this derivation are contained in Appendix C.
NOTE: (1) $\check\beta = \hat\beta = \tilde\beta = (X'X)^{-1}X'y$.
(2) $AX = (X'X)^{-1}X'X = I$; thus $\tilde\beta$ is unbiased.
4. Instrumental Variables Estimators
y = Xβ + ε
Let Z denote an n × k matrix of "instruments" or "instrumental" variables. Consider the solution of the modified normal equations:
$$Z'y = Z'X\hat\beta_Z; \quad\text{hence,}\quad \hat\beta_Z = (Z'X)^{-1}Z'y.$$
$\hat\beta_Z$ is referred to as the instrumental variables estimator of β based on the instrumental variables Z. Instrumental variables can be very useful if the variables on the right hand side include "endogenous" variables or in the case of measurement error. In these cases OLS will yield biased and inconsistent estimators; whereas, instrumental variables can yield consistent estimators.
NOTE: (1) The motivation for the selection of the instruments (Z) is that the covariance between Z and ε approaches 0 while Z and X are correlated. Thus $Z'y = Z'(X\beta + \varepsilon) = Z'X\beta + Z'\varepsilon \approx Z'X\beta$.
(2) If $\operatorname{plim}_{n\to\infty} Z'X/n$ is nonsingular and $\operatorname{plim}_{n\to\infty} Z'\varepsilon/n = 0$, then $\hat\beta_Z$ is a consistent estimator of β.
(3) Many calculate an R² after instrumental variables estimation using the formula R² = 1 - SSE/SST. Since this can be negative, there is not a natural interpretation of R² for instrumental variables estimators. Further, the R² can't be used to construct F-statistics for IV estimators.
(4) If Z includes "weak" instruments (weakly correlated with the X's), then the variances of the IV estimator can be large, and the corresponding asymptotic biases can be large if Z and the error are correlated. This can be seen by noting that the bias of the instrumental variables estimator is given by
$$E\left[(Z'X/n)^{-1}(Z'\varepsilon/n)\right]:$$
when Z'X/n is close to singular, its inverse is large and magnifies any correlation between Z and ε.
(5) As a special case, if Z = X, then $\hat\beta_Z = \hat\beta_X = \hat\beta = \check\beta = \tilde\beta$.
(6) If Z is an n × k* matrix where k < k* (Z contains more variables than X), then the IV estimator defined above must be modified. The most common approach in this case is to replace Z in the "IV" equation by the projections** of X on the columns of Z, i.e., $\hat X = Z(Z'Z)^{-1}Z'X$. This substitution yields the IV estimator
$$\hat\beta_{IV} = (\hat X'X)^{-1}\hat X'y = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'y,$$
which yields estimates for $k \le k^*$.
The Stata command for the instrumental variables estimator is given by
ivregress estimator depvar (varlist_1 = varlist_iv) [varlist_2]
where estimator is 2sls, gmm, or liml (2sls is used in the examples below), for the model
depvar = (varlist_1)b1 + (varlist_2)b2 + error
where varlist_iv are the instrumental variables for varlist_1.
A specific example is given by:
ivregress 2sls y1 (y2 = z1 z2 z3) x1 x2 x3
Conceptually, this is equivalent to regressing all of the right hand side variables (y2, x1, x2, x3) on the full set of instruments (z1, z2, z3, x1, x2, x3), with each exogenous x serving as its own instrument, and can be thought of as being of the form
ivregress 2sls y (X = Z)
Note that the command ivregress 2sls y1 (y2 x1 x2 x3 = z1 z2 z3) would not give identical results: it treats x1, x2, and x3 as endogenous, leaving only three instruments for four endogenous regressors, so the equation would not be identified.
**The projections of X on Z can be obtained by estimating Π in the "reduced form" equation X = ZΠ + V to yield $\hat\Pi = (Z'Z)^{-1}Z'X$; hence, the estimate of X is given by
$$\hat X = Z\hat\Pi = Z(Z'Z)^{-1}Z'X.$$
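A minimal Python illustration of why IV can succeed where OLS fails, for the just-identified case with one regressor and an intercept, where $\hat\beta_Z$ reduces to $\sum(z_t - \bar z)(y_t - \bar y)/\sum(z_t - \bar z)(x_t - \bar x)$. The data are hypothetical and constructed so that the error is correlated with x but exactly orthogonal, in sample, to the instrument z; `slope_iv` is an illustrative helper. Setting Z = X reproduces OLS, as in NOTE (5).

```python
def slope_iv(z, x, y):
    """Just-identified IV slope with an intercept: sum (z-zbar)(y-ybar) / sum (z-zbar)(x-xbar)."""
    n = len(y)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den

# Hypothetical data: the error u is correlated with x but orthogonal (in sample)
# to the instrument z; the true model is y = 1 + 2x + u.
z = [1.0, 2.0, 3.0, 4.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]          # x = z + u, so x is "endogenous"
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

b_iv = slope_iv(z, x, y)     # uses z as the instrument for x: recovers the true slope 2
b_ols = slope_iv(x, x, y)    # Z = X reproduces OLS, which is biased here (22/9)
print(b_iv, b_ols)
```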
D. Distribution of $\check\beta$, $\hat\beta$, $\tilde\beta$
Recall that under the assumptions (A.1) - (A.5) y ~ N(Xβ, Σ = σ²I) and
$$\check\beta = \hat\beta = \tilde\beta = (X'X)^{-1}X'y;$$
hence, by useful theorem (II'.A.4.a), we conclude that
$$\check\beta = \hat\beta = \tilde\beta \sim N(A\mu_y, A\Sigma_y A') = N[AX\beta, A\sigma^2 IA']$$
where $A = (X'X)^{-1}X'$.
The desired derivations can be simplified by noting that
$$AX\beta = (X'X)^{-1}X'X\beta = \beta$$
$$\sigma^2 AA' = \sigma^2(X'X)^{-1}X'\left((X'X)^{-1}X'\right)' = \sigma^2(X'X)^{-1}X'X\left((X'X)^{-1}\right)' = \sigma^2\left((X'X)^{-1}\right)' = \sigma^2\left((X'X)'\right)^{-1} = \sigma^2(X'X)^{-1}.$$
Therefore
$$\check\beta = \hat\beta = \tilde\beta \sim N\left(\beta;\ \sigma^2(X'X)^{-1}\right).$$
NOTE: (1) $\sigma^2(X'X)^{-1}$ can be shown to be the Cramér-Rao matrix, the matrix of lower bounds for the variances of unbiased estimators.
(2) $\check\beta$, $\hat\beta$, $\tilde\beta$ are
• unbiased
• consistent
• minimum variance of all (linear and nonlinear) unbiased estimators
• normally distributed
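(3) An unbiased estimator of $\sigma^2(X'X)^{-1}$ is given by $s^2(X'X)^{-1}$, where $s^2 = e'e/(n-k)$; this is the formula used to calculate the "estimated variance covariance matrix" in many computer programs.
(4) To report $s^2(X'X)^{-1}$ in STATA type
. reg y x
. estat vce
(5) Distribution of the variance estimator
$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k)$$
NOTE: This can be proven using the theorem (II'.A.4(b)) and noting that
$$(n-k)s^2 = e'e = (y - X\hat\beta)'(y - X\hat\beta) = (X\beta + \varepsilon)'\left(I - X(X'X)^{-1}X'\right)(X\beta + \varepsilon) = \varepsilon'\left(I - X(X'X)^{-1}X'\right)\varepsilon,$$
where the last step uses $\left(I - X(X'X)^{-1}X'\right)X = 0$. Therefore,
$$\frac{(n-k)s^2}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)'\left(I - X(X'X)^{-1}X'\right)\left(\frac{\varepsilon}{\sigma}\right) = \left(\frac{\varepsilon}{\sigma}\right)'M\left(\frac{\varepsilon}{\sigma}\right)$$
where $\varepsilon/\sigma \sim N[0, I]$ and $M = I - X(X'X)^{-1}X'$; hence
$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k)$$
because M is idempotent with rank and trace equal to n - k.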
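The estimated variance covariance matrix $s^2(X'X)^{-1}$ can be reproduced by hand for a simple regression, where X'X is 2 × 2 and its inverse has a closed form. The Python sketch below does this for the first Anscombe data set from Problem Set 2.2; the square roots of the diagonal should match the standard errors in that Stata printout (1.124747 for _cons and .1179055 for x).

```python
import math

# First Anscombe data set; X has an intercept column and x, so k = 2.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n, k = len(x), 2

# X'X for X = [1, x] and its explicit 2x2 inverse.
sx, sxx = sum(x), sum(xi * xi for xi in x)
det = n * sxx - sx * sx
XtX_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

# Least squares coefficients b = (X'X)^{-1} X'y and s^2 = e'e/(n - k).
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
b1 = XtX_inv[0][0] * sy + XtX_inv[0][1] * sxy
b2 = XtX_inv[1][0] * sy + XtX_inv[1][1] * sxy
s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - k)

vce = [[s2 * v for v in row] for row in XtX_inv]    # estimated s^2 (X'X)^{-1}
se_b1, se_b2 = math.sqrt(vce[0][0]), math.sqrt(vce[1][1])
print(se_b1, se_b2)
```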
E. Statistical Inference
1. Ho: β2 = β3 = . . . = βk = 0
This hypothesis tests for the statistical significance of overall explanatory power
of the explanatory variables by comparing the model with all variables included to
the model without any of the explanatory variables, i.e., yt = β1 + εt (all non-
intercept coefficients = 0). Recall that the total sum of squares (SST) can be
partitioned as follows:
$$\sum_{t=1}^n(y_t - \bar y)^2 = \sum_{t=1}^n(y_t - \hat y_t)^2 + \sum_{t=1}^n(\hat y_t - \bar y)^2$$
or
SST = SSE + SSR.
Dividing both sides of the equation by σ² yields quadratic forms, each having a chi-square distribution (under the hypothesis):
$$\frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSR}{\sigma^2}$$
$$\chi^2(n-1) = \chi^2(n-k) + \chi^2(k-1).$$
This result provides the basis for using
$$F = \frac{\chi^2(k-1)/(k-1)}{\chi^2(n-k)/(n-k)} = \frac{SSR/(k-1)}{SSE/(n-k)} = \frac{(n-k)SSR}{(k-1)SSE} \sim F(k-1, n-k)$$
to test the hypothesis that β2 = β3 = . . . = βk = 0.
NOTE: (1)
$$\frac{R^2}{1 - R^2} = \frac{SSR/SST}{1 - SSR/SST} = \frac{SSR}{SST - SSR} = \frac{SSR}{SSE};$$
hence, the F-statistic for this hypothesis can also be rewritten as
$$F = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)} = \frac{n-k}{k-1}\cdot\frac{R^2}{1 - R^2} \sim F(k-1, n-k).$$
Recall that this decomposition of SST can be summarized in an ANOVA table as follows:
Source of Variation    SS     d.f.     MSE
Model                  SSR    K - 1    SSR/(K - 1)
Error                  SSE    n - K    SSE/(n - K) = s²
Total                  SST    n - 1
K = number of coefficients in model
where the ratio of the model and error MSE’s yields the F statistic just discussed.
Additionally, remember that the adjusted R² ($\bar R^2$), defined by
$$\bar R^2 = 1 - \frac{\sum_t e_t^2/(n-K)}{\sum_t(Y_t - \bar Y)^2/(n-1)},$$
will only increase with the addition of a new variable if the t-statistic associated with the new variable is greater than 1 in absolute value. This result follows from the equation
$$\bar R^2_{New} - \bar R^2_{Old} = \frac{(n-1)SSE_{New}}{(n-K)(n-K-1)SST}\left(\frac{\hat\beta^2_{new\_var}}{s^2_{\hat\beta_{new\_var}}} - 1\right)$$
where the last term in the product is $(t^2 - 1)$, K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K + 1 coefficients.
The Lagrange Multiplier (LM) test can also be used to test this hypothesis:
$$LM = nR^2 \overset{a}{\sim} \chi^2(k-1).$$
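The formulas above can be checked against the ANOVA block of the Stata printout for the first Anscombe data set in Problem Set 2.2. The Python sketch below computes F from the mean squares and from R², along with the adjusted R² and the LM statistic; F and the adjusted R² should agree with the printout's 17.99 and 0.6295 up to rounding.

```python
# ANOVA quantities copied from the Stata printout for the first Anscombe data set.
n, K = 11, 2
SSR, SSE = 27.5100011, 13.7626904
SST = SSR + SSE
R2 = SSR / SST

F_ss = (SSR / (K - 1)) / (SSE / (n - K))              # ratio of mean squares
F_r2 = (R2 / (K - 1)) / ((1 - R2) / (n - K))          # same statistic written with R^2
adj_R2 = 1 - (SSE / (n - K)) / (SST / (n - 1))
LM = n * R2                                           # compare with chi-square(K - 1)
print(round(F_ss, 2), round(F_r2, 2), round(adj_R2, 4), round(LM, 3))
```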
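2. Testing hypotheses involving individual βi's
Recall that
$$\hat\beta \sim N\left(\beta;\ \sigma^2(X'X)^{-1}\right)$$
where
$$\sigma^2(X'X)^{-1} = \begin{pmatrix} \sigma^2_{\hat\beta_1} & \sigma_{\hat\beta_1\hat\beta_2} & \cdots & \sigma_{\hat\beta_1\hat\beta_k} \\ \sigma_{\hat\beta_2\hat\beta_1} & \sigma^2_{\hat\beta_2} & \cdots & \sigma_{\hat\beta_2\hat\beta_k} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{\hat\beta_k\hat\beta_1} & \sigma_{\hat\beta_k\hat\beta_2} & \cdots & \sigma^2_{\hat\beta_k} \end{pmatrix}$$
which can be estimated by
$$s^2(X'X)^{-1} = \begin{pmatrix} s^2_{\hat\beta_1} & s_{\hat\beta_1\hat\beta_2} & \cdots & s_{\hat\beta_1\hat\beta_k} \\ s_{\hat\beta_2\hat\beta_1} & s^2_{\hat\beta_2} & \cdots & s_{\hat\beta_2\hat\beta_k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{\hat\beta_k\hat\beta_1} & s_{\hat\beta_k\hat\beta_2} & \cdots & s^2_{\hat\beta_k} \end{pmatrix}.$$
Hypotheses of the form $H_0: \beta_i = \beta_i^0$ can be tested using the result
$$\frac{\hat\beta_i - \beta_i^0}{s_{\hat\beta_i}} \sim t(n-k).$$
The validity of this distributional result follows from
$$\frac{N(0,1)}{\sqrt{\chi^2(d)/d}} \sim t(d)$$
(with the numerator and denominator independent) since
$$\frac{\hat\beta_i - \beta_i}{\sigma_{\hat\beta_i}} \sim N(0,1) \quad\text{and}\quad \frac{(n-k)s^2_{\hat\beta_i}}{\sigma^2_{\hat\beta_i}} \sim \chi^2(n-k).$$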
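Applied to the slope in the Anscombe printout from Problem Set 2.2, the t-ratio and the 95% confidence interval $\hat\beta_2 \pm t_{.025}(n-k)\,s_{\hat\beta_2}$ can be reproduced as below. The critical value $t_{.025}(9) \approx 2.262157$ is taken from a t table (it is not computed in this sketch); the results should match the printout's t = 4.24 and interval [.2333701, .7668117] up to rounding.

```python
# Slope estimate and standard error copied from the Stata printout (first Anscombe data set).
b2, se_b2 = 0.5000909, 0.1179055
t_crit = 2.262157             # t_{.025}(9) for n - k = 11 - 2 = 9, from a t table

t_stat = (b2 - 0.0) / se_b2   # test of H0: beta_2 = 0
ci = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)
print(round(t_stat, 2), ci)
```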
3. Tests of hypotheses involving linear combinations of coefficients
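A linear combination of the βi's can be written as
$$\sum_{i=1}^k\delta_i\beta_i = (\delta_1, \ldots, \delta_k)\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} = \delta'\beta.$$
We now consider testing hypotheses of the form
$$H_0: \delta'\beta = \gamma.$$
Recall that
$$\hat\beta \sim N\left(\beta;\ \sigma^2(X'X)^{-1}\right);$$
therefore,
$$\delta'\hat\beta \sim N\left(\delta'\beta;\ \sigma^2\delta'(X'X)^{-1}\delta\right);$$
hence, under $H_0$,
$$\frac{\delta'\hat\beta - \delta'\beta}{\sqrt{s^2\delta'(X'X)^{-1}\delta}} = \frac{\delta'\hat\beta - \gamma}{s_{\delta'\hat\beta}} \sim t(n-k).$$
The t-test of a hypothesis involving a linear combination of the coefficients involves running one regression and estimating the variance of $\delta'\hat\beta$ from $s^2(X'X)^{-1}$ to construct the test statistic.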
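The recipe "one regression plus $s^2(X'X)^{-1}$" can be sketched as a small Python function. As an illustration (assumptions: the first Anscombe data set from Problem Set 2.2, δ = (1, 20), and a hypothetical γ = 13), $\delta'\beta = \beta_1 + 20\beta_2$ is the population regression line at X = 20, so its standard error should equal the $s_{\hat Y}$ from the hint in that problem set. `t_linear_combination` is an illustrative helper.

```python
import math

def t_linear_combination(delta, b, vce, gamma):
    """t statistic for H0: delta'beta = gamma, given b and the estimated s^2 (X'X)^{-1}."""
    k = len(b)
    est = sum(d * bi for d, bi in zip(delta, b))
    var = sum(delta[i] * vce[i][j] * delta[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return (est - gamma) / se, est, se

# First Anscombe data set: s^2 and (X'X)^{-1} for X = [1, x] (n = 11, sum x = 99, sum x^2 = 1001).
s2 = 13.7626904 / 9
XtX_inv = [[1001 / 1210, -99 / 1210], [-99 / 1210, 11 / 1210]]
vce = [[s2 * v for v in row] for row in XtX_inv]
b = [3.000091, 0.5000909]

# delta = (1, 20): beta_1 + 20*beta_2 is the population regression line at X = 20.
# gamma = 13 is a hypothetical value chosen only for illustration.
t_stat, est, se = t_linear_combination([1.0, 20.0], b, vce, gamma=13.0)
print(t_stat, est, se)
```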
4. More general tests
a. Introduction
We have considered tests of the overall explanatory power of the
regression model (Ho: β2 = β3 = . . . βk = 0), tests involving individual parameters
(e.g., Ho: β3 = 6), and testing the validity of a linear constraint on the coefficients
(Ho: δ’β = γ). In this section we will consider how more general tests can be
performed. The testing procedures will be based on the Chow and Likelihood
ratio (LR) tests. The hypotheses may be of many different types and involve the
previous tests as special cases. Other examples might include joint hypotheses of
the form: Ho: β2 + 6 β5 = 4, β3 = β7 = 0. The basic idea is that if the hypothesis is
really valid, then goodness of fit measures such as SSE, R2 and log-likelihood
values (l) will not be significantly impacted by imposing the valid hypothesis in
estimation. Hence, the SSE, R2 or l values will not be significantly different for
constrained (via the hypothesis) and unconstrained estimation of the underlying
regression model. The tests of the validity of the hypothesis are based on
constructing test statistics, with known exact or asymptotic distributions, to
evaluate the statistical significance of changes in SSE, R2, or l .
Consider the model
y = X β + ε
and a hypothesis, Ho: g(β) = 0 which imposes individual and/or multiple
constraints on the β vector.
The Chow and likelihood ratio tests for testing Ho: g(β) = 0 can be
constructed from the output obtained from estimating the two following
regression models.
(1) Estimate the regression model y = Xβ + ε without imposing any
constraints on the vector β. Let the associated sum of square errors,
coefficient of determination, log-likelihood value and degrees of freedom
be denoted by SSE, R2, l , and (n - k).
(2) Estimate the same regression model where the β is constrained as
specified by the hypothesis (Ho: g(β) = 0) in the estimation process. Let
the associated sum of squared errors, R2, log-likelihood value and degrees
of freedom be denoted by SSE*, R2*, l * and (n - k)*, respectively.
b. Chow test
The Chow test is defined by the following statistic:
$$F = \frac{(SSE^* - SSE)/r}{SSE/(n-k)} \sim F(r, n-k)$$
where r = (n - k)* - (n - k) is the number of independent restrictions imposed on β by the hypothesis. For example, if the hypothesis was H0: β2 + 6β5 = 4, β3 = β7 = 0, then the numerator degrees of freedom (r) is equal to 3. In applications where the SST is unaltered by imposing the restrictions, we can divide the numerator and denominator by SST to yield the Chow test rewritten in terms of the change in the R² between the constrained and unconstrained regressions:
$$F = \frac{R^2 - R^{2*}}{1 - R^2}\cdot\frac{n-k}{r} \sim F(r, n-k).$$
Note that if the hypothesis (H0: g(β) = 0) is valid, then we would expect R² (SSE) and R²* (SSE*) to not be significantly different from each other. Thus, it is only large values (greater than the critical value) of F which provide the basis for rejecting the hypothesis. Again, the R² form of the Chow test is only valid if the dependent variable is the same in the constrained and unconstrained regression.
References:
(1) Chow, G. C., "Tests of Equality Between Subsets of Coefficients in Two Linear Regressions," Econometrica, 28 (1960), 591-605.
(2) Fisher, F. M., "Tests of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository Note," Econometrica, 38 (1970), 361-366.
c. Likelihood ratio (LR) test.
The LR test is a common method of statistical inference in classical statistics. The motivation behind the LR test is similar to that of the Chow test except that it is based on determining whether there has been a significant reduction in the value of the log-likelihood as a result of imposing the hypothesized constraints on β in the estimation process. The LR test statistic is defined to be twice the difference between the unconstrained and constrained log-likelihood values, 2(ℓ - ℓ*), and, under fairly general regularity conditions, is asymptotically distributed as a chi-square with degrees of freedom equal to the number of independent restrictions (r) imposed by the hypothesis. This may be summarized as follows:
$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(r).$$
The LR test is more general than the Chow test, and for the case of independent and identically distributed normal errors with known σ², LR is equal to
$$LR = \frac{SSE^* - SSE}{\sigma^2}.$$
Recall that s² = SSE/(n - k) appears in the denominator of the Chow test statistic and that for large values of (n - k), s² is "close" to σ²; hence, we can see the similarity of the LR and Chow tests. If σ² is unknown, substituting the concentrated log-likelihood function into LR yields
$$LR = 2(\ell - \ell^*) = n\left[\ln(SSE^*) - \ln(SSE)\right] = n\ln\left(\frac{SSE^*}{SSE}\right).$$
If the hypothesis H0: β2 = β3 = . . . = βk = 0 is being tested in the classical normal linear regression model, then SSE* = SST and LR can be rewritten in terms of the R² as follows:
$$LR = n\ln\left[\frac{1}{1 - R^2}\right] = -n\ln(1 - R^2) \overset{a}{\sim} \chi^2(k-1).$$
In this case, the Chow test is identical to the F test for overall explanatory power discussed earlier.
Thus the Chow test and LR test are similar in structure and purpose. The LR test is more general than the Chow test; however, its chi-square distribution is asymptotic rather than exact, although that asymptotic result holds even for non-normally distributed errors. The LR test provides a unified method of testing hypotheses.
d. Applications of the Chow and LR tests:
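(1) Model: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + \varepsilon_t$
H0: β2 = β3 = 0 (two independent constraints)
(a) Estimate $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + \varepsilon_t$ to obtain $SSE = \sum_t e_t^2 = (n-4)s^2$, $R^2$,
$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\left(\frac{SSE}{n}\right)\right],$$
and n - k = n - 4.
(b) Estimate $y_t = \beta_1 + \beta_4 x_{t4} + \varepsilon_t$ to obtain $SSE^* = \sum_t e_t^{*2} = (n-2)s^{*2}$, $R^{2*}$, $\ell^*$, and (n - k)* = n - 2.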
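(c) Construct the test statistics
$$\text{Chow} = \frac{(SSE^* - SSE)/[(n-k)^* - (n-k)]}{SSE/(n-k)} = \frac{(SSE^* - SSE)/2}{SSE/(n-4)} = \frac{n-4}{2}\cdot\frac{SSE^* - SSE}{SSE} = \frac{R^2 - R^{2*}}{1 - R^2}\cdot\frac{n-4}{2} \sim F(2, n-4)$$
$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(2).$$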
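(2) Tests of equality of the regression coefficients in two different regression models.
(a) Consider the two regression models
$y^{(1)} = X^{(1)}\beta^{(1)} + \varepsilon^{(1)}$ (n1 observations, k independent variables)
$y^{(2)} = X^{(2)}\beta^{(2)} + \varepsilon^{(2)}$ (n2 observations, k independent variables)
H0: β(1) = β(2) (k independent restrictions)
(b) Rewrite the model as
$$\begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} & 0 \\ 0 & X^{(2)} \end{pmatrix}\begin{pmatrix} \beta^{(1)} \\ \beta^{(2)} \end{pmatrix} + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix} \qquad (1)'$$
Estimate (1)' using least squares and determine SSE, R², ℓ, and (n - k) = n1 + n2 - 2k.
Now impose the hypothesis that β(1) = β(2) = β and write (1)' as
$$\begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}\beta + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix} \qquad (2)'$$
Estimate (2)' using least squares to obtain the constrained sum of squared errors (SSE*), R²*, ℓ*, and (n - k)* = n1 + n2 - k.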
(c) Construct the test statistics
SSE* - SSE
(n - k)* - (n - k)Chow =
SSE
(n k)−
2 2
1 2, 1 22
- * + - kR R n n = ~ F ( + - 2k)k n n
1 - kR
a LR = 2( l - l *) ~ χ2 (k).
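As a numerical check on both applications, the Python sketch below (standard library only) computes the Chow statistic from SSE and from R2, and the LR statistic from the concentrated log likelihood in (1)(a), using hypothetical values of SSE, SSE*, and SST for n = 30 and k = 4. For application (2) it assumes k = 2 (an intercept and one slope in each regression) so that every least squares fit has a closed form, and it uses the fact that least squares on the block-diagonal model (1)' gives the same SSE as fitting the two samples separately. All data and function names are illustrative.

```python
import math


def concentrated_loglik(sse, n):
    """l = -(n/2)[1 + ln(2*pi) + ln(SSE/n)] for the normal linear model."""
    return -(n / 2.0) * (1.0 + math.log(2.0 * math.pi) + math.log(sse / n))


def chow_from_sse(sse_r, sse_u, df_r, df_u):
    """Chow = [(SSE* - SSE)/((n-k)* - (n-k))] / [SSE/(n-k)]."""
    q = df_r - df_u  # number of independent restrictions
    return ((sse_r - sse_u) / q) / (sse_u / df_u)


def chow_from_r2(r2_u, r2_r, q, df_u):
    """Equivalent R^2 form: [(R^2 - R*^2)/(1 - R^2)] * [(n-k)/q]."""
    return ((r2_u - r2_r) / (1.0 - r2_u)) * (df_u / q)


def ols_simple(x, y):
    """Least squares fit of y = b1 + b2*x; returns (b1, b2, SSE)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, sse


def chow_equal_coefficients(x1, y1, x2, y2, k=2):
    """Application (2) with k = 2: Chow statistic for Ho: beta(1) = beta(2)."""
    sse_u = ols_simple(x1, y1)[2] + ols_simple(x2, y2)[2]  # model (1)'
    sse_r = ols_simple(x1 + x2, y1 + y2)[2]                # model (2)', pooled
    df_u = len(x1) + len(x2) - 2 * k
    df_r = len(x1) + len(x2) - k
    return chow_from_sse(sse_r, sse_u, df_r, df_u), sse_u, sse_r


# Application (1): hypothetical values for Ho: beta_2 = beta_3 = 0, n = 30, k = 4.
n, sst, sse_u, sse_r = 30, 500.0, 150.0, 210.0
chow_1 = chow_from_sse(sse_r, sse_u, df_r=n - 2, df_u=n - 4)
chow_2 = chow_from_r2(1 - sse_u / sst, 1 - sse_r / sst, q=2, df_u=n - 4)
lr = 2.0 * (concentrated_loglik(sse_u, n) - concentrated_loglik(sse_r, n))
print(chow_1, chow_2, lr)  # compare with F(2, 26) and chi-square(2)

# Application (2): two hypothetical samples with visibly different slopes.
x1, y1 = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
x2, y2 = [1, 2, 3, 4, 5], [1.0, 1.6, 2.4, 2.9, 3.6]
chow_eq, sse_sep, sse_pool = chow_equal_coefficients(x1, y1, x2, y2)
print(round(chow_eq, 2))  # compare with F(2, 6)
```

Note that 2(l − l*) collapses to n ln(SSE*/SSE), which is why the LR and Chow statistics depend on the data only through SSE and SSE*.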
5. Testing Hypotheses using Stata
a. Stata reports the log likelihood values when the command
estat ic
follows a regression command; these values can be used in constructing LR tests.
b. Stata can also perform many t tests and Chow-type (F) tests.
Consider the model
(1) Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt
with the hypotheses:
(2) H1: β2 = 1
H2: β3 = 0
H3: β3 + β4 = 1
H4: β3β4 = 1
H5: β2 = 1 and β3 = 0
The Stata commands to perform tests of these hypotheses follow OLS
estimation of the unconstrained model.
reg Y X2 X3 X4
estimates the unconstrained model
test X2 = 1 (Tests H1)
test X3 = 0 (Tests H2)
test X3 + X4 = 1 (Tests H3)
testnl _b[X3]*_b[X4] = 1 (Tests H4. The “testnl” command is for testing nonlinear hypotheses. Coefficients must be referred to with the “_b” prefix and square brackets, e.g., _b[X3], when testing nonlinear hypotheses.)
test (X2 = 1) (X3 = 0) (Tests H5)
95% confidence intervals on coefficient estimates are automatically calculated in
Stata. To change the confidence level, use the “level” option as follows:
reg Y X2 X3 X4, level(90) (changes the confidence level
to 90%)
F. Stepwise Regression
Stepwise regression is a method for deciding which of a set of candidate variables to include in a regression model. It is a purely mechanical approach: variables are added to or removed from the model solely on the basis of their statistical significance, not for any theoretical reason. While stepwise regression can be considered when choosing among many candidate variables, theoretical considerations should be the primary factor in such a decision.
A stepwise regression may use forward selection or backward selection. Using forward selection, a stepwise regression adds one independent variable at a time to see whether it is significant. If the variable is significant, it is kept in the model and another variable is added. If the variable is not significant, or if a previously added variable becomes insignificant, that variable is not included in the model. This process continues until no additional variables are significant. Backward selection works in the opposite direction: it starts with all candidate variables in the model and removes the least significant variable, one at a time, until all remaining variables are significant.
Stepwise regression using Stata
To perform a stepwise regression in Stata, use the following commands:
Forward:
stepwise, pe(#): reg depvar indepvars
stepwise, pe(#) lockterm1: reg depvar (forced_indepvars) other_indepvars
Backward:
stepwise, pr(#): reg depvar indepvars
stepwise, pr(#) lockterm1: reg depvar (forced_indepvars) other_indepvars
where the “#” in “pr(#)” is the significance level at which variables are removed from the model, such as 0.05, and the “#” in “pe(#)” is the significance level at which variables are entered or added to the model. If pr(#1) and pe(#2) are both included in a stepwise regression command, #1 must be greater than #2. Also, “depvar” represents the dependent variable, “forced_indepvars” represents the independent variables (the first, parenthesized term, kept by the lockterm1 option) which the user wishes to remain in the model no matter what their significance levels may be, and “other_indepvars” represents the other independent variables which the stepwise regression will consider including or excluding. Forward and backward stepwise regression may yield different results.
G. Forecasting
Let yt = F(Xt, β) + εt
denote the stochastic relationship between the variable yt and the vector of variables Xt, where Xt = (xt1, ..., xtk) and β represents a vector of unknown parameters.
Forecasts are generally made by estimating the vector of parameters, $\hat\beta$, determining the appropriate vector $\hat X_t$, and then evaluating
$$\hat y_t = F(\hat X_t, \hat\beta).$$
The forecast error is $FE = y_t - \hat y_t$.
There are at least four factors which contribute to forecast error.
1. Incorrect functional form (This is an example of specification error and will be
discussed later.)
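2. Existence of random disturbance (εt)
Even if the "appropriate" future value of Xt and the true parameter values, β, were known with certainty,
$$FE = y_t - \hat y_t = y_t - F(X_t, \beta) = \varepsilon_t,$$
$$\sigma^2_{FE} = \operatorname{Var}(FE) = \operatorname{Var}(\varepsilon_t) = \sigma^2.$$
In this case, with normally distributed errors, confidence intervals for yt would be obtained from
$$\Pr\left[F(X_t, \beta) - z_{\alpha/2}\,\sigma < y_t < F(X_t, \beta) + z_{\alpha/2}\,\sigma\right] = 1 - \alpha,$$
where $z_{\alpha/2}$ is the standard normal critical value (σ is known here). This could be visualized as follows for the linear case:
[Figure: Yt versus Xt for the linear case.]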
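Assuming εt is normally distributed with known σ, the interval above uses a standard normal critical value. This Python sketch computes it with the standard library's statistics.NormalDist; the values of F(Xt, β) and σ are hypothetical.

```python
from statistics import NormalDist


def known_parameter_interval(f_value, sigma, alpha=0.05):
    """Interval F(X_t, beta) -/+ z_{alpha/2} * sigma when X_t, beta, and sigma are known."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return f_value - z * sigma, f_value + z * sigma


lower, upper = known_parameter_interval(f_value=100.0, sigma=5.0)
print(round(lower, 2), round(upper, 2))
```

Only the random disturbance contributes to the width here; the sections that follow widen the interval to allow for uncertainty about β.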
3. Uncertainty about β
Assume F(Xt, β) = Xtβ in the model yt = F(Xt, β) + εt; then the predicted value of yt for a given value of Xt is given by
$$\hat y_t = X_t\hat\beta,$$
and the variance of $\hat y_t$ (the sample regression line), $\sigma^2_{\hat y_t}$, is given by
$$\sigma^2_{\hat y_t} = X_t\operatorname{Var}(\hat\beta)X_t' = \sigma^2X_t(X'X)^{-1}X_t',$$
with the variance of the forecast error (actual y) given by
$$\sigma^2_{FE} = \sigma^2_{\hat y_t} + \sigma^2.$$
Note that $\sigma^2_{FE}$ takes account of the uncertainty associated with both the unknown regression line and the error term, and it can be used to construct confidence intervals for the actual value of Y rather than just for the regression line.
Unbiased sample estimators of $\sigma^2_{\hat y_t}$ and $\sigma^2_{FE}$ can be easily obtained by replacing σ2 with its unbiased estimator s2.
Confidence intervals for $E(Y_t \mid X_t) = X_t\beta$, the population regression line:
$$\Pr\left[X_t\hat\beta - t_{\alpha/2}\,s_{\hat y_t} < X_t\beta < X_t\hat\beta + t_{\alpha/2}\,s_{\hat y_t}\right] = 1 - \alpha$$
Confidence intervals for Yt:
$$\Pr\left[X_t\hat\beta - t_{\alpha/2}\,s_{FE} < Y_t < X_t\hat\beta + t_{\alpha/2}\,s_{FE}\right] = 1 - \alpha$$
[Figure: Yt versus Xt.]
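The expression for σ2FE follows in one line from the definition of the forecast error, assuming the forecast-period disturbance εt is uncorrelated with the sample disturbances used to compute β̂ (so no cross term appears):

```latex
\begin{aligned}
FE &= y_t - \hat y_t = (X_t\beta + \varepsilon_t) - X_t\hat\beta
    = \varepsilon_t - X_t(\hat\beta - \beta),\\
\sigma^2_{FE} &= \operatorname{Var}(\varepsilon_t) + X_t\,\operatorname{Var}(\hat\beta)\,X_t'
    = \sigma^2 + \sigma^2 X_t (X'X)^{-1} X_t'
    = \sigma^2 + \sigma^2_{\hat y_t}.
\end{aligned}
```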
4. A comparison of confidence intervals.
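Some students have found that the following comparison helps them understand the different confidence intervals for the population regression line and for the actual value of Y. The entries for the estimated coefficients are included only to show the organizational parallels between the different confidence intervals.
(a) Estimated coefficients
Statistic: $\hat\beta = (X'X)^{-1}X'Y$
Distribution: $\hat\beta \sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right)$
t-stat: $1 - \alpha = \Pr\left(-t_{\alpha/2} < \dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} < t_{\alpha/2}\right) = \Pr\left(\hat\beta_i - t_{\alpha/2}s_{\hat\beta_i} < \beta_i < \hat\beta_i + t_{\alpha/2}s_{\hat\beta_i}\right)$
C.I. for $\beta_i$: $\left(\hat\beta_i - t_{\alpha/2}s_{\hat\beta_i},\ \hat\beta_i + t_{\alpha/2}s_{\hat\beta_i}\right)$
(b) Sample regression line: $\hat Y_t = X_t\hat\beta$, the predicted Y value corresponding to $X_t$
Distribution: $\hat Y_t \sim N\left(X_t\beta,\ \sigma^2_{\hat Y_t} = \sigma^2X_t(X'X)^{-1}X_t'\right)$
t-stat: $1 - \alpha = \Pr\left(-t_{\alpha/2} < \dfrac{X_t\hat\beta - X_t\beta}{s_{\hat Y_t}} < t_{\alpha/2}\right) = \Pr\left(X_t\hat\beta - t_{\alpha/2}s_{\hat Y_t} < X_t\beta < X_t\hat\beta + t_{\alpha/2}s_{\hat Y_t}\right)$
C.I. for $X_t\beta$: $\left(X_t\hat\beta - t_{\alpha/2}s_{\hat Y_t},\ X_t\hat\beta + t_{\alpha/2}s_{\hat Y_t}\right)$
(c) Forecast error: $FE = Y_t - \hat Y_t = Y_t - X_t\hat\beta$
Distribution: $FE \sim N\left(0,\ \sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat Y_t}\right)$
t-stat: $1 - \alpha = \Pr\left(-t_{\alpha/2} < \dfrac{FE - 0}{s_{FE}} < t_{\alpha/2}\right) = \Pr\left(0 - t_{\alpha/2}s_{FE} < FE < 0 + t_{\alpha/2}s_{FE}\right) = \Pr\left(X_t\hat\beta - t_{\alpha/2}s_{FE} < Y_t < X_t\hat\beta + t_{\alpha/2}s_{FE}\right)$
C.I. for $Y_t$: $\left(X_t\hat\beta - t_{\alpha/2}s_{FE},\ X_t\hat\beta + t_{\alpha/2}s_{FE}\right)$
Here $s_{\hat Y}$ is used to compute confidence intervals for the regression line ($E(Y_t \mid X_t) = X_t\beta$) and $s_{FE}$ is used in the calculation of confidence intervals for the actual value of Y. Recall that $s^2_{FE} = s^2 + s^2_{\hat Y}$; hence, $s^2_{FE} > s^2_{\hat Y}$ and the confidence intervals for Y are wider than those for the population regression line.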
5. Uncertainty about X. In many situations the value of the independent variable also
needs to be predicted along with the value of y. Not surprisingly, a “poor” estimate of
Xt will likely result in a poor forecast for y. This can be represented graphically as follows:
[Figure: Yt versus Xt.]
6. Hold out samples and a predictive test.
One way to explore the predictive ability of a model is to estimate the model on a
subset of the data and then use the estimated model to predict known outcomes which
are not used in the initial estimation.
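A minimal Python sketch of the hold-out idea, assuming a single regressor and using the root-mean-squared forecast error over the held-out observations as the summary measure (a common choice, not one specified in these notes). The function names and data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for y = b1 + b2*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    return ybar - b2 * xbar, b2


def holdout_rmse(x, y, n_fit):
    """Estimate on observations 1..n_fit, forecast the rest, and return the RMSE."""
    b1, b2 = fit_line(x[:n_fit], y[:n_fit])
    errors = [yt - (b1 + b2 * xt) for xt, yt in zip(x[n_fit:], y[n_fit:])]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.1, 20.8]  # hypothetical data
print(holdout_rmse(x, y, n_fit=7))  # fit on 7 observations, forecast the last 3
```

The held-out errors could equally be compared one by one with the forecast intervals described above.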
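7. Example: Suppose the estimated equation is
$$\hat y_t = 10 + 2.5G_t + 6M_t \qquad \left(\hat y_t = \hat\beta_1 + \hat\beta_2G_t + \hat\beta_3M_t\right),$$
where yt, Gt, Mt denote GDP, government expenditure, and money supply.
Assume that
$$s^2(X'X)^{-1} = 10^{-3}\begin{bmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{bmatrix}, \qquad s^2 = 10.$$
a. Calculate an estimate of GDP (y) which corresponds to Gt = 100, Mt = 200, i.e., Xt = (1, 100, 200).
$$\hat y_t = X_t\hat\beta = (1, 100, 200)\begin{bmatrix} 10 \\ 2.5 \\ 6 \end{bmatrix} = 10 + 250 + 1200 = 1460.$$
b. Evaluate $s^2_{\hat y_t}$ and $s^2_{FE}$ corresponding to the Xt in question (a).
$$s^2_{\hat y_t} = X_t\left(s^2(X'X)^{-1}\right)X_t' = (1, 100, 200)\,10^{-3}\begin{bmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{bmatrix}\begin{bmatrix} 1 \\ 100 \\ 200 \end{bmatrix} = 921.81, \qquad s_{\hat y_t} = 30.36$$
$$s^2_{FE} = s^2 + s^2_{\hat y_t} = 10 + 921.81 = 931.81, \qquad s_{FE} = 30.53$$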
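The arithmetic in parts (a) and (b) can be reproduced with a few lines of Python (standard library only); the matrix v below is s2(X'X)-1 as given in the example.

```python
import math

beta_hat = [10.0, 2.5, 6.0]  # (beta1_hat, beta2_hat, beta3_hat)
v = [[10e-3, 5e-3, 2e-3],    # s^2 (X'X)^{-1} from the example
     [5e-3, 20e-3, 3e-3],
     [2e-3, 3e-3, 15e-3]]
s2 = 10.0
x_t = [1.0, 100.0, 200.0]    # (1, G_t, M_t)

# Part (a): point forecast X_t * beta_hat.
y_hat = sum(b * x for b, x in zip(beta_hat, x_t))
# Part (b): X_t s^2(X'X)^{-1} X_t', then s^2_FE = s^2_yhat + s^2.
s2_yhat = sum(x_t[i] * v[i][j] * x_t[j] for i in range(3) for j in range(3))
s2_fe = s2_yhat + s2
print(y_hat, round(s2_yhat, 2), round(math.sqrt(s2_yhat), 2),
      round(s2_fe, 2), round(math.sqrt(s2_fe), 2))
# 1460.0 921.81 30.36 931.81 30.53
```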
8. Forecasting: basic Stata commands
a) The data file should include values for the explanatory variables corresponding to the desired forecast period, say in observations n1 + 1 to n2.
b) Estimate the model using least squares:
reg Y X1 . . . XK, [options]
c) Use the predict command, picking the names you want for the predictions; in this case, yhat, e, sfe, and syhat hold $\hat Y$, e, $s_{FE}$, and $s_{\hat Y}$.
predict yhat, xb ← this option predicts Y ($\hat Y$)
predict e, resid ← this option computes the residuals (e)
predict sfe, stdf ← this option computes the standard error of the forecast ($s_{FE}$)
predict syhat, stdp ← this option computes the standard error of the prediction ($s_{\hat Y}$)
list Y yhat sfe ← this command lists the indicated variables
These commands result in the calculation of $\hat Y$, e, $s_{FE}$, and $s_{\hat Y}$ for observations 1 through n2. The predictions will show up in Stata's Data Editor under the variable names you picked (in this case, yhat, e, sfe, and syhat). You may want to restrict the calculations to t = n1 + 1, . . ., n2 by using
predict yhat if _n > n1, xb
where “n1” is replaced by the numerical value of n1.
d) The variance of the predicted value can be calculated as follows:
$$s^2_{\hat y_t} = s^2_{FE} - s^2.$$
H. PROBLEM SETS: MULTIVARIATE REGRESSION
Problem Set 3.1
Theory
OBJECTIVE: The objective of problems 1 & 2 is to demonstrate that the matrix equations and summation equations for the estimators and variances of the estimators are equivalent.
Remember that $\sum_{t=1}^{N} X_t = N\bar X$, and don't get discouraged!!
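1. BACKGROUND: Consider the model (1) Yt = β1 + β2Xt + εt (t = 1, . . ., N) or, equivalently,
$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_N \end{bmatrix}\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix} \tag{1'}$$
$$Y = X\beta + \varepsilon \tag{1''}$$
The least squares estimator of $\beta = (\beta_1, \beta_2)'$ is
$$\hat\beta = \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = (X'X)^{-1}X'Y.$$
If (A.1) - (A.5) (see class notes) are satisfied, then
$$\operatorname{Var}(\hat\beta) = \begin{bmatrix} \operatorname{Var}(\hat\beta_1) & \operatorname{Cov}(\hat\beta_1, \hat\beta_2) \\ \operatorname{Cov}(\hat\beta_1, \hat\beta_2) & \operatorname{Var}(\hat\beta_2) \end{bmatrix} = \sigma^2(X'X)^{-1}.$$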
QUESTIONS: Verify the following. *Hint: It might be helpful to work backwards on parts c and e.
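a.
$$X'X = \begin{bmatrix} N & N\bar X \\ N\bar X & \sum_t X_t^2 \end{bmatrix} \quad\text{and}\quad X'Y = \begin{bmatrix} N\bar Y \\ \sum_{t=1}^{N} X_tY_t \end{bmatrix}$$
b. $\hat\beta_2 = \left(\sum_t X_tY_t - N\bar X\bar Y\right)\Big/\left(\sum_t X_t^2 - N\bar X^2\right)$
c. $\hat\beta_1 = \bar Y - \hat\beta_2\bar X$
d. $\operatorname{Var}(\hat\beta_2) = \sigma^2\Big/\left(\sum_t X_t^2 - N\bar X^2\right)$
e. $\operatorname{Var}(\hat\beta_1) = \sigma^2\left(\dfrac{1}{N} + \dfrac{\bar X^2}{\sum_t X_t^2 - N\bar X^2}\right) = \operatorname{Var}(\bar Y) + \bar X^2\operatorname{Var}(\hat\beta_2)$
f. $\operatorname{Cov}(\hat\beta_1, \hat\beta_2) = -\bar X\operatorname{Var}(\hat\beta_2)$
(JM II’-A, JM Stats)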
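2. Consider the model: Yt = βXt + εt.
a. Show that this model is equivalent to Y = Xβ + ε, where
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$
b. Using the matrices in 2(a), evaluate $(X'X)^{-1}X'Y$ and compare your answer with the results obtained in question 4 in Problem Set 1.1.
c. Using the matrices in 2(a), evaluate $\sigma^2(X'X)^{-1}$.
(JM II’-A)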
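Before working through the algebra, the identities in problem 1(a) through (f) can be checked numerically. This Python sketch uses hypothetical data and sets σ2 = 1; it computes β̂ and σ2(X'X)-1 through the 2×2 matrix inverse and compares them with the summation formulas.

```python
def matrix_route(x, y, sigma2=1.0):
    """beta_hat = (X'X)^{-1} X'Y and Var(beta_hat) = sigma^2 (X'X)^{-1}, X = [1, X_t]."""
    n = len(x)
    a, b, d = n, sum(x), sum(v * v for v in x)          # X'X = [[a, b], [b, d]]
    c1, c2 = sum(y), sum(u * v for u, v in zip(x, y))  # X'Y = [c1, c2]
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    beta = [inv[0][0] * c1 + inv[0][1] * c2, inv[1][0] * c1 + inv[1][1] * c2]
    var = [[sigma2 * e for e in row] for row in inv]
    return beta, var


def summation_route(x, y, sigma2=1.0):
    """Parts (b)-(f): the same quantities written with sums and sample means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    denom = sum(v * v for v in x) - n * xbar ** 2  # sum X_t^2 - N Xbar^2
    b2 = (sum(u * v for u, v in zip(x, y)) - n * xbar * ybar) / denom
    b1 = ybar - b2 * xbar
    var_b2 = sigma2 / denom
    var_b1 = sigma2 * (1.0 / n + xbar ** 2 / denom)
    cov_b1b2 = -xbar * var_b2
    return [b1, b2], [[var_b1, cov_b1b2], [cov_b1b2, var_b2]]


x = [1.0, 2.0, 4.0, 7.0]  # hypothetical data
y = [3.0, 5.0, 4.0, 9.0]
print(matrix_route(x, y))
print(summation_route(x, y))
```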
Applied
3. Use the data in HPRICE1.RAW to estimate the model
price = β0 + β1sqrft + β2bdrms + u
where price is the house price measured in thousands of dollars, sqrft is the floorspace measured in square feet, and bdrms is the number of bedrooms.
a. Write out the results in equation form.
b. What is the estimated increase in price for a house with one more bedroom, holding square footage constant?
c. What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (b).
d. What percentage of the variation in price is explained by square footage and number of bedrooms?
e. The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling price for this house from the OLS regression line.
f. The actual selling price of the first house in the sample was $300,000 (so price = 300). Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house?
Problem Set 3.2
Theory
1. R2, Adjusted R2 ($\bar R^2$), F Statistic, and LR
The R2 (coefficient of determination) is defined by
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST},$$
where $SSE = \sum_t e_t^2$, $SSR = \sum_t(\hat Y_t - \bar Y)^2$, and $SST = \sum_t(Y_t - \bar Y)^2$.
Given that SST = SSR + SSE when using OLS with an intercept,
a. Demonstrate that 0 ≤ R2 ≤ 1.
b. Demonstrate that n = k implies R2 = 1. (Hint: n = k implies that X is square. Be careful! Show that $\hat Y = X\hat\beta = Y$.)
c. If an additional independent variable is included in the regression equation, will the R2 increase, decrease, or remain unaltered? (Hint: What is the effect upon SST and SSE?)
d. The adjusted R2, $\bar R^2$, is defined by
$$\bar R^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)}.$$
Demonstrate that
$$\frac{1-k}{n-k} \le \bar R^2 \le R^2 \le 1,$$
i.e., the adjusted R2 can be negative. (Hint: $1 - \bar R^2 = \dfrac{SSE}{SST}\cdot\dfrac{n-1}{n-k} = \dfrac{n-1}{n-k}(1 - R^2)$.)
e. Verify that
$$LR = \frac{SSE^* - SSE}{\sigma^2} \quad\text{if } \sigma^2 \text{ is known}, \qquad LR = n\ln(SSE^*/SSE) \quad\text{if } \sigma^2 \text{ is unknown},$$
where SSE* denotes the restricted SSE.
f. For the hypothesis H0: β2 = . . . = βk = 0, verify that the corresponding LR statistic can be written as
$$LR = n\ln\left(\frac{1}{1-R^2}\right) = -n\ln(1 - R^2).$$
FYI: The corresponding LM test statistic for this hypothesis can be written in terms of the coefficient of determination as LM = NR2.
(JM II-B)
2. Demonstrate that
a. X'e = 0 is equivalent to the normal equations $X'X\hat\beta = X'Y$.
b. X'e = 0 implies that the sum of the estimated error terms (residuals) will equal zero if the regression equation includes an intercept.
Remember: $e = Y - \hat Y = Y - X\hat\beta$.
(JM II-B)
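The bounds in problem 1(d) can be illustrated with a short Python sketch based on the identity in the hint; n and k are hypothetical. With R2 = 0 the adjusted R2 equals its lower bound (1 − k)/(n − k), which is negative whenever k > 1.

```python
def adjusted_r2(r2, n, k):
    """R-bar^2 = 1 - [SSE/(n-k)]/[SST/(n-1)] = 1 - (n-1)/(n-k) * (1 - R^2)."""
    return 1.0 - (n - 1) / (n - k) * (1.0 - r2)


n, k = 10, 3
lower_bound = (1 - k) / (n - k)  # -2/7 for these values
for r2 in [0.0, 0.25, 0.5, 0.9, 1.0]:
    print(r2, round(adjusted_r2(r2, n, k), 4))
```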
Applied
3. The following model can be used to study whether campaign expenditures affect election
outcomes:
voteA = β0 + β1ln(expendA) + β2 ln(expendB) + β3 prtystrA + u
where voteA is the percent of the vote received by Candidate A, expendA and expendB are
campaign expenditures by Candidates A and B, and prtystrA is a measure of party
strength for Candidate A (the percent of the most recent presidential vote that went to A's
party).
i) What is the interpretation of β1?
ii) In terms of the parameters, state the null hypothesis that a 1% increase in A's
expenditures is offset by a 1% increase in B's expenditures.
iii) Estimate the model above using the data in VOTE1.RAW and report the results in
the usual form. Do A's expenditures affect the outcome? What about B's
expenditures? Can you use these results to test the hypothesis in part (ii)?
iv) Estimate a model that directly gives the t statistic for testing the hypothesis in part
(ii). What do you conclude? (Use a two-sided alternative.) A possible approach: to test H0: β1 + β2 = D, substitute D − β2 for β1 and simplify.
(Wooldridge C. 4.1)
4. Consider the data
t Output (Yt) Labor (Lt) Capital (Kt)
1 40.26 64.63 133.14
2 40.84 66.30 139.24
3 42.83 65.27 141.64
4 43.89 67.32 148.77
5 46.10 67.20 151.02
6 44.45 65.18 143.38
7 43.87 65.57 148.19
8 49.99 71.42 167.12
9 52.64 77.52 171.33
10 57.93 79.46 176.41
The Cobb-Douglas production function is defined by
$$Y_t = e^{\beta_1 + \beta_2 t}L_t^{\beta_3}K_t^{\beta_4}\varepsilon_t \tag{1}$$
where β2t takes account of changes in output for any reason other than a change in Lt or Kt, and εt denotes a random disturbance having the property that ln εt is distributed N(0, σ2). Labor's share,
$$\frac{\text{total wage receipts}}{\text{total sales receipts}},$$
is given by β3 if β3 + β4 (the returns to scale) is equal to one. β2 is frequently referred to as the rate of technological change,
$$\beta_2 = \frac{dY_t/dt}{Y_t} \quad\text{for fixed } K \text{ and } L.$$
Taking the natural logarithm of equation (1), we obtain
$$\ln Y_t = \beta_1 + \beta_2 t + \beta_3\ln(L_t) + \beta_4\ln(K_t) + \ln(\varepsilon_t). \tag{2}$$
If β3 + β4 is equal to 1, then equation (2) can be rewritten as
$$\ln(Y_t/K_t) = \beta_1 + \beta_2 t + \beta_3\ln(L_t/K_t) + \ln\varepsilon_t. \tag{3}$$
a. Estimate equation (2) using the technique of least squares.
b. Corresponding to equation (2),
1) Test the hypothesis Ho: β2 = β3 = β4 = 0. Explain the implications of this hypothesis. (95% confidence level)
2) Perform and interpret individual tests of significance of β2, β3, and β4, i.e., test Ho: βi = 0, α = .05.
3) Test the hypothesis of constant returns to scale, i.e., Ho: β3 + β4 = 1, using
(i) a t-test for a general linear hypothesis, with restriction vector δ = (0, 0, 1, 1);
(ii) a Chow test;
(iii) an LR test.
c. Estimate equation (3) and test the hypothesis that labor's share is equal to .75, i.e., β3 = .75.
d. Re-estimate the model (equation 2) with the first nine observations and check whether the actual log(output) for the 10th observation lies in the 95% forecast confidence interval.
(JM II)
5. The translog production function corresponding to the previous problem is given by
$$\ln(Y) = \beta_1 + \beta_2 t + \beta_3\ln(L) + \beta_4\ln(K) + \beta_5(\ln L)^2 + \beta_6(\ln K)^2 + \beta_7(\ln L)(\ln K) + \ln(\varepsilon_t)$$
a. What restrictions on the translog production function result in a Cobb-Douglas production function?
b. Estimate the translog production function using the data in problem 4 and use the Chow and LR tests to determine whether it provides a statistically significant improvement in fit relative to the Cobb-Douglas function.
(JM II)
6. The transcendental production function corresponding to the data in problem 4 is defined by
$$Y = e^{\beta_1 + \beta_2 t + \beta_3 L + \beta_4 K}L^{\beta_5}K^{\beta_6}$$
a. What restrictions on the transcendental production function result in a Cobb-Douglas production function?
b. Estimate the transcendental production function using the data in problem 4 and use the Chow and LR tests to compare it with the Cobb-Douglas production function.
(JM II)
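APPENDIX A
Some important derivatives:
Let
$$X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, \quad A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad (\text{symmetric: } a_{12} = a_{21}).$$
1. $\dfrac{d(a'X)}{dX} = \dfrac{d(X'a)}{dX} = a$
2. $\dfrac{d(X'AX)}{dX} = 2AX$
Proof of $d(X'a)/dX = a$:
Note: a'X = X'a = a1x1 + a2x2, so
$$\frac{d(X'a)}{dX} = \begin{bmatrix} \partial(X'a)/\partial x_1 \\ \partial(X'a)/\partial x_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = a.$$
Proof of $d(X'AX)/dX = 2AX$:
Note: X'AX = a11x1² + (a12 + a21)x1x2 + a22x2², so, using a12 = a21,
$$\frac{d(X'AX)}{dX} = \begin{bmatrix} \partial(X'AX)/\partial x_1 \\ \partial(X'AX)/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2a_{11}x_1 + 2a_{12}x_2 \\ 2a_{21}x_1 + 2a_{22}x_2 \end{bmatrix} = 2\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2AX.$$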
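The two derivative rules can be spot-checked with central finite differences. In this Python sketch the symmetric matrix A, the vector a, and the evaluation point X are arbitrary illustrative values.

```python
def quad_form(x, A):
    """X'AX."""
    k = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(k) for j in range(k))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation to df/dX."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


A = [[2.0, 0.5], [0.5, 3.0]]  # symmetric, as Appendix A assumes
a = [1.5, -2.0]
x = [0.7, 1.2]

grad_linear = numerical_gradient(lambda z: sum(ai * zi for ai, zi in zip(a, z)), x)
grad_quad = numerical_gradient(lambda z: quad_form(z, A), x)
two_ax = [2.0 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
print(grad_linear, a)     # rule 1: d(a'X)/dX = a
print(grad_quad, two_ax)  # rule 2: d(X'AX)/dX = 2AX
```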
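APPENDIX B
An unbiased estimator of σ2 is given by
$$s^2 = \frac{1}{n-k}\,y'\left(I - X(X'X)^{-1}X'\right)y = SSE/(n-k).$$
Proof: To show this, we need some results on traces, where $\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}$:
1) tr(I) = n (for the n×n identity)
2) If A is idempotent, tr(A) = rank of A
3) tr(A+B) = tr(A) + tr(B)
4) tr(AB) = tr(BA) if both AB and BA are defined
5) tr(ABC) = tr(CAB)
6) tr(kA) = k tr(A)
Now, remember that
$$\hat\sigma^2 = \frac{1}{n}e'e \quad\text{and}\quad s^2 = \frac{1}{n-k}e'e,$$
$$e = y - X\hat\beta = y - X(X'X)^{-1}X'y = My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon,$$
where M = I − X(X'X)-1X' and MX = 0.
Note that M is symmetric and idempotent (problem set R.2). So
$$\hat\sigma^2 = \frac{1}{n}e'e = \frac{1}{n}\varepsilon'M'M\varepsilon = \frac{1}{n}\varepsilon'MM\varepsilon = \frac{1}{n}\varepsilon'M\varepsilon \quad\text{and}\quad s^2 = \frac{1}{n-k}\varepsilon'M\varepsilon.$$
Because a scalar equals its own trace, and $E(\varepsilon\varepsilon') = \sigma^2I$ (the εi have common variance σ2 and cov(εi, εj) = 0 for i ≠ j),
$$\begin{aligned}
E(\hat\sigma^2) &= \frac{1}{n}E(\varepsilon'M\varepsilon) = \frac{1}{n}E\left[\operatorname{tr}(\varepsilon'M\varepsilon)\right] = \frac{1}{n}E\left[\operatorname{tr}(M\varepsilon\varepsilon')\right] = \frac{1}{n}\operatorname{tr}\left(ME(\varepsilon\varepsilon')\right)\\
&= \frac{1}{n}\operatorname{tr}(M\sigma^2I) = \frac{\sigma^2}{n}\operatorname{tr}(M) = \frac{\sigma^2}{n}\operatorname{tr}\left(I - X(X'X)^{-1}X'\right)\\
&= \frac{\sigma^2}{n}\left(n - \operatorname{tr}\left(X(X'X)^{-1}X'\right)\right) = \frac{\sigma^2}{n}\left(n - \operatorname{tr}\left(X'X(X'X)^{-1}\right)\right)\\
&= \frac{\sigma^2}{n}\left(n - \operatorname{tr}(I_k)\right) = \frac{\sigma^2}{n}(n - k).
\end{aligned}$$
Since $\hat\sigma^2 = \frac{n-k}{n}s^2$,
$$E(s^2) = \frac{n}{n-k}E(\hat\sigma^2) = \sigma^2.$$
Therefore $\hat\sigma^2$ is biased, but s2 is unbiased.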
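A numerical illustration of the key step tr(M) = n − k, for a design matrix with an intercept and one regressor (k = 2) so that (X'X)-1 has a closed form. The x values in this Python sketch are hypothetical; it also confirms that M is symmetric and idempotent.

```python
def residual_maker(x):
    """M = I - X(X'X)^{-1}X' for X with rows (1, x_t)."""
    n = len(x)
    a, b, d = n, sum(x), sum(v * v for v in x)
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]  # (X'X)^{-1}
    rows = [[1.0, v] for v in x]
    hat = [[sum(rows[s][p] * inv[p][q] * rows[t][q] for p in range(2) for q in range(2))
            for t in range(n)] for s in range(n)]
    return [[(1.0 if s == t else 0.0) - hat[s][t] for t in range(n)] for s in range(n)]


x = [1.0, 2.0, 4.0, 7.0, 8.0]
M = residual_maker(x)
n, k = len(x), 2
trace_m = sum(M[i][i] for i in range(n))
print(round(trace_m, 10), n - k)  # tr(M) = n - k
```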
APPENDIX C
$\tilde\beta = AY = (X'X)^{-1}X'Y$ is BLUE.
Proof: Let $\tilde\beta_i = A_iY$, where $A_i$ denotes the ith row of the matrix A. Since the result will be symmetric for each βi (hence, for each Ai), denote Ai by a', where a is an (n by 1) vector. The problem then becomes:
min a'Ia, where I is n×n,
s.t. AX = I, where X is n×k (for unbiasedness),
or
min a'Ia
s.t. X'a = i, where i is the ith column of the identity matrix.
Let $\ell = a'Ia + \lambda'(X'a - i)$, which is the associated Lagrangian function, where λ is k×1.
The necessary conditions for a solution are:
$$\frac{\partial\ell}{\partial a'} = 2a'I + \lambda'X' = 0, \qquad \frac{\partial\ell}{\partial\lambda'} = X'a - i = 0.$$
This implies
$$a' = (-1/2)\lambda'X'.$$
Now substitute a = (−1/2)Xλ into the expression for ∂ℓ/∂λ = 0, and we obtain
$$(-1/2)X'X\lambda = i \;\Rightarrow\; \lambda = -2(X'X)^{-1}i,$$
$$a' = (-1/2)(-2)\,i'(X'X)^{-1}X' = i'(X'X)^{-1}X' = A_i,$$
which implies
$$A = (X'X)^{-1}X';$$
hence,
$$\tilde\beta = (X'X)^{-1}X'y.$$
James B. McDonald
Brigham Young University 2/9/2010
IV. Miscellaneous Topics
A. Multicollinearity
1. Introduction
The least squares estimator of β in the model
y = Xβ + ε
is defined by
$$\hat\beta = (X'X)^{-1}X'y.$$
As long as the columns of the X matrix are linearly independent, (X'X)-1 exists and $\hat\beta$ can be evaluated. If any one column of X can be expressed as a linear combination of the remaining columns, |X'X| = 0 and (X'X)-1 is not defined.
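Consider the matrix
$$\operatorname{Cor}(X) = \begin{bmatrix} \operatorname{Cor}(X_1, X_1) & \operatorname{Cor}(X_1, X_2) & \cdots & \operatorname{Cor}(X_1, X_k) \\ \operatorname{Cor}(X_2, X_1) & \operatorname{Cor}(X_2, X_2) & \cdots & \operatorname{Cor}(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cor}(X_k, X_1) & \operatorname{Cor}(X_k, X_2) & \cdots & \operatorname{Cor}(X_k, X_k) \end{bmatrix} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{bmatrix}$$
where ρij = correlation(Xi, Xj). Recall that 0 ≤ |Cor(X)| ≤ 1.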
One "polar" case is that in which the "independent" or exogenous variables are orthogonal or uncorrelated with each other, i.e., Cor(X) = I; hence, |Cor(X)| = 1.
Another polar case is the situation in which one exogenous variable can be written as a linear combination of the remaining exogenous variables, e.g.,
Sales Revenuet = β1 + β2 (Sales of right ski boots)t + β3 (Sales of left ski boots)t + εt,
where xt2 denotes sales of right ski boots and xt3 denotes sales of left ski boots. In this case,
$$\operatorname{Cor}(X) = \begin{bmatrix} 1 & \operatorname{Cor}(X_2, X_3) \\ \operatorname{Cor}(X_3, X_2) & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
and |Cor(X)| = 0.
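The ski-boot example can be made concrete: when sales of right and left boots are identical in every period, the columns of X are linearly dependent and |X'X| = 0, so (X'X)-1 does not exist. The sales figures in this Python sketch are hypothetical; integer data keep the determinant exact.

```python
def cross_products(rows):
    """X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


left = [12, 15, 9, 20, 14]  # hypothetical sales of left boots
right = list(left)          # right-boot sales equal left-boot sales
rows = [[1, r, l] for r, l in zip(right, left)]
print(det3(cross_products(rows)))  # 0: exact multicollinearity

right_perturbed = [12, 15, 9, 20, 15]  # one mismatched observation
rows_p = [[1, r, l] for r, l in zip(right_perturbed, left)]
print(det3(cross_products(rows_p)))  # nonzero: X'X is invertible
```

A single mismatched observation is enough to make X'X invertible, which is why near (rather than exact) multicollinearity is the practical concern.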
While the extreme case of |Cor(X)| = 0 is not particularly common, instances in which |Cor(X)| is small arise frequently, and some rather "strange" results may then occur. We will define multicollinearity to exist whenever |Cor(X)| < 1; |Cor(X)| = 0 is referred to as exact multicollinearity. Multicollinearity is not necessarily bad, but it may make it difficult to accurately estimate the impact of individual variables on the expected value of the dependent variable. The question of interest is generally not whether we have multicollinearity, but what the "degree" of multicollinearity is, what the associated consequences are, and what can be done about it. While multicollinearity can contribute to imprecise estimates, it is not the only cause or explanation of imprecise estimation. In summary, the impact of multicollinearity is that if two or more independent variables move together, then it can be difficult to obtain precise estimates of the effects of the individual variables, βi = ∂E(yt)/∂Xti.
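2. A special case of two explanatory variables.
In order to illustrate some of the consequences of multicollinearity, consider the following model:
$$y_t = \beta_1 + \beta_2x_{t2} + \beta_3x_{t3} + \varepsilon_t, \quad t = 1, 2, \ldots, n. \tag{1}$$
Summing (1) over t and dividing by n, we obtain
$$\bar y = \beta_1 + \beta_2\bar x_2 + \beta_3\bar x_3 + \bar\varepsilon \tag{2}$$
where $\bar y$, $\bar x_2$, $\bar x_3$, and $\bar\varepsilon$, respectively, denote the sample means of yt, xt2, xt3, and εt.
Subtracting (2) from (1) yields
$$\tilde y_t = \beta_2\tilde x_{t2} + \beta_3\tilde x_{t3} + \tilde\varepsilon_t \tag{3}$$
where $\tilde y_t = y_t - \bar y$, $\tilde x_{t2} = x_{t2} - \bar x_2$, $\tilde x_{t3} = x_{t3} - \bar x_3$, and $\tilde\varepsilon_t = \varepsilon_t - \bar\varepsilon$.
The least squares estimators of β2 and β3 are given by (Appendix A.1)
$$\begin{bmatrix} \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y \tag{4}$$
where
$$\tilde X'\tilde X = \begin{bmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{bmatrix}, \qquad \tilde X'\tilde y = \begin{bmatrix} m_{2y} \\ m_{3y} \end{bmatrix},$$
$$m_{ij} = \sum_{t=1}^{n}\tilde x_{ti}\tilde x_{tj} = \sum_{t=1}^{n}(x_{ti} - \bar x_i)(x_{tj} - \bar x_j), \qquad m_{iy} = \sum_{t=1}^{n}\tilde x_{ti}\tilde y_t = \sum_{t=1}^{n}(x_{ti} - \bar x_i)(y_t - \bar y),$$
and
$$\operatorname{Var}\begin{bmatrix} \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \sigma^2(\tilde X'\tilde X)^{-1}. \tag{5}$$
From equation (5) it can be shown that
$$\sigma^2_{\hat\beta_i} = \operatorname{Var}(\hat\beta_i) = \frac{\sigma^2}{n\operatorname{Var}(X_i)(1 - \rho_{23}^2)} \tag{6}$$
$$t_{\hat\beta_i} = \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \tag{7}$$
where $\operatorname{Var}(X_i) = \frac{1}{n}\sum_t(x_{ti} - \bar x_i)^2$ and
$$\rho_{23} = \frac{\sum_t(x_{t2} - \bar x_2)(x_{t3} - \bar x_3)}{\sqrt{\sum_t(x_{t2} - \bar x_2)^2\sum_t(x_{t3} - \bar x_3)^2}} = \frac{\sum_t\tilde x_{t2}\tilde x_{t3}}{\sqrt{\sum_t\tilde x_{t2}^2\sum_t\tilde x_{t3}^2}} = \operatorname{Correlation}(X_2, X_3).$$
The confidence intervals for βi are given by
$$\hat\beta_i \pm t_{\alpha/2}\,s_{\hat\beta_i} = \hat\beta_i \pm t_{\alpha/2}\frac{s}{\left[n\operatorname{Var}(x_{ti})(1 - \rho_{23}^2)\right]^{1/2}}. \tag{8}$$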
Equation (6) can be used to illustrate the point made in the introduction that multicollinearity is only one of several factors which may affect estimator precision. From (6) we note that (other things being equal) increasing the sample size (n), increasing the variance of the variable whose coefficient is being estimated (Xi), reducing σ2, or reducing the square of the correlation between the independent variables will increase the precision of our estimators, i.e., reduce the variance of the estimator. A graphical analysis may be helpful.
In order to focus on the effect of multicollinearity on the variance of, say, $\hat\beta_2$, consider the ratio of $\sigma^2_{\tilde\beta_2}$, the variance with multicollinearity (ρ23 ≠ 0), to $\sigma^2_{\hat\beta_2}$, the variance without multicollinearity (ρ23 = 0). In other words, for different values of $\rho_{23}^2$, we calculate this ratio, which reflects how many times worse (greater) the variance of an estimator subject to multicollinearity is compared to one without. This ratio is equal to $1/(1 - \rho_{23}^2)$.
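$$\begin{array}{c|c} \rho_{23}^2 & \sigma^2_{\tilde\beta_2}/\sigma^2_{\hat\beta_2} \\ \hline 0 & 1 \\ 1/2 & 2 \\ 2/3 & 3 \\ 9/10 & 10 \\ 99/100 & 100 \end{array}$$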
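The ratio column can be reproduced exactly with Python's fractions module; variance_ratio simply evaluates 1/(1 − ρ²23) from the text.

```python
from fractions import Fraction


def variance_ratio(rho2):
    """Var(beta2_hat) with rho_23^2 = rho2, divided by Var(beta2_hat) with rho_23 = 0."""
    return 1 / (1 - rho2)


for rho2 in [Fraction(0), Fraction(1, 2), Fraction(2, 3), Fraction(9, 10), Fraction(99, 100)]:
    print(rho2, variance_ratio(rho2))
```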
Note again that, other things being equal, the larger the correlation between the two independent variables in equation (1), the larger the variance of $\hat\beta_2$ and the less "precise" the estimator will be. The effect can be substantial. However, it is important to recall that multicollinearity is not the only factor having an impact on estimator precision as measured by $\sigma^2_{\hat\beta_2}$; see equation (6).
The following figure of the density of $\hat\beta_2$ for different values of ρ23 (and hence $\sigma^2_{\hat\beta_2}$) will be useful in our discussion of the possible impact of multicollinearity.
[Figure: Density of $\hat\beta_2$ for different values of ρ23.]
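Recall that (i) the points of inflection on the normal density curve occur at µ ± σ, so that if we are testing the hypothesis Ho: β2 = 1, then under Ho
(ii) $\Pr(-\sigma_{\hat\beta_2} < \hat\beta_2 - 1 \le \sigma_{\hat\beta_2}) \approx 0.68$
(iii) $\Pr(-2\sigma_{\hat\beta_2} < \hat\beta_2 - 1 < 2\sigma_{\hat\beta_2}) \approx 0.95$
(iv)
$$\Pr(\hat\beta_2 < 0) = \Pr\left(\frac{\hat\beta_2 - 1}{\sigma_{\hat\beta_2}} < \frac{-1}{\sigma_{\hat\beta_2}}\right) = \Pr\left(\frac{\hat\beta_2 - 1}{\sigma_{\hat\beta_2}} < -\frac{\left[m_{22}(1 - \rho_{23}^2)\right]^{1/2}}{\sigma}\right).$$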
From (iv) we can evaluate the probability of $\hat\beta_2$ assuming the "wrong sign" for the case in which β2 = 1, for given m22 and σ. In the previous figure these probabilities are shown as the area to the left of the vertical dotted line. If σ2 = m22 (strictly for purposes of exposition), the probability of an "incorrect" sign would be given in the following table.
$$\begin{array}{c|c} \rho_{23}^2 & \text{Probability of an incorrect sign} \\ \hline 0 & .16 \\ 1/2 & .24 \\ 2/3 & .28 \\ 9/10 & .38 \\ 99/100 & .46 \end{array}$$
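The probabilities in the table follow from (iv) with β2 = 1 and σ2/m22 = 1. This Python sketch evaluates them with the standard library's statistics.NormalDist; rounded to two decimals it gives .16, .24, .28, .38 (0.376 before rounding), and .46.

```python
import math
from statistics import NormalDist


def prob_wrong_sign(rho2, beta2=1.0, sigma2_over_m22=1.0):
    """Pr(beta2_hat < 0) when beta2_hat ~ N(beta2, sigma^2 / (m22 (1 - rho2)))."""
    sd = math.sqrt(sigma2_over_m22 / (1.0 - rho2))  # standard deviation of beta2_hat
    return NormalDist().cdf(-beta2 / sd)


for rho2 in [0.0, 1 / 2, 2 / 3, 9 / 10, 99 / 100]:
    print(round(rho2, 3), round(prob_wrong_sign(rho2), 2))
```

As ρ²23 approaches 1 the probability approaches one half: the estimate is then almost as likely to have the wrong sign as the right one.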
Based on our previous discussion, we note that increases in $\rho_{23}^2$, and "severe" multicollinearity more generally, can be associated with the following situations.
(1) The precision of estimation is reduced (Var($\hat\beta_i$) increases), so that it becomes difficult to accurately estimate individual effects of variables which move together.
(2) It was noted that the probability of obtaining estimates having the "wrong" sign increases as Corr²(x2, x3) increases.
(3) Note from (7) that as $\rho_{23}^2 \to 1$, the t-statistics get smaller; hence, based upon a strict adherence to a "t-criterion" for deleting variables, a variable may be deleted from an equation when that variable does have an effect. This is always a possibility in statistical inference, but with severe multicollinearity the confidence intervals can become so wide (see equation (8)) as to make it difficult to reject "almost any hypothesis." Recall that confidence intervals for βi are given by
$$\hat\beta_i \pm t_c\frac{s}{\left[n\operatorname{Var}(x_{ti})(1 - \rho_{23}^2)\right]^{1/2}}$$
for the case in which k = 3.
(4) Severe multicollinearity is frequently associated with "significant" F statistics and
"insignificant" t statistics for a group of variables which are expected to be important. The collective importance of a group of variables can be checked using a Chow test.
Huge F-statistics but small t-statistics? Likely diagnosis: multicollinearity
To visualize this situation consider the joint confidence intervals for β2 and β3 which might appear as
Note that the individual confidence intervals for β2 and β3 include 0; hence, we would not be able to reject the hypothesis that β2 or β3 = 0. The joint confidence interval for β2 and β3 does not include the origin; hence, the F statistic will be statistically significant. It is the high correlation between x2 and x3 that contributes to the elliptical shape of the joint confidence interval.
(5) Coefficient estimates may be extremely sensitive to the addition of more data.
(6) $|\operatorname{Corr}(X)| = \begin{vmatrix} 1 & \rho_{23} \\ \rho_{23} & 1 \end{vmatrix} = 1 - \rho_{23}^2$ may be close to zero.
(7) Various pairwise correlations between the X's may be close to 1.
(8) Condition index (CI).
High pairwise correlations between explanatory variables are sufficient for multicollinearity problems, but are not necessary. Belsley, Kuh and Welsch (BKW) define a condition index
$$CI = \frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}$$
where the eigenvalues correspond to the correlation matrix of the x's. BKW use a rule of thumb that multicollinearity is high if CI > 30.
Consider the condition index for the two polar cases in the introduction of this section,
$$C_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad C_2 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$
which have respective eigenvalues
(λ11, λ12) = (1, 1) and (λ21, λ22) = (0, 2).
The corresponding condition indices are then
$$CI_1 = \frac{1}{1} = 1, \qquad CI_2 = \frac{2}{0} \ \text{(undefined)}, \ \text{so } CI \to \infty \text{ as } |C| \to 0.$$
We remind the reader that the CI merely provides a rule of thumb.
In Problem 4.1(1), the reader is asked to verify that the condition index corresponding to the correlation matrix
$$C = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$$
is given by
$$CI = \frac{1 + |\rho|}{1 - |\rho|}.$$
Note that CI increases as |ρ| increases and includes C1 and C2 as special cases.
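A Python sketch of the condition index for the 2×2 correlation matrix, using the quadratic-formula eigenvalues suggested in Problem 4.1(1) and the definition given in these notes (ratio of the largest to the smallest eigenvalue). It is undefined at |ρ| = 1, where the smallest eigenvalue is zero.

```python
import math


def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] from the quadratic formula."""
    mid = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mid + rad, mid - rad


def condition_index(rho):
    """Largest over smallest eigenvalue of the correlation matrix [[1, rho], [rho, 1]]."""
    lam_max, lam_min = eigenvalues_2x2_symmetric(1.0, rho, 1.0)
    if lam_min <= 0.0:
        raise ValueError("condition index undefined for |rho| = 1")
    return lam_max / lam_min


for rho in [0.0, 0.5, 0.9, 0.99]:
    print(rho, round(condition_index(rho), 2), round((1 + abs(rho)) / (1 - abs(rho)), 2))
```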
3. Some results for the case of an arbitrary number of independent variables.
Consider the more general model
$$Y_t = \beta_1 + \beta_2X_{t2} + \beta_3X_{t3} + \cdots + \beta_kX_{tk} + \varepsilon_t. \tag{9}$$
Some of the results obtained in the previous section can be extended to the more general case as follows:
$$\sigma^2_{\hat\beta_i} = \frac{\sigma^2}{ns_i^2(1 - \rho_i^2)} \tag{10a}$$
$$s^2_{\hat\beta_i} = \frac{s^2}{ns_i^2(1 - \rho_i^2)} \tag{10b}$$
$$t_{\hat\beta_i} = \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} = \frac{(\hat\beta_i - \beta_i)\left[ns_i^2(1 - \rho_i^2)\right]^{1/2}}{s} \tag{10c}$$
where $s_i^2 = \sum_t(X_{ti} - \bar X_i)^2/n$ and
ρ²i = Correlation² (between Xi and all other independent variables)
= R2 obtained from regressing Xi on the other independent variables.
These results seem reasonable. In particular, the higher the correlation between an independent variable and the set of other independent variables, the less precise the associated coefficient estimator, as measured by its variance. Again, we note that "multicollinearity" is only one factor contributing to poor estimator precision (large $\sigma^2_{\hat\beta_i}$). Large values of σ2, small n, and small $s_i^2$ have the same impact.
The impact of multicollinearity as measured by pairwise correlations between independent variables is much less clear. In particular, if cij is the correlation between the ith and jth independent variables, it can be shown that
$$\frac{\partial\sigma^2_{\hat\beta_i}}{\partial c_{ik}} = -\frac{\sigma^2c^{ii}c^{ik}}{ns_i^2} \tag{11}$$
where $c^{st}$ denotes the stth element in the inverse of the correlation matrix. Consequently, the impact of an increase in the pairwise correlation between two variables upon estimator precision is indeterminate.
Finally, for a given "degree of multicollinearity," individual coefficient estimators may be statistically significant if the overall fit of the model ($\bar R^2$) is good enough. To be more specific,
$$\frac{|\hat\beta_i - \beta_i|}{s_{\hat\beta_i}} > t_{\alpha/2} \quad\text{if and only if}\quad \bar R^2 > 1 - \frac{(\hat\beta_i - \beta_i)^2ns_i^2(1 - \rho_i^2)}{t_{\alpha/2}^2s_y^2} \tag{12}$$
where $s_y^2 = SST/(n-1)$. In other words, for any degree of multicollinearity, as measured by $\rho_i^2$, the estimate of βi will be statistically significant if the adjusted R2 ($\bar R^2$) is large enough to satisfy the inequality in equation (12). This inequality can be easily derived by squaring both sides of the first inequality, replacing $s^2_{\hat\beta_i}$ by $s^2/[ns_i^2(1 - \rho_i^2)]$, noting that
$$\bar R^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)} = 1 - \frac{s^2}{s_y^2},$$
and manipulating the resulting expression. The second inequality in (12) can also be rewritten in terms of R2.
4. Some proposed "solutions" to the multicollinearity problem
There have been numerous solutions proposed to circumvent the multicollinearity
problem. However, the basic problem with multicollinearity is that the variables
(exogenous) may be moving so closely together as to make it difficult to obtain accurate
estimates of individual effects and, consequently, each proposed technique has associated
problems. It should be mentioned that even for the case of severe (not perfect)
multicollinearity, least squares estimators are unbiased, minimum variance of all unbiased
estimators, consistent, and are asymptotically efficient as long as (A.1)-(A.5) are satisfied.
Some suggested solutions include:
(1) Obtain more data: If additional data had been available it would probably have been
used initially. One might try combining cross sectional and time series data. Panel
data often includes more variability and less collinearity among the variables.
(2) Principal components: Replace "problem variables" with a smaller number of linear combinations of the deleted variables which "account for most of their explanatory power (variance)." This approach is associated with interpretational problems and can also result in biased estimators.
(3) Delete a variable: The deletion of one of the variables which is "nearly" linearly related
to the other independent variables is a common practice, but may result in biased
estimators if it is an important variable.
(4) Impose constraints on the parameters: This approach is really a generalization of (3), since deleting a variable amounts to imposing βi = 0. However, there may be theoretical reasons for imposing constraints on the parameters, such as constant returns to scale in a production function or no money illusion in demand equations. The validity of these constraints could be investigated using a Chow or likelihood ratio test. Judge has shown that the least squares estimator which takes account of linear constraints has minimum variance among estimators satisfying the constraint. If the constraint is not true, the constrained estimator will be biased, although its variance is still no larger than that of unconstrained least squares.
(5) Ridge Regression Techniques
A simple ridge regression estimator is given by
$$\hat\beta(k) = (X'X + kI)^{-1}X'y.$$
The ridge regression estimator is biased, with $\mathrm{bias}(\hat\beta(k)) = -k(X'X + kI)^{-1}\beta$, but the value of $k$ is often selected to minimize $\mathrm{MSE}(\hat\beta(k))$, say at $k^*$. Note that for $k = 0$ the ridge estimator is the OLS estimator of $\beta$, i.e., $\hat\beta(0) = \hat\beta$. It can be shown that $\mathrm{MSE}(\hat\beta(k^*)) \le \mathrm{MSE}(\hat\beta(0))$.
The basis for selecting $\hat\beta(k^*)$ is motivated by considering the following figure. In this case the OLS estimator is unbiased, but has a large variance relative to the biased ridge estimator. Recall that $\mathrm{MSE}(\hat\beta) = \mathrm{var}(\hat\beta) + (\mathrm{bias}(\hat\beta))^2$.
This figure suggests possible benefits from selecting a slightly biased estimator if there are significant reductions in variance. The MSE is often used to quantify this tradeoff. Ridge estimators are biased, and the problem of statistical inference for them has not been worked out.
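[Figure: sampling distributions of the unbiased OLS estimator $\hat\beta(0)$ and the biased ridge estimator $\hat\beta(k^*)$, plotted on a $\beta$ axis.]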
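The ridge formula and its bias can be illustrated on a small example. This is a minimal sketch, assuming two nearly collinear regressors and no intercept so the 2×2 inverse can be written out by hand; the helper names and the noise-free data are hypothetical. With noise-free $y = X\beta$, the gap $\hat\beta(k) - \beta$ equals the bias formula exactly, and $k = 0$ reproduces OLS.

```python
def cross_products(x1, x2, y):
    """X'X and X'y for a model with two regressors and no intercept."""
    s12 = sum(a * b for a, b in zip(x1, x2))
    xtx = [[sum(a * a for a in x1), s12],
           [s12, sum(b * b for b in x2)]]
    xty = [sum(a * v for a, v in zip(x1, y)), sum(b * v for b, v in zip(x2, y))]
    return xtx, xty

def ridge_solve(xtx, v, k):
    """Return (X'X + kI)^{-1} v for a 2x2 X'X."""
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (-c * v[0] + a * v[1]) / det]

def ridge(xtx, xty, k):
    """Ridge estimator beta_hat(k) = (X'X + kI)^{-1} X'y; k = 0 gives OLS."""
    return ridge_solve(xtx, xty, k)

def ridge_bias(xtx, beta, k):
    """bias(beta_hat(k)) = -k (X'X + kI)^{-1} beta."""
    return [-k * u for u in ridge_solve(xtx, beta, k)]

x1 = [1, 2, 3, 4]
x2 = [1.1, 1.9, 3.0, 4.1]          # nearly collinear with x1
beta = [2.0, 1.0]
y = [beta[0] * a + beta[1] * b for a, b in zip(x1, x2)]
xtx, xty = cross_products(x1, x2, y)
b_ols = ridge(xtx, xty, 0.0)
b_ridge = ridge(xtx, xty, 0.5)
```

The length of $\hat\beta(k)$ shrinks as $k$ grows, which is the mechanism that trades a little bias for a reduction in variance.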
5. PROBLEM SET 4.1
Multicollinearity
Theory
1. Prove that the condition index (C.I.) corresponding to the correlation matrix
$$C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
is $\text{C.I.} = \sqrt{\dfrac{1+\rho}{1-\rho}}$ (for $\rho \ge 0$).
Hint: Use the quadratic formula from college algebra.
(JM III-A)
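The hint can be followed step by step in code. This sketch assumes the condition index is defined as $\sqrt{\lambda_{\max}/\lambda_{\min}}$ and that $|\rho| < 1$; the function name is hypothetical. The eigenvalues of $C$ solve $\lambda^2 - 2\lambda + (1-\rho^2) = 0$.

```python
import math

def condition_index_2x2(rho):
    """sqrt(lambda_max / lambda_min) for the matrix [[1, rho], [rho, 1]], |rho| < 1.

    The eigenvalues solve lam^2 - 2 lam + (1 - rho^2) = 0; the quadratic
    formula gives lam = 1 +/- |rho|."""
    a, b, c = 1.0, -2.0, 1.0 - rho ** 2
    disc = math.sqrt(b * b - 4 * a * c)     # equals 2|rho|
    lam_max = (-b + disc) / (2 * a)
    lam_min = (-b - disc) / (2 * a)
    return math.sqrt(lam_max / lam_min)
```

For example, $\rho = 0.9$ gives eigenvalues 1.9 and 0.1, so the condition index is $\sqrt{19} \approx 4.36$.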
2. Prove and discuss equation (12) in the notes on collinearity. (Hint: this problem basically involves algebraic manipulation; be patient.) Based on the result in equation (12), you can see that statistical significance of individual estimators is retained for an arbitrary degree of multicollinearity if the explanatory power of the model is high enough.
(JM III-A 6)
Applied
3. Consider the following data:
Y_t     C_t     W_t
1883    1749    2.36
1909    1756    2.39
1969    1814    2.47
2015    1867    2.52
2126    1943    2.65
2239    2047    2.81
2335    2127    2.93
2403    2164    3.01
2486    2256    3.12
2534    2315    3.18
2534    2328    3.70
where $Y_t$, $C_t$, and $W_t$, respectively, denote income, consumption, and wage rates.
a. Estimate
(1) $C_t = \alpha_1 + \alpha_2 Y_t + \varepsilon_t$
(2) $C_t = \beta_1 + \beta_2 W_t + \varepsilon'_t$
(3) $C_t = \gamma_1 + \gamma_2 Y_t + \gamma_3 W_t + \varepsilon''_t$
using the first ten observations. Also, estimate equation (3) for the entire data set (11 observations). Explain the results.
(JM III-A)
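One quantity worth computing before estimating equation (3) is the sample correlation between income and the wage rate. This sketch (the helper name is hypothetical) uses the data in the table and compares the first ten observations with all eleven; the eleventh observation keeps income unchanged while the wage rate jumps, which breaks the near-linear relationship.

```python
import math

# Income Y and wage rate W from the table in Problem 3
Y = [1883, 1909, 1969, 2015, 2126, 2239, 2335, 2403, 2486, 2534, 2534]
W = [2.36, 2.39, 2.47, 2.52, 2.65, 2.81, 2.93, 3.01, 3.12, 3.18, 3.70]

def corr(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    suv = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    suu = sum((a - ub) ** 2 for a in u)
    svv = sum((b - vb) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

r10 = corr(Y[:10], W[:10])   # first ten observations
r11 = corr(Y, W)             # all eleven observations
```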
4. Refer to problem 4 from "HW 2.2: K-Variate Regression". Test the hypothesis that
β3 = β4 = 0 in equation (2) and reconcile the results with the results obtained based upon individual tests of significance for β3 and β4 using t-statistics.
(JM III-A)
5. Consider the following set of data:
Y     X2    X3
2     1     1
4     2     4
6     3     7
8     4     10
10    5     13
12    6     16
14    7     19
16    8     22
18    9     25
20    10    28
Discuss any problems associated with estimating β1, β2, and β3 in the model
$Y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \varepsilon_t$.
(JM III-A)
6. In a study relating college grade point average (GPA) to time spent in various activities,
you distribute a survey to several students. The students are asked how many hours they
spend each week in four activities: studying, sleeping, working, and leisure. Any
activity is put into one of four categories, so that for each student, the sum of hours in the
four activities must be 168.
a. What problems will you encounter in estimating the model
$GPA = \alpha_1 + \alpha_2\,study + \alpha_3\,sleep + \alpha_4\,work + \alpha_5\,leisure + \varepsilon$?
b. How could you reformulate the model so that its parameters have a useful interpretation? (Wooldridge, 3rd edition, problem 3.5)
7. A problem of interest to health officials (and others) is to determine the effects of
smoking during pregnancy on infant health. One measure of infant health is birth
weight: a birth weight that is too low can put an infant at risk for contracting various
illnesses. Since factors other than cigarette smoking that affect birth weight are likely to
be correlated with smoking, we should take those factors into account. For example,
higher income generally results in access to better prenatal care, as well as better
nutrition for the mother. An equation that recognizes this is
bwght = β0 + β1cigs + β2faminc + u
a) What do you think is the most likely sign for β2?
b) Do you think cigs and faminc are likely to be correlated? Explain why the
correlation might be positive or negative.
c) Now estimate the equation with and without faminc, using the data in BWGHT.RAW.
Report the results in equation form, including the sample size and R-squared.
Discuss your results, focusing on whether adding faminc substantially changes the
estimated effect of cigs on bwght. Is the estimated coefficient $\hat\beta_2$ statistically significant?
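Appendix 1. Derivation of equation (4)
$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$$
$$\bar y = \beta_1 + \beta_2 \bar x_2 + \beta_3 \bar x_3 + \bar\varepsilon$$
$$y_t - \bar y = \beta_2(x_{t2} - \bar x_2) + \beta_3(x_{t3} - \bar x_3) + \varepsilon_t - \bar\varepsilon$$
$$\tilde y_t = \beta_2 \tilde x_{t2} + \beta_3 \tilde x_{t3} + \tilde\varepsilon_t$$
where $\tilde y_t = y_t - \bar y$, $\tilde x_{ti} = x_{ti} - \bar x_i$, and $\tilde\varepsilon_t = \varepsilon_t - \bar\varepsilon$. The $\tilde X$ matrix is given by
$$\tilde X = \begin{pmatrix} \tilde x_{12} & \tilde x_{13} \\ \tilde x_{22} & \tilde x_{23} \\ \tilde x_{32} & \tilde x_{33} \\ \tilde x_{42} & \tilde x_{43} \\ \vdots & \vdots \\ \tilde x_{n2} & \tilde x_{n3} \end{pmatrix}$$
and
$$\tilde X'\tilde X = \begin{pmatrix} \tilde x_{12} & \tilde x_{22} & \cdots & \tilde x_{n2} \\ \tilde x_{13} & \tilde x_{23} & \cdots & \tilde x_{n3} \end{pmatrix} \tilde X = \begin{pmatrix} \sum_t \tilde x_{t2}^2 & \sum_t \tilde x_{t2}\tilde x_{t3} \\ \sum_t \tilde x_{t3}\tilde x_{t2} & \sum_t \tilde x_{t3}^2 \end{pmatrix} = \begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}.$$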
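The point of the deviation form is that it yields the same slope estimates as the regression with an intercept. This sketch checks that numerically on hypothetical data; the helper names are hypothetical, and a small Gaussian-elimination solver stands in for matrix inversion.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def full_and_centered(y, x2, x3):
    """Slopes from the regression with an intercept and from the deviation form."""
    n = len(y)
    cols = [[1.0] * n, list(x2), list(x3)]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in cols]
    b_full = solve(xtx, xty)                          # (beta1, beta2, beta3)
    dev = lambda v: [t - sum(v) / n for t in v]
    a, b, yt = dev(x2), dev(x3), dev(y)
    mom = [[sum(p * q for p, q in zip(u, w)) for w in (a, b)] for u in (a, b)]  # [[m22, m23], [m32, m33]]
    my = [sum(p * q for p, q in zip(u, yt)) for u in (a, b)]
    b_dev = solve(mom, my)                            # (beta2, beta3)
    return b_full, b_dev, mom
```

The intercept is then recovered as $\hat\beta_1 = \bar y - \hat\beta_2 \bar x_2 - \hat\beta_3 \bar x_3$.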
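Appendix 2. Derivation of equation (6)
$$(\tilde X'\tilde X)^{-1} = \begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}^{-1} = \frac{1}{m_{22}m_{33} - m_{23}^2}\begin{pmatrix} m_{33} & -m_{23} \\ -m_{32} & m_{22} \end{pmatrix}$$
$$\mathrm{Var}(\hat\beta_2) = \frac{m_{33}}{m_{22}m_{33} - m_{23}^2}\,\sigma^2 = \frac{\sigma^2}{m_{22} - m_{23}^2/m_{33}} = \frac{\sigma^2}{m_{22}\left(1 - \dfrac{m_{23}^2}{m_{22}m_{33}}\right)} = \frac{\sigma^2}{m_{22}(1 - \rho_{23}^2)} = \frac{\sigma^2}{\left(\sum_t \tilde x_{t2}^2\right)(1 - \rho_{23}^2)}$$
where $\rho_{23}^2 = m_{23}^2/(m_{22}m_{33})$ is the squared sample correlation between $x_2$ and $x_3$.
Similarly,
$$\mathrm{Var}(\hat\beta_3) = \frac{m_{22}}{m_{22}m_{33} - m_{23}^2}\,\sigma^2 = \frac{\sigma^2}{m_{33}(1 - \rho_{23}^2)} = \frac{\sigma^2}{\left(\sum_t \tilde x_{t3}^2\right)(1 - \rho_{23}^2)}$$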
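The equivalence between the inverse-matrix route and the $1-\rho_{23}^2$ form can be checked numerically. This sketch uses a hypothetical function name and hypothetical regressors; it also shows that holding $x_2$ fixed and making $x_3$ more collinear with it raises $\mathrm{Var}(\hat\beta_2)$.

```python
def var_slopes(x2, x3, sigma2):
    """Var(beta2_hat), Var(beta3_hat) via (X~'X~)^{-1} and via sigma^2 / [m (1 - rho^2)]."""
    n = len(x2)
    a = [v - sum(x2) / n for v in x2]
    b = [v - sum(x3) / n for v in x3]
    m22 = sum(u * u for u in a)
    m33 = sum(w * w for w in b)
    m23 = sum(u * w for u, w in zip(a, b))
    det = m22 * m33 - m23 ** 2
    var2_inv, var3_inv = sigma2 * m33 / det, sigma2 * m22 / det   # diagonal of sigma^2 (X~'X~)^{-1}
    rho2 = m23 ** 2 / (m22 * m33)                                 # squared correlation of x2 and x3
    var2_formula = sigma2 / (m22 * (1 - rho2))
    var3_formula = sigma2 / (m33 * (1 - rho2))
    return (var2_inv, var2_formula), (var3_inv, var3_formula), rho2
```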
James B. McDonald, Brigham Young University, 2/18/2010
IV. Miscellaneous Topics
B. Binary Variables (Dummy Variables)
Many variables which we may want to include in an econometric model may not be quantitative (measurable), but rather are qualitative in nature. For example, an individual will be a homeowner, or will not; will be married or not. Such characteristics may have a bearing on an individual's behavior, but are not quantifiable. One way to include the effect of such characteristics is to introduce binary or dummy variables. For example, let the binary variable $D_t$ indicate whether a given individual is married or not by defining $D_t = 0$ if the tth individual is single and $D_t = 1$ if the tth individual is married.
We now consider several models which make use of dummy variables, discuss the dummy variable trap, indicate some interesting generalizations, and investigate applications of these techniques to several problems in economics.
1. Models with binary explanatory variables
a. An example: the relationship between salary and a college degree
Let
$Y_t$ = annual salary of the tth person in the sample,
$D_{1t}$ = 1 if the tth person is a college graduate, = 0 otherwise,
$D_{2t}$ = 1 if the tth person isn't a college graduate, = 0 otherwise.
Note that $D_{2t} = 1 - D_{1t}$.
Consider the following two models which can be used to study the impact of a college degree on annual salary.
Model 1: $Y_t = \alpha_1 + \alpha_2 D_{1t} + \varepsilon_t$
Model 2: $Y_t = \beta_1 D_{1t} + \beta_2 D_{2t} + \varepsilon_t$.
The coefficients in the two representations have different interpretations, as summarized in the following table.
                                        Model 1      Model 2
E(Y_t | college graduate)               α1 + α2      β1
E(Y_t | not a college graduate)         α1           β2
In the model with one fewer dummy variable than categories (model 1; categories = college graduate, not a college graduate), the coefficient of the binary variable represents the expected difference or differential between the income levels associated with the state of the included dummy variable and the state (benchmark) associated with the deleted dummy variable, i.e.,
$$\alpha_2 = E(Y_t \mid \text{college graduate}) - E(Y_t \mid \text{not a college graduate}).$$
The coefficients in the representation which includes the same number of binary variables as categories (model 2) represent the expected income level associated with each category.
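b. Estimation:
Assume that we have a total of n observations with the first $n_1$ ($n_1 + n_2 = n$) having college degrees. The two different models can be written in matrix notation as
Model 1:
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
or $Y = X\alpha + \varepsilon$, where the first $n_1$ rows of $X$ are (1, 1) and the remaining $n_2$ rows are (1, 0).
Model 2:
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
or $Y = X^*\beta + \varepsilon$.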
The least squares estimators of the vectors α and β are given by
$$\hat\alpha = (X'X)^{-1}X'Y = \begin{pmatrix} \hat\alpha_1 \\ \hat\alpha_2 \end{pmatrix} = \begin{pmatrix} \bar Y_2 \\ \bar Y_1 - \bar Y_2 \end{pmatrix}$$
and
$$\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}Y = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} \bar Y_1 \\ \bar Y_2 \end{pmatrix}$$
where $\bar Y_1$ and $\bar Y_2$, respectively, denote the sample mean income for those having college degrees and those without a degree. Note that these are sample estimates (sample means) of the population means.
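The group-mean form of $\hat\alpha$ can be confirmed by solving Model 1's normal equations directly, using $X'X = \begin{pmatrix} n & n_1 \\ n_1 & n_1 \end{pmatrix}$. This is a sketch with hypothetical function names and hypothetical salary data.

```python
def model1_normal_equations(y, d1):
    """Solve X'X a = X'Y for Model 1, Y = a1 + a2 D1 + e."""
    n, n1 = len(y), sum(d1)
    sy = sum(y)
    sy1 = sum(v for v, d in zip(y, d1) if d == 1)
    # X'X = [[n, n1], [n1, n1]],  X'Y = [sum of Y, sum of Y for graduates]
    det = n * n1 - n1 * n1
    a1 = (n1 * sy - n1 * sy1) / det
    a2 = (-n1 * sy + n * sy1) / det
    return a1, a2

def group_means(y, d1):
    """(Ybar_1, Ybar_2): mean for graduates (D1 = 1) and for non-graduates."""
    g = [v for v, d in zip(y, d1) if d == 1]
    o = [v for v, d in zip(y, d1) if d == 0]
    return sum(g) / len(g), sum(o) / len(o)

y = [60, 70, 65, 40, 45, 50, 35]     # hypothetical salaries, graduates listed first
d1 = [1, 1, 1, 0, 0, 0, 0]
a1, a2 = model1_normal_equations(y, d1)
ybar1, ybar2 = group_means(y, d1)     # these are also (beta1_hat, beta2_hat) for Model 2
```

Here $\bar Y_1 = 65$ and $\bar Y_2 = 42.5$, so $\hat\alpha_1 = 42.5$ and $\hat\alpha_2 = 22.5$.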
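c. Dummy Variable Trap
Consider the model
$$Y_t = \gamma_1 + \gamma_2 D_{1t} + \gamma_3 D_{2t} + \varepsilon_t$$
or in matrix form
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$
$Y = X^{**}\gamma + \varepsilon$.
The least squares estimators of γ, if they exist, are given by
$$\hat\gamma = (X^{**\prime}X^{**})^{-1}X^{**\prime}Y.$$
Note that
$$X^{**\prime}X^{**} = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} n & n_1 & n_2 \\ n_1 & n_1 & 0 \\ n_2 & 0 & n_2 \end{pmatrix};$$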
hence, the first column is equal to the sum of the second and third columns and $|X^{**\prime}X^{**}| = 0$. Therefore, $(X^{**\prime}X^{**})^{-1}$ does not exist and the vector $\hat\gamma$ is not defined.
Note that this problem could be detected by noting that the first column in $X^{**}$ is equal to the sum of the second and third columns.
The dummy variable trap corresponds to including an intercept in a model in which the same number of dummy variables have been included as categories for the qualitative characteristic. The dummy variable trap can be thought of as resulting in perfect multicollinearity.
Two approaches to avoiding the dummy variable trap are:
(1) use an intercept and one fewer dummy variable than categories, or
(2) include the same number of dummy variables as categories (with only one characteristic), but delete the intercept.
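The singularity can be seen by building $X^{**\prime}X^{**}$ for a small sample and taking its determinant. This sketch uses hypothetical function names and group sizes $n_1 = 3$, $n_2 = 4$.

```python
def trap_cross_product(n1, n2):
    """X**'X** for an intercept plus both dummies: rows (1, 1, 0) for n1 graduates, (1, 0, 1) for n2 others."""
    rows = [(1, 1, 0)] * n1 + [(1, 0, 1)] * n2
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

xx = trap_cross_product(3, 4)   # [[7, 3, 4], [3, 3, 0], [4, 0, 4]]
```

The determinant is exactly zero, so no inverse exists; dropping either the intercept or one dummy removes the linear dependence.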
d. Generalizations
There are numerous ways in which dummy variables can be advantageously used in formulating econometric models. Several qualitative characteristics can be modeled in the same equation with or without quantitative variables. If several qualitative characteristics are to be included in a model as explanatory variables, an intercept and one fewer dummy variable than categories should be included for each qualitative characteristic. Interaction terms (products of binary variables) can be included. The dependent variable can be chosen to be a binary variable in applications such as selecting good loan applicants or determining which income tax returns to audit. Alternative approaches to using dummy variables as dependent variables are available, and a few will be discussed in Section 2 (III.B.2).
e. Some examples and precautionary comments
(1) Consumption behavior in wartime (or other unique time periods)
Define $Z_t$ = 1 if t corresponds to wartime and 0 otherwise.
Indicate how to model each of the following situations.
[Figure: panels (1), (2), and (3) plotting consumption against income for wartime and other periods; the labels β1 and β2 mark the plotted lines.]
where $C_t$ and $Y_t$ denote consumption and income in period t. Case (1) corresponds to a model with different intercepts and a common slope, (2) a common intercept and different slopes, and (3) the possibility of different intercepts and slopes.
It can be shown that using dummy variables to estimate the intercept(s) and slope(s) is more efficient than running separate regressions in cases (1) and (2) but is equivalent to running separate regressions for case (3).
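One common way to write the three cases with the wartime dummy $Z_t$ is sketched below; the coefficient labels are our own notation and need not match the labels in the figure.

```latex
\begin{aligned}
&(1)\ \text{different intercepts, common slope:} && C_t = \beta_1 + \beta_2 Y_t + \beta_3 Z_t + \varepsilon_t \\
&(2)\ \text{common intercept, different slopes:} && C_t = \beta_1 + \beta_2 Y_t + \beta_3 (Z_t Y_t) + \varepsilon_t \\
&(3)\ \text{different intercepts and slopes:} && C_t = \beta_1 + \beta_2 Y_t + \beta_3 Z_t + \beta_4 (Z_t Y_t) + \varepsilon_t
\end{aligned}
```

In case (3) the wartime line has intercept $\beta_1 + \beta_3$ and slope $\beta_2 + \beta_4$, with no restriction linking the two periods, which is why its coefficient estimates reproduce those of separate regressions.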
(2) Interaction Terms
The use of binary variables in regression models takes account of "additive" effects. For example, consider the model
$$Salary = \beta_1 + \beta_2(income) + \beta_3(gender) + \beta_4(race)$$
where
Gender = 1 if female, = 0 otherwise,
Race = 1 if minority, = 0 otherwise.
$\beta_3$ and $\beta_4$, respectively, measure the additive impact on salaries of being a woman and a member of a minority. If the data suggest that there is an extra impact (positive or negative) of being a woman and a minority, this can be modeled using an interaction term $Z = (Gender)(Race)$ by estimating the model
$$Salary = \beta_1 + \beta_2(income) + \beta_3(Gender) + \beta_4(Race) + \beta_5 Z.$$
$\beta_5$ could be tested for statistical significance. A similar approach could be taken to allow gender, race, and interaction effects to impact the slope.
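The role of $\beta_5$ shows up in the expected salaries of the four gender/race groups. This sketch uses a hypothetical function name and hypothetical coefficient values; the double difference across the four groups isolates $\beta_5$, and the income term cancels.

```python
def expected_salary(b, income, gender, race):
    """E(Salary) in the interaction model; b = (b1, b2, b3, b4, b5), Z = gender * race."""
    z = gender * race
    return b[0] + b[1] * income + b[2] * gender + b[3] * race + b[4] * z

b = (10.0, 0.5, -2.0, -3.0, -1.0)                  # hypothetical coefficients
e = lambda g, r: expected_salary(b, 40.0, g, r)
# (minority woman - woman) - (minority man - man) isolates the interaction effect
extra = e(1, 1) - e(1, 0) - e(0, 1) + e(0, 0)
```

With $\beta_5 = 0$ the four group means would differ only through the additive effects $\beta_3$ and $\beta_4$.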
(3) The Ratchet Effect
This example does not use dummy variables, but illustrates how data can be used imaginatively and profitably. Let $Y^*_t$ = highest income level experienced. Consider the following figures.
The consumption function depicted in the first figure can be estimated from the following equation
$$C_t = \beta Y_t + \gamma(Y^*_t - Y_t).$$
Note that for periods in which there is "growth" (not just recovery) $Y_t = Y^*_t$ and $C_t = \beta Y_t$, and during a recession or associated recovery $Y^*_t$ is fixed and is greater than $Y_t$, and $C_t = \gamma Y^*_t + (\beta - \gamma)Y_t$. In order to test whether aggregate behavioral differences exist during growth periods as compared with recession or recovery periods, the hypothesis $H_0: \gamma = 0$ could be tested.
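The only data work the ratchet model needs is the running peak $Y^*_t$. This sketch assumes $Y^*_t$ is the highest income observed up to and including period t; the function names and the income path are hypothetical.

```python
def ratchet_regressors(income):
    """Rows (Y_t, Y*_t - Y_t), where Y*_t is the highest income up to and including t."""
    rows, peak = [], float("-inf")
    for y in income:
        peak = max(peak, y)
        rows.append((y, peak - y))
    return rows

def ratchet_consumption(beta, gamma, income):
    """Fitted C_t = beta Y_t + gamma (Y*_t - Y_t) for given coefficients."""
    return [beta * y + gamma * gap for y, gap in ratchet_regressors(income)]

rows = ratchet_regressors([100, 110, 105, 108, 115])
# -> [(100, 0), (110, 0), (105, 5), (108, 2), (115, 0)]
```

In growth periods the second regressor is zero, so $C_t = \beta Y_t$; regressing $C_t$ on the two columns (without an intercept) gives estimates of $\beta$ and $\gamma$.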
(4) A Precautionary Note
Consider the problem of modeling the impact of education upon salary where education for each individual is reported as being (a) high school (HS) or less, (b) having attended college (BS), (c) Master's degree (MS), or (d) having a Ph.D. (PhD).
The level of education might be measured in several ways, three of which might be (E1, E2, or E3):
Degree   E1   E2
HS       1    12
BS       2    16
MS       3    18
PhD      4    20
E3: number of years attending school.
E1 assigns an index to the categories (assuming a monotonic relationship), E2 is a rough measure of the number of years of school, and E3 assumes a linear relationship between the dependent variable and the number of years of school.
Alternatively, binary variables could be used which allow differentiated impacts for different degrees. To explore this approach further, let
D1 = 1 if HS, = 0 otherwise;
D2 = 1 if BS, = 0 otherwise;
D3 = 1 if MS, = 0 otherwise;
D4 = 1 if PhD, = 0 otherwise.
Now consider the four models for relating salary to the level of education:
Model 1. $S_t = \alpha_1 + \alpha_2 E_{1t} + \xi_t$
Model 2. $S_t = \beta_1 + \beta_2 E_{2t} + \eta_t$
Model 3. $S_t = \gamma_1 + \gamma_2 E_{3t} + \psi_t$
Model 4. $S_t = \delta_1 + \delta_2 D_{2t} + \delta_3 D_{3t} + \delta_4 D_{4t} + \varepsilon_t$
These formulations have very different implications for the estimated marginal benefit of obtaining a higher degree or an additional year of school. These results are summarized in the next table.
Marginal Benefit of an Additional Degree*
Degree   Model 1   Model 2   Model 4
BS       α2        4β2       δ2
MS       α2        2β2       δ3 - δ2
PhD      α2        2β2       δ4 - δ3
*Model 3 assigns a constant marginal expected value of γ2 to each additional year of school at all educational levels.
Note that only model 4 allows for differentiated returns to degrees. These returns can even be negative. If $\delta_2$ and $\delta_3 - \delta_2$ are positive and $\delta_4 - \delta_3$ is negative, this suggests that expected salaries are higher for individuals having a BS or MS rather than the lower degree, but that the expected salary for those with PhDs is lower than the salaries of those with an MS. Model 1 implies a constant marginal benefit for attaining each additional degree. Also note that in models 1, 2, and 4 the marginal benefit of additional years of schooling is zero unless there is a change in group membership (an additional degree is earned).
The formulation associated with Model 1 implies that the marginal benefit is linear in the education variable. The estimates also depend upon how the groups are numbered. For example, suppose the variable had been defined as
Degree   E1*
HS       1
PhD      2
BS       3
MS       4
This would imply that the marginal benefit of a Ph.D. over not having gone past high school is the same as the expected benefit of having an MS degree instead of stopping at a BS degree.
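The implied degree-to-degree benefits in the table, and the distortion caused by the E1* numbering, can be computed directly. This sketch uses a hypothetical function name and hypothetical coefficient values (α1 = 30, α2 = 5 for Model 1; β2 = 2 for Model 2).

```python
E1 = {"HS": 1, "BS": 2, "MS": 3, "PhD": 4}
E1_star = {"HS": 1, "PhD": 2, "BS": 3, "MS": 4}   # the alternative numbering above
E2 = {"HS": 12, "BS": 16, "MS": 18, "PhD": 20}

def increments(b1, b2, coding, order=("HS", "BS", "MS", "PhD")):
    """Predicted salary gain from each successive degree when S = b1 + b2 * coding[degree]."""
    s = [b1 + b2 * coding[d] for d in order]
    return [s[i + 1] - s[i] for i in range(len(s) - 1)]

model1 = increments(30.0, 5.0, E1)          # constant: [5.0, 5.0, 5.0]
model2 = increments(0.0, 2.0, E2)           # 4*b2, 2*b2, 2*b2: [8.0, 4.0, 4.0]
model1_star = increments(30.0, 5.0, E1_star)
```

Under E1* the same coefficients imply $[10, 5, -10]$: a negative return to the Ph.D. produced purely by how the categories were numbered.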
We need to be very careful about the implications of the adopted specification. Some representations of the impact of marital status on dependent variables are subject to the previously mentioned issues. Introducing different binary variables for different categories allows the greatest flexibility. We may also want to allow for nonlinear relationships between variables, for example regressing personal income or wealth on age and $(age)^2$ to take account of a life cycle effect.
2. Models with binary dependent variables or limited dependent variables
a. Introduction
Consider models in which one might want to explain
(1) when there will be a default on a loan (Y = 1) or no default (Y = 0),
(2) whether a tax return has been filed by someone who has misrepresented their financial position (Y = 1) or accurately reflects the situation (Y = 0),
(3) the market share of a firm (0 ≤ Y ≤ 1).
These are known as limited dependent variable problems. Amemiya (1981) has an excellent survey paper in the Journal of Economic Literature.
In each case the dependent variable (Y) in the function
$$Y = f(X; \beta) + \varepsilon$$
is constrained in value.
Numerous approaches have been adopted for this problem, and these include regression analysis, linear probability models, discriminant analysis, and limited dependent variable models.
b. Linear Probability Model (LPM)
Let $y_t = \alpha + \beta X_t + \varepsilon_t$, where
$y_t$ = 1 if the first option is chosen, = 0 otherwise;
$X_t$ = vector of values of attributes (independent variable(s));
$\varepsilon_t$ = independently distributed random variable with a zero mean.
Below, the intercept is absorbed into $X_t\beta$.
Implications of the LPM:
• $E(y_t) = X_t\beta$
Now let $P_t = \Pr(y_t = 1)$ and $Q_t = 1 - P_t = \Pr(y_t = 0)$, so that
$$E(y_t) = 1\cdot\Pr(y_t = 1) + 0\cdot\Pr(y_t = 0) = 1\cdot P_t + 0\cdot Q_t = P_t.$$
Thus the regression equation describes the probability that the first choice is made. The vector β measures the effect of a unit change in the explanatory variables on the probability of choosing the first alternative. OLS can be used to estimate the LPM; however, there is some question about the appropriateness of OLS in this model. To appreciate the reasons for this concern, note the following:
$$\varepsilon_t = y_t - X_t\beta$$
• Since y can only assume the values 0 or 1, $\varepsilon_t$ can only take the two values $1 - X_t\beta$ and $-X_t\beta$, so it can't be distributed normally.
Further, $E(\varepsilon_t) = P_t(1 - X_t\beta) + (1 - P_t)(-X_t\beta)$, and if $E(\varepsilon_t) = 0$ this implies
$P_t = X_t\beta$ and $1 - P_t = 1 - X_t\beta$.
Now, to find the variance of the error term $\varepsilon_t$:
• $$\mathrm{Var}(\varepsilon_t) = E(\varepsilon_t^2) = (1 - X_t\beta)^2 P_t + (-X_t\beta)^2(1 - P_t) = (1 - X_t\beta)^2(X_t\beta) + (X_t\beta)^2(1 - X_t\beta) = (1 - X_t\beta)(X_t\beta),$$
which shows that the variance of the error depends on the independent variables and is therefore heteroskedastic. One possible solution to this problem is to use weighted least squares.
• Another problem with the LPM is that of prediction: with the linear probability model there is a chance that predicted values for $y_t$ may lie outside the interval [0, 1]. One possible solution is to set all predictions greater than 1 equal to 1 and all predictions less than 0 equal to zero. However, these observations present a problem in running weighted least squares, because their estimated error variance $(1 - X_t\hat\beta)(X_t\hat\beta)$ is then zero (and was negative before truncation), so the weights cannot be formed.
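A sketch of the weighting step for LPM weighted least squares follows. Because setting predictions to exactly 0 or 1 makes the weight infinite, this sketch instead truncates fitted values to $[\epsilon, 1-\epsilon]$; that is a common workaround and our assumption, not the procedure in the notes. The function name and the value of `eps` are hypothetical.

```python
import math

def lpm_wls_weights(fitted, eps=1e-6):
    """WLS weights 1/sqrt(p(1 - p)) from LPM fitted values p = X_t beta_hat.

    Fitted values outside (0, 1) would give a zero or negative variance
    p(1 - p), so they are truncated to [eps, 1 - eps] first; the count of
    truncated observations is returned so they can be inspected."""
    weights, truncated = [], 0
    for p in fitted:
        q = min(max(p, eps), 1 - eps)
        if q != p:
            truncated += 1
        weights.append(1 / math.sqrt(q * (1 - q)))
    return weights, truncated

w, k = lpm_wls_weights([0.5, 1.2, -0.1, 0.2])
```

Each observation of $y_t$ and $X_t$ would then be multiplied by its weight before rerunning OLS. Truncated observations receive very large weights, which is exactly the difficulty the notes point out.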
c. Qualitative Response Models
(1) Introduction
Another possibility for binary or limited dependent variables is to use constrained estimation. Discriminant analysis is still another approach. Since observed values for $Y_t$ are constrained to the interval [0, 1], functional forms $F(X_t)$ which are constrained to the interval (0, 1) can be selected. This quite naturally suggests using cumulative probability distributions for $F(X_t)$:
$$F(X_t) = P_t.$$
This possibility admits many alternative models:
$$P_t = \Pr(Y_t = 1 \mid X_t) = F(X_t\beta; \theta) = \int_{-\infty}^{X_t\beta} f(s; \theta)\,ds$$
where $f(s; \theta)$ denotes a "well behaved" probability density function with distributional parameters θ. $F(X_t\beta; \theta)$ is the corresponding cumulative distribution function evaluated at $X_t\beta$, which is sometimes referred to as the score. Two models which have been widely used are the standard normal and logistic models:
Model      f(s; θ)                            F(z) = $\int_{-\infty}^{z} f(s;\theta)\,ds$
Normal     $e^{-s^2/2}/\sqrt{2\pi}$           $\int_{-\infty}^{z} e^{-s^2/2}/\sqrt{2\pi}\,ds$
Logistic   $e^{-s}/(1+e^{-s})^2$              $1/(1+e^{-z})$
These two distributions are only two of many which could have been used, but they currently dominate this literature and are respectively known as probit (based on the normal) and logit (based on the logistic) models.
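The two densities and distribution functions in the table can be coded with the Python standard library alone. This is a sketch: the normal CDF is written through the error function, $\Phi(z) = \tfrac12[1 + \mathrm{erf}(z/\sqrt2)]$, and the function names are hypothetical.

```python
import math

def normal_pdf(s):
    return math.exp(-s * s / 2) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def logistic_pdf(s):
    return math.exp(-s) / (1 + math.exp(-s)) ** 2

def logistic_cdf(z):
    return 1 / (1 + math.exp(-z))
```

Both CDFs equal 0.5 at $z = 0$ and satisfy $F(-z) = 1 - F(z)$. The densities at zero are about 0.399 (normal) and 0.25 (logistic), one reason probit and logit coefficients differ in scale.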
(2) Estimation
The estimation of limited dependent models depends upon the model or density selected and the nature of the data: (a) $Y_t$ = 0 or 1 and (b) $0 < Y_t < 1$.
If we have data based on discrete choices, then we have case
(a) $Y_t$ = 0 or 1.
The likelihood function in this case is given by
$$L(\beta, \theta; Y) = \prod_{t=1}^{n} P_t^{Y_t}(1 - P_t)^{1-Y_t} = \prod_{t=1}^{n} F(x_t\beta; \theta)^{Y_t}\big(1 - F(x_t\beta; \theta)\big)^{1-Y_t}$$
and the log likelihood function is
$$\ell(\beta, \theta; Y) = \sum_{t=1}^{n}\Big[Y_t \ln F(x_t\beta; \theta) + (1 - Y_t)\ln\big(1 - F(x_t\beta; \theta)\big)\Big].$$
This expression is maximized over the parameters β and θ to obtain maximum likelihood estimators. This procedure can be quite involved if the expression for the cumulative distribution is complicated. Recall that
$$F(x_t\beta; \theta) = \Pr(z \le x_t\beta) = \int_{-\infty}^{x_t\beta} f(s; \theta)\,ds$$
where θ denotes unknown distributional parameters. Any pdf could be selected in the previous framework. The predicted impact of a change in the explanatory variables depends on the pdf as
$$\frac{\partial \Pr(Y_t = 1 \mid X_t)}{\partial X_{it}} = f(X_t\beta)\,\beta_i.$$
Thus, the $\beta_i$ coefficients alone do not provide estimates of the marginal impact of a change in $X_t$ on $\Pr(Y_t = 1 \mid X_t)$.
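The maximization can be sketched for a logit with one regressor. This assumes the logistic F, a single explanatory variable plus an intercept, and Newton-Raphson with step halving as the optimizer; the notes do not prescribe an algorithm, and all function names and the data in the usage lines are hypothetical. The marginal effect follows the formula $f(X_t\beta)\beta_i$, using $f = F(1-F)$ for the logistic.

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

def loglik(b, y, x):
    """l(b) = sum[y ln F(z) + (1 - y) ln(1 - F(z))], z = b0 + b1 x, F logistic."""
    total = 0.0
    for yi, xi in zip(y, x):
        z = b[0] + b[1] * xi
        # ln F(z) = -ln(1 + e^{-z}) and ln(1 - F(z)) = -ln(1 + e^{z})
        total -= yi * math.log1p(math.exp(-z)) + (1 - yi) * math.log1p(math.exp(z))
    return total

def score_and_info(b, y, x):
    """Gradient of l and the information matrix sum w_t x_t x_t', w_t = F(1 - F)."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for yi, xi in zip(y, x):
        p = logistic(b[0] + b[1] * xi)
        w = p * (1 - p)
        g[0] += yi - p
        g[1] += (yi - p) * xi
        h[0][0] += w
        h[0][1] += w * xi
        h[1][1] += w * xi * xi
    h[1][0] = h[0][1]
    return g, h

def logit_fit(y, x, iters=50):
    """Newton-Raphson with step halving, starting from b = (0, 0)."""
    b = [0.0, 0.0]
    for _ in range(iters):
        g, h = score_and_info(b, y, x)
        det = h[0][0] * h[1][1] - h[0][1] ** 2
        step = [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
                (h[0][0] * g[1] - h[0][1] * g[0]) / det]
        t, old = 1.0, loglik(b, y, x)
        while t > 1e-8 and loglik([b[0] + t * step[0], b[1] + t * step[1]], y, x) < old:
            t /= 2   # halve the step until the log likelihood does not fall
        b = [b[0] + t * step[0], b[1] + t * step[1]]
    return b

def marginal_effect(b, x0):
    """dPr(Y = 1 | x)/dx at x = x0: f(b0 + b1 x0) * b1 with f = F(1 - F)."""
    p = logistic(b[0] + b[1] * x0)
    return p * (1 - p) * b[1]

y = [0, 0, 1, 0, 1, 1]
x = [1, 2, 3, 4, 5, 6]
b_hat = logit_fit(y, x)
```

At the maximum the score is zero, and the marginal effect depends on where it is evaluated, which is why $\hat\beta_1$ alone is not the marginal impact.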
Stata commands for estimating limited dependent variable models
As noted earlier, the two most commonly used pdf's in qualitative response models are the normal and logistic distributions, with the corresponding qualitative response models being referred to as the probit and logit models, which can be estimated in most common econometric software packages. Some useful Stata commands for working with binary variables are given below:
• To create dummy variables in Stata, use the "gen" command as follows:
    gen dummy_var = exp
where exp is an expression that categorizes dummy_var as a 0 or 1. For example, to take a continuous variable on income and create a dummy variable where a 0 represents "less than $50,000 annually" and a 1 represents "$50,000 or more annually," use the following command (the if qualifier keeps missing incomes missing, since Stata treats a missing value as larger than any number):
    gen income_dummy = income >= 50000 if !missing(income)
• The probit model can be estimated using Stata with the command
    probit Y X1 X2, options
The maximum likelihood estimates of β1, β2, β3 and the log likelihood value will be reported. The marginal impact of changes in the explanatory variables on the predictions ($f(X_t\beta)\beta_i$) rather than $\beta_i$ can be obtained by using the command
    dprobit Y X1 X2, options
(In Stata 11 and later, dprobit is superseded; after probit, margins, dydx(*) atmeans reports marginal effects at the means.)
A prediction matrix can be printed using the command:
    estat classification, cutoff(#)
The elements on the main diagonal are the number of correct predictions, and the off-diagonal elements indicate the number of misses.
                     Observed
                     D        ~D
Predicted    +       M11      M12
             -       M21      M22
The option cutoff(#), for example
    estat classification, cutoff(.5)
specifies the value at which an observation has a predicted positive outcome. The default cutoff point is 0.5.
• Similar logit results can be obtained using the command
    logit Y X1 X2, options
• Prediction matrices for the LPM can be obtained as follows:
    reg y X1 X2
    predict yhat
    gen predy = yhat > .5 if !missing(yhat)
    tabulate y predy
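The prediction matrix can also be tabulated outside Stata. This sketch lays the counts out as in the table above (rows: predicted +, predicted -; columns: observed D, ~D). It uses a strict ">" cutoff to match the LPM commands above, and the function names and data are hypothetical.

```python
def prediction_matrix(y, prob, cutoff=0.5):
    """[[M11, M12], [M21, M22]]: rows predicted (+, -), columns observed (D: y = 1, ~D: y = 0)."""
    m = [[0, 0], [0, 0]]
    for yi, p in zip(y, prob):
        row = 0 if p > cutoff else 1      # predicted positive when p exceeds the cutoff
        col = 0 if yi == 1 else 1
        m[row][col] += 1
    return m

def hit_rate(m):
    """Share of correct predictions: main diagonal over the total."""
    return (m[0][0] + m[1][1]) / sum(sum(r) for r in m)

m = prediction_matrix([1, 0, 1, 0, 1], [0.8, 0.3, 0.4, 0.6, 0.9])
# -> [[2, 1], [1, 1]]; hit rate 3/5
```

The same function works for LPM, probit, or logit fitted probabilities, which makes the forecasting comparison in Problem 5 straightforward.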
(b) Limited dependent variable models where $0 < Y_t < 1$
If we have a discrete choice model with grouped data or a model with the dependent variable strictly between 0 and 1, alternative estimation techniques are available.
One approach is to use
$$p_t = \frac{v_t}{m_t}$$
where $v_t$ = number choosing the first response in the tth group and $m_t$ = number in the tth group, and to set
$$F^{-1}(p_t) = X_t\beta \quad\text{or}\quad F^{-1}(Y_t) = X_t\beta.$$
If F is known, then regression techniques can be employed to estimate the vector β. Recall that the probit model is based upon the normal cumulative distribution function
$$F(x_t\beta) = \int_{-\infty}^{x_t\beta} \frac{e^{-s^2/2}}{\sqrt{2\pi}}\,ds.$$
The logit model is based upon the logistic distribution function
$$F(x_t\beta + \varepsilon_t) = \frac{1}{1 + e^{-x_t\beta - \varepsilon_t}}.$$
The probit model involves rather complicated estimation, and there is no compelling reason that the normal should be used. The logit has thicker tails, but approximates the probit model.
The logit model is particularly well suited for grouped data or other situations in which
$$0 < Y_t = F(X_t\beta) < 1.$$
This can be seen by solving
$$\frac{1}{1 + e^{-x_t\beta - \varepsilon_t}} = Y_t$$
for $x_t\beta + \varepsilon_t$, which yields
$$Z_t = F^{-1}(Y_t) = \ln\left(\frac{Y_t}{1 - Y_t}\right) = x_t\beta + \varepsilon_t.$$
Regression techniques can be directly used to obtain estimators of β, where the dependent variable $Z_t = \ln(Y_t/(1 - Y_t))$ is regressed on the $X_t$'s. Note that $Y_t$ cannot equal 0 or 1 in this representation, since the log-odds would then be infinite.
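The grouped-data logit reduces to an ordinary regression of the log-odds on $x_t$. This is a minimal sketch with one regressor and unweighted OLS; a more careful treatment would weight each group by $m_t p_t(1-p_t)$, which is omitted here. The function name and the grouped counts are hypothetical, chosen so the log-odds are exactly $0, \ln 2, \ln 4, \ln 8$.

```python
import math

def grouped_logit(v, m, x):
    """OLS of Z_t = ln(p_t / (1 - p_t)), p_t = v_t / m_t, on an intercept and x_t."""
    z = []
    for vt, mt in zip(v, m):
        p = vt / mt
        if p <= 0 or p >= 1:
            raise ValueError("log-odds undefined when p_t is 0 or 1")
        z.append(math.log(p / (1 - p)))
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    b1 = sum((a - xbar) * (c - zbar) for a, c in zip(x, z)) / sum((a - xbar) ** 2 for a in x)
    return zbar - b1 * xbar, b1

# p_t = 1/2, 2/3, 4/5, 8/9, so the log-odds are 0, ln 2, ln 4, ln 8
b0, b1 = grouped_logit([1, 2, 4, 8], [2, 3, 5, 9], [0.0, 1.0, 2.0, 3.0])
```

Because these proportions lie exactly on a logistic curve, the regression recovers an intercept of 0 and a slope of $\ln 2$.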
3. PROBLEM SET 4.2
Dummy/Binary variables
Problems 1, 2, 3, and 4 deal with binary independent variables, including the use of interaction terms. Problems 5 and 6 focus on modeling binary dependent variables.
Theory
1. Suppose you collect data from a survey on wages, education, experience, and gender. In addition, you ask for information about marijuana usage. The original question is: "On how many occasions last month did you smoke marijuana?"
a) Write an equation that would allow you to estimate the effects of marijuana usage on wage, while controlling for other factors. You should be able to make statements such as, "Smoking marijuana five more times per month is estimated to change wage by x%."
b) Write a model that would allow you to test whether drug usage has different effects on wages for men and women, while controlling for other variables. How would you test that there are no differences in the effects of drug usage for men and women? You may want to model the impact of interactions.
c) Suppose you think it is better to measure marijuana usage by putting people into one of four categories: nonuser, light user (1-5 times per month), moderate user (6-10 times per month), and heavy user (more than 10 times per month). Now write a model that allows you to estimate the effects of marijuana usage on wage, while controlling for other variables and avoiding the dummy variable trap.
d) Using the model in part (c), explain in detail how to test the null hypothesis that marijuana usage has no effect on wage. Be very specific and include a careful listing of degrees of freedom.
e) What are some potential problems with drawing causal inference using the survey data you collected?
(Wooldridge 7.8)
Applied
2. The file TRAFFIC2.RAW contains data on traffic accidents in California from 1981 to 1989, with each month being a separate observation. You suspect that California traffic accidents (listed in the data file as variable totacc) may be correlated with the month of the year.
a) Run a regression that shows the effect of the month on the number of traffic accidents. Does it appear that seasonal adjustment is appropriate when monitoring the number of California traffic accidents? Justify.
b) You may have noticed that the data did not include the variable jan, so that the number of dummy variables would be one less than the number of classifications. Insert a variable jan and set jan = 1 for January observations (i.e., when all other month variables equal zero). What estimation problems are there with having the same number of dummy variables as classifications? Estimate this regression and compare your results with the results of part (a).
(RST)
3. Consider the following data on the length of employment and associated salary level.
Employee   Salary   Years Employed
1          425      1
2          480      3
3          905      20
4          520      5
5          505      4
6          540      15
7          380      6
8          440      2
9          420      1
10         405      4
11         650      10
The salary figures are reviewed by employee numbers 1 and 7, who note that employee numbers 1, 2, 7, 9, and 10 are members of a minority group and claim that there is evidence of discrimination in the salary structure. Analyze this assertion.
(JM IIIB-4)
I I I B 25
4. Consider the following models:
a. $\text{Consump} = \alpha_1 + \alpha_2\,\text{Income} + \alpha_3\,\text{Wealth} + \alpha_4\,(\text{Income})(\text{Wealth}) + \varepsilon$
where Consump denotes consumption expenditures in dollars and Income
and Wealth are measured in dollars.
(1) Evaluate the marginal propensity to consume ($\partial\,\text{Consump}/\partial\,\text{Income}$).
(2) What is the interpretation of $\alpha_4$?
b. $\text{Wage} = \beta_1 + \beta_2\,\text{Female} + \beta_3\,\text{Race} + \beta_4\,(\text{Female})(\text{Race}) + \beta_5\,\text{Education} + \beta_6\,\text{Experience} + \varepsilon$
where Wage represents the hourly wage in dollars, Education measures
years of education beyond high school, Experience is job experience
measured in years, and Female and Race are binary variables with Female
= 1 for female employees and Race = 1 for non-white and non-Hispanic
employees.
(1) What is the interpretation of each of the following
parameters: $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6$?
(2) What joint hypothesis could be tested to check for gender or
racial discrimination?
(3) How could the model be modified to allow the possibility of
different annual increases in the hourly wage rate for females?
5. Consider the following hypothetical data (adapted from Gujarati, p. 473).
Y is a binary variable (Y = 1 if the family owns a home, 0 otherwise) and X is family
income in thousands of dollars.
Family Y X Family Y X
1 0 8 21 1 22
2 1 16 22 1 16
3 1 18 23 0 12
4 0 11 24 0 11
5 0 12 25 1 16
6 1 19 26 0 11
7 1 20 27 1 20
8 0 13 28 1 18
9 0 9 29 0 11
10 0 10 30 0 10
11 1 17 31 1 17
12 1 18 32 0 13
13 0 14 33 1 21
14 1 20 34 1 20
15 0 6 35 0 11
16 1 19 36 0 8
17 1 16 37 0 17
18 0 10 38 1 16
19 0 8 39 0 7
20 1 18 40 1 17
a. Fit a linear probability model (LPM)
$Y = \beta_1 + \beta_2 X + \varepsilon$
to the data and investigate the predictive ability of the
estimated model.
b. Fit probit and logit models to this same data set and compare the
prediction results. Include the prediction matrices.
For probit or logit models of the form
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k$
Stata uses the commands:
probit y x1 x2 ... xk
logit y x1 x2 ... xk
In order to print the prediction matrix using
a .5 threshold, use the command
estat class
c. Compare the forecasting ability of the three models (LPM, probit,
and logit) corresponding to a cutoff value of .3. Use the command
estat class, cutoff(.3)
d. Compare the marginal impact of a change in income on the
likelihood of homeownership using the three models.
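Part (a) can also be checked outside Stata. The minimal Python sketch below assumes the 40 rows transcribed from the table above and fits the LPM with the simple-regression least squares formulas. `prediction_matrix` is a hypothetical helper that builds a 2x2 table of (actual, predicted) counts under a stated rule: a fitted value at or above the cutoff counts as a predicted 1. The sketch also counts fitted values outside [0, 1], a well-known weakness of the LPM.

```python
# (family, Y, X) rows transcribed from the table in problem 5
data = [
    (1, 0, 8), (2, 1, 16), (3, 1, 18), (4, 0, 11), (5, 0, 12),
    (6, 1, 19), (7, 1, 20), (8, 0, 13), (9, 0, 9), (10, 0, 10),
    (11, 1, 17), (12, 1, 18), (13, 0, 14), (14, 1, 20), (15, 0, 6),
    (16, 1, 19), (17, 1, 16), (18, 0, 10), (19, 0, 8), (20, 1, 18),
    (21, 1, 22), (22, 1, 16), (23, 0, 12), (24, 0, 11), (25, 1, 16),
    (26, 0, 11), (27, 1, 20), (28, 1, 18), (29, 0, 11), (30, 0, 10),
    (31, 1, 17), (32, 0, 13), (33, 1, 21), (34, 1, 20), (35, 0, 11),
    (36, 0, 8), (37, 0, 17), (38, 1, 16), (39, 0, 7), (40, 1, 17),
]
ys = [y for _, y, _ in data]
xs = [x for _, _, x in data]
n = len(data)

# Least squares for Y = b1 + b2*X (simple-regression formulas)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b1 = y_bar - b2 * x_bar
fitted = [b1 + b2 * x for x in xs]

def prediction_matrix(actual, predicted_prob, cutoff=0.5):
    """Counts of (actual, predicted) pairs; a fitted value >= cutoff is classified as 1."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, prob in zip(actual, predicted_prob):
        counts[(a, int(prob >= cutoff))] += 1
    return counts

matrix = prediction_matrix(ys, fitted, cutoff=0.5)
correct = matrix[(0, 0)] + matrix[(1, 1)]
outside_unit_interval = sum(1 for f in fitted if f < 0 or f > 1)

print(f"Y_hat = {b1:.4f} + {b2:.4f} X")
print("prediction matrix (actual, predicted):", matrix)
print(f"correctly classified: {correct}/{n}; fitted values outside [0, 1]: {outside_unit_interval}")
```

Calling `prediction_matrix(ys, fitted, cutoff=0.3)` gives the LPM counterpart for the comparison in part (c).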
6. Let grad be a dummy variable for whether a student-athlete at a large
university graduates in five years. Let hsGPA and SAT be high school
grade point average and SAT score, respectively. Let study be the number
of hours spent per week in an organized study hall. Suppose that, using
data on 420 student-athletes, the following logit model is obtained:
$$\hat{P}(grad = 1 \mid hsGPA, SAT, study) = \Lambda(-1.17 + .24\,hsGPA + .00058\,SAT + .073\,study)$$
where $\Lambda(z) = \exp(z)/(1 + \exp(z))$, evaluated at $z = X_t\beta$, is the cdf for the logit model.
Holding hsGPA fixed at 3.0 and SAT fixed at 1,200, compute the estimated
difference in the graduation probability for someone who spent 10 hours
per week in study hall and someone who spent 5 hours per week.
(Wooldridge, 4th edition, problem 17.2)
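This calculation can be checked numerically. The Python sketch below evaluates the fitted logit index at hsGPA = 3.0 and SAT = 1,200 for study = 10 and study = 5, applies Λ, and takes the difference. The function names are illustrative only.

```python
import math

def logit_cdf(z):
    """Lambda(z) = exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

def p_grad(hsGPA, SAT, study):
    """Fitted graduation probability from the estimated logit index in the problem."""
    return logit_cdf(-1.17 + 0.24 * hsGPA + 0.00058 * SAT + 0.073 * study)

p10 = p_grad(3.0, 1200, 10)
p5 = p_grad(3.0, 1200, 5)
diff = p10 - p5
print(f"P(grad=1 | 10 hrs) = {p10:.4f}, P(grad=1 | 5 hrs) = {p5:.4f}, difference = {diff:.4f}")
```

Because Λ is nonlinear, this difference depends on the values at which hsGPA and SAT are held fixed. It is not simply 5 × .073.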
James B. McDonald, Brigham Young University, 7/14/2009
IV. Miscellaneous Topics
C. Lagged Variables
Individuals frequently respond to a change in independent variables
with a time lag. Consequently, economic models describing individual
behavior, as well as models which attempt to represent the relationships
between aggregated variables, will often include lagged independent
variables or lagged dependent variables. We first consider models which
include lagged independent variables (distributed lag models) and then
investigate models containing lagged dependent variables (autoregressive
models). Distributed lag and autoregressive models provide an attempt to
model dynamic behavior.
1. Lagged Independent Variables - Distributed Lag Models
a. Distributed lag models are of the form:
$$y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \ldots + \beta_s x_{t-s} + u_t$$
where $\partial y_t/\partial x_t = \beta_0$ denotes the immediate impact of a change in
x on y, and $\partial y_t/\partial x_{t-i} = \beta_i$ denotes the impact of a change in x on y
after i periods. Thus, the βi's indicate the distributional
(over time) impact of x on y.
(1) Distributed lag models can be estimated using least squares if n
(sample size) > number of coefficient parameters (s + 2 = # lags
+ 2 (for δ and β0)), and least squares yields BLUE if $u_t \sim NID(0, \sigma^2)$.
(2) Several possible problems can arise in distributed lag models:
(a) how many lags should be used (s = ?); (b) the degrees of
freedom (n - k) = n - 2s - 2 may be small for large lags (s), since s
observations are lost to lagging and k = s + 2 coefficients are estimated;
and (c) a serious multicollinearity problem can arise if the
x's are strongly intercorrelated, with the corresponding estimates of βi
being very erratic.
b. Alternative Estimation Procedures: An alternative estimation
procedure which has been proposed to "circumvent" the impact of
possible multicollinearity is to impose some "reasonable" pattern
on the βi's in the estimation procedure. Ideally, the validity of
these hypothesized constraints would be tested. Two of the most
commonly encountered patterns for the βi's are the Koyck scheme
and Almon polynomial weights. The Koyck model assumes that the
βi's decline geometrically, and the Almon formulation assumes that
the patterns in the βi's can be modeled by a polynomial in "i".
We will first discuss the Koyck model, then the Almon procedure,
and then consider an application of these procedures to estimating
the relationship between sales and advertising expenditure.
(1) Koyck Scheme
Model: $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \ldots + u_t$
Koyck suggested that the βi be approximated by
$$\beta_i = \beta_0 \lambda^i, \quad i = 0, 1, 2, \ldots$$
The Koyck weights (βi) decline geometrically for 0 < λ < 1.
We now derive an equation which can be used in estimating the
Koyck formulation of distributed lag coefficients with
geometrically declining weights. This derivation is done in
two ways: (1) using a linear operator and (2) using algebraic
manipulations. Let L denote the lag operator, so that $Lx_t = x_{t-1}$,
$L^2 x_t = x_{t-2}$, etc.
(1) Substituting the Koyck expression for βi into the distributed
lag model yields
$$y_t = \delta + \beta_0 \sum_{i=0}^{\infty} (\lambda L)^i x_t + u_t$$
or
$$y_t = \delta + \frac{\beta_0}{1 - \lambda L}\, x_t + u_t.$$
Multiplying both sides of this equation by (1 - λL), and using Lδ = δ
for the constant term, yields
$$y_t - \lambda y_{t-1} = (1 - \lambda L) y_t = (1 - \lambda)\delta + \beta_0 x_t + u_t - \lambda u_{t-1}$$
or
$$y_t = \delta(1 - \lambda) + \beta_0 x_t + \lambda y_{t-1} + u_t - \lambda u_{t-1}.$$
Note that this equation can be estimated by regressing $y_t$ on $x_t$
and $y_{t-1}$.
(2) Another way to derive the estimating equation for the
Koyck distributed lag model without the lag operator (L) is as
follows:
Substitute $\beta_j = \beta_0 \lambda^j$ into the equation for the distributed lag
model to obtain
$$y_t = \delta + \beta_0 x_t + \beta_0 \lambda x_{t-1} + \beta_0 \lambda^2 x_{t-2} + \ldots + u_t.$$
Now replace t by t - 1 in this equation and multiply by λ:
$$\lambda y_{t-1} = \delta\lambda + \beta_0 \lambda x_{t-1} + \beta_0 \lambda^2 x_{t-2} + \ldots + \lambda u_{t-1}.$$
Subtract these two equations to obtain
$$y_t - \lambda y_{t-1} = \delta(1 - \lambda) + \beta_0 x_t + u_t - \lambda u_{t-1}$$
or
$$y_t = \delta(1 - \lambda) + \beta_0 x_t + \lambda y_{t-1} + v_t,$$
where $v_t = u_t - \lambda u_{t-1}$. This estimating equation is the same
as that obtained in (1).
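Both derivations can be checked numerically. The Python sketch below assumes a noise-free Koyck model (u_t = 0) with hypothetical values δ = 5, β0 = 0.8, λ = 0.6. It also assumes x is zero before the sample starts, so the infinite sum is exact. It then confirms that $y_t - \lambda y_{t-1}$ equals $\delta(1 - \lambda) + \beta_0 x_t$ in every period.

```python
import random

def koyck_series(x, delta, beta0, lam):
    """y_t = delta + beta0 * sum_{i=0}^{t} lam**i * x_{t-i}, treating x as zero before t = 0."""
    return [delta + beta0 * sum(lam ** i * x[t - i] for i in range(t + 1)) for t in range(len(x))]

random.seed(1)
x = [random.uniform(0, 10) for _ in range(30)]
delta, beta0, lam = 5.0, 0.8, 0.6          # hypothetical parameter values
y = koyck_series(x, delta, beta0, lam)

# Koyck-transformed equation (noise-free): y_t - lam*y_{t-1} = delta*(1 - lam) + beta0*x_t
gaps = [abs((y[t] - lam * y[t - 1]) - (delta * (1 - lam) + beta0 * x[t])) for t in range(1, len(x))]
max_gap = max(gaps)
print(f"largest discrepancy: {max_gap:.2e}")  # only floating-point rounding remains
```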
Note: (a) The assumption of a Koyck weighting scheme reduces
the number of parameters to be estimated to 3 (δ, λ, β0).
(b) If the ut's in the original model are independently
distributed, then the error $v_t = u_t - \lambda u_{t-1}$ in the last representation
of the model is autocorrelated. That representation also contains a lagged
dependent variable, $y_{t-1}$, which is correlated with $v_t$ because both depend
on $u_{t-1}$. This poses special estimation problems, which
will be considered later.
(2) Almon Polynomial Distributed Lags
The Almon polynomial distributed lag formulation is one of the
most widely used in practice. We begin with a model with a
finite number of lags:
Model: $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \ldots + \beta_s x_{t-s} + u_t.$
The Almon weighting scheme is defined by:
$$\beta_j = f(j) = a_0 + a_1 j + \ldots + a_p j^p, \quad j = 0, 1, \ldots, s$$
s = # of lags = # of β's - 1
p = degree of polynomial.
Polynomials are extremely flexible and can be used to
approximate any continuous function on a closed interval as accurately as desired
by selecting p to be large enough.
The corresponding estimating equation can be obtained by
substituting f(j) for βj into the distributed lag model and
collecting terms involving the ai's, which gives
$$y_t = \delta + a_0 z_{0t} + a_1 z_{1t} + \ldots + a_p z_{pt} + u_t, \qquad z_{kt} = \sum_{j=0}^{s} j^k x_{t-j} \;(\text{with } 0^0 = 1),$$
and then estimating the ai's using least squares. Given estimates for the ai's,
corresponding estimates of the βj's can be obtained from the
estimated f(j). By using such a specification we are
estimating (p + 2) parameters (δ, a0, ..., ap) rather than
(s + 2) parameters (δ, β0, ..., βs). If p (the degree of the
polynomial defining the weights) is smaller than s (the
maximum lag), then the Almon weighting scheme results in fewer
parameters needing to be estimated. In practice, p is usually
selected to be rather small (2, 3, 4).
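For readers who want to see the transformation without Stata, here is a minimal Python sketch. It is not the SAS or SHAZAM implementation. The helper names `almon_regressors`, `ols`, and `almon_betas` are hypothetical, and least squares is solved directly from the normal equations. The data are simulated without noise from exactly quadratic weights (s = 3, p = 2). Regressing y on an intercept and z0, z1, z2 should therefore recover the ai's, and the βj's then follow from f(j).

```python
import random

def almon_regressors(x, s, p):
    """Rows [z_0t, ..., z_pt] with z_kt = sum_{j=0}^{s} j**k * x_{t-j}, for t = s, ..., n-1."""
    # Python evaluates 0 ** 0 as 1, so z_0t includes the current x_t
    return [[sum(j ** k * x[t - j] for j in range(s + 1)) for k in range(p + 1)]
            for t in range(s, len(x))]

def ols(X, y):
    """Least squares coefficients from the normal equations (X'X)b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * w for u, w in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def almon_betas(a, s):
    """beta_j = a_0 + a_1 j + ... + a_p j^p for j = 0, ..., s."""
    return [sum(ak * j ** k for k, ak in enumerate(a)) for j in range(s + 1)]

# Illustrative, noise-free data whose lag weights are exactly quadratic (s = 3, p = 2)
random.seed(0)
s, p, delta = 3, 2, 2.0
true_a = [0.5, 0.3, -0.1]
true_beta = almon_betas(true_a, s)
x = [random.uniform(0, 10) for _ in range(40)]
y = [delta + sum(true_beta[j] * x[t - j] for j in range(s + 1)) for t in range(s, len(x))]

X = [[1.0] + z for z in almon_regressors(x, s, p)]   # intercept, z0, z1, z2
coef = ols(X, y)
est_delta, est_a = coef[0], coef[1:]
print("a-hat:", [round(v, 4) for v in est_a])
print("beta-hat:", [round(b, 4) for b in almon_betas(est_a, s)])
```

With s = 3, `almon_regressors` produces z1 = x[t-1] + 2x[t-2] + 3x[t-3] and z2 = x[t-1] + 4x[t-2] + 9x[t-3]. These match the z variables generated in the Stata code that follows.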
To perform this estimation procedure in Stata, generate the
polynomial variables (the "zi's"), run the regression of the
dependent variable on the polynomial variables, and then
recover the βj's from the estimation. For example, the
following code will estimate the previous model with three
lags (s = 3) using a second-order (p = 2) polynomial to describe
the patterns of the βi's:
*generate the polynomial variables
gen z0 = X+X[_n-1]+X[_n-2]+X[_n-3]
gen z1 = X[_n-1]+X[_n-2]*2+X[_n-3]*3
gen z2 = X[_n-1]+X[_n-2]*4+X[_n-3]*9
*regress the Y variable on the polynomial variables
reg Y z0 z1 z2
estat ic
*recover the betas
scalar b0 = _b[z0]
scalar b1 = _b[z0]+_b[z1]+_b[z2]
scalar b2 = _b[z0]+_b[z1]*2+_b[z2]*4
scalar b3 = _b[z0]+_b[z1]*3+_b[z2]*9
*display the betas
display b0, b1, b2, b3
The mathematical details behind these transformations are
illustrated in the first section of the appendix. This
estimation procedure is automated by such programs as SAS and
SHAZAM. For example, the SHAZAM command to estimate the
previous model with three lags (s = 3) using a second-order
(p = 2) polynomial to describe the patterns of the βi's is given
by:
OLS Y X(0.3,2)
This command will not only estimate the ai's, but will also
generate the βi's. However, many calculations are going on
in the background. The related computational and distributional
details are summarized in the appendix "A Few Details for the
Almon Distributed Lag."
Examples:
The Almon estimators have a smaller variance than the least
squares estimator, whether the assumption of a polynomial lag
is valid or not. If the assumption is incorrect, the Almon
estimator is biased and inconsistent [cf. Schmidt & Sickles,
IER (October 1975); Schmidt & Ward, JASA (March 1973)].
TESTING the Almon scheme
$$H_0: \beta_j = f(j) = a_0 + a_1 j + \ldots + a_p j^p, \quad j = 0, 1, \ldots, s$$
can be performed using LR or Chow tests to compare the Almon
and OLS results.
c. A Review and Application of Distributed Lag Models to Estimating
the Relationship Between Sales and Advertising
In many situations the economic agents whose behavior is being
modeled don't react immediately or completely to changes in the
economic environment. Instead, the adjustment may be gradual and
take place over several periods of time. The delay may be due to
habit persistence, the cost of frequent changes, the delay in
gathering data, or other technological, institutional, or behavioral
factors. Well-known examples would include the response of such
macroeconomic variables as GDP or prices to unexpected changes in
the money supply, government spending, or the tax system.
Advertising has also been shown to have an impact on sales which
generally lasts for more than one period of time.
Distributed lag models provide a convenient descriptive model
of situations in which changes in an independent variable may have
an impact which lasts for several time periods.
A simple example of such a model is given by
$$S_t = \delta + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \ldots + \beta_k A_{t-k} + \varepsilon_t$$
where $S_t$ and $A_t$ represent sales and advertising expenditure during
time period t. In this model "δ" represents the level of
sales which would take place without any advertising. The impact
of advertising can be readily determined. An increase in
advertising of one unit would be expected to increase sales by β0
during the same period. Sales in the next period would increase
by β1 units. Similarly, the impact on sales after k time periods
is given by βk.
The "distributed lag" effect of advertising on sales might be
visually represented as follows:
[Figure 2. Distributed lag coefficients (βi plotted against i)]
This figure corresponds to the case in which increased advertising
has an immediate impact on sales, the impact increases for two
periods, then declines, and then there is no impact after four
periods. An alternative scenario might be where advertising has
the greatest impact on sales in the same time period, followed by
a gradually declining impact. This could be represented as in Figure
3.
[Figure 3. Declining distributed lag coefficients (βi plotted against i)]
Distributed lag models are extremely flexible in terms of
admissible behavior. However, this flexibility can lead to
estimation problems. In principle, least squares estimates of the
coefficients are the minimum variance estimators of all unbiased
estimators of the coefficients in distributed lag models under the
standard assumptions associated with the model.
In practice, several difficulties are encountered. In order
to illustrate these problems, assume that monthly observations on
sales and advertising for three years are available. In order to
estimate the distributed impact of advertising on sales, we might
consider estimating the model:
$$S_t = \delta + \beta_0 A_t + \beta_1 A_{t-1} + \ldots + \beta_{12} A_{t-12} + \varepsilon_t.$$
This specification contains 14 unknown parameters (coefficients)
and requires observations on each of the variables, i.e., $S_t$, $A_t$,
$A_{t-1}, \ldots, A_{t-12}$. These data are reported in the Table in the
Appendix labeled "Sales and Advertising Data." In order to have
an observation for each variable including $A_{t-12}$, the first twelve
observational values on sales must be deleted, with the first
usable time period corresponding to t = 13. Hence, the usable
sample size is reduced from 36 to 24 by the inclusion of the 12
lagged variables for advertising. The degrees of freedom
associated with this model are 10 (usable sample size - number of
coefficients to be estimated). In fact, if 17 lags had been
included, the usable sample size would be equal to the number of
coefficients to be estimated and the degrees of freedom would be
zero.
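The bookkeeping in this paragraph generalizes to df = n - 2s - 2 for a sample of n periods and s lags. A small Python check reproduces the figures for the 36 monthly observations; the function name is hypothetical. With 12 lags there are 24 usable observations and 10 degrees of freedom, and with 17 lags the degrees of freedom fall to zero.

```python
def dl_degrees_of_freedom(n, s):
    """Usable observations, coefficients, and degrees of freedom for a model with s lags."""
    usable = n - s      # the first s observations are lost to lagging
    k = s + 2           # delta and beta_0, ..., beta_s
    return usable, k, usable - k

for s in (3, 6, 12, 17):
    usable, k, df = dl_degrees_of_freedom(36, s)
    print(f"s = {s:2d}: usable n = {usable}, coefficients = {k}, df = {df}")
```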
Another problem arises when the explanatory variable is
associated with a trend over time. If the trend is approximately
linear, then multicollinearity between the current and lagged
values of the explanatory variables may make it difficult to
accurately estimate individual parameter coefficients. The
pairwise correlations of lagged advertising are given in the
following table:
Table 2
Pairwise Correlations of Lagged Advertising
          A      A(-1)   A(-2)   A(-3)   ...   A(-12)
A         1      .874    .866    .859    ...   .892
A(-1)            1       .874    .855    ...   .896
A(-2)                    1       .863    ...   .839
A(-3)                            1       ...   ...
...                                      ...   ...
A(-12)                                         1
Each of these situations (low degrees of freedom and
multicollinearity) can result in unreliable estimates of the distributed lag
coefficients (βi).
OLS estimation (demonstration using Stata):
As a case in point, if we regress sales on advertising expenditure
for the current and previous twelve months using the commands:
. tsset t
. reg S A A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12   or
. reg S A A1-A12
. estat ic
where estat ic reports the corresponding log-likelihood value, and each
of the AJ has been generated by adding the lag operator "L." in front
of the variable:
. gen A1 = L.A
. gen A2 = L.A1
…
. gen A12 = L.A11
We then obtain
Source | SS df MS Number of obs = 24
-------------+------------------------------ F( 13, 10) = 3.51
Model | 8029.73337 13 617.671797 Prob > F = 0.0268
Residual | 1760.76663 10 176.076663 R-squared = 0.8202
-------------+------------------------------ Adj R-squared = 0.5864
Total | 9790.5 23 425.673913 Root MSE = 13.269
------------------------------------------------------------------------------
S | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
A | .4270829 .2063794 2.07 0.065 -.0327592 .8869249
A1 | .0015484 .2161103 0.01 0.994 -.4799754 .4830721
A2 | .1026181 .1849852 0.55 0.591 -.3095545 .5147907
A3 | .1387561 .1593701 0.87 0.404 -.2163427 .4938549
A4 | -.0324424 .1771302 -0.18 0.858 -.427113 .3622282
A5 | -.0431555 .1744989 -0.25 0.810 -.4319632 .3456522
A6 | .2148685 .1721424 1.25 0.240 -.1686887 .5984256
A7 | .114542 .1544704 0.74 0.475 -.2296396 .4587236
A8 | -.1045846 .1490156 -0.70 0.499 -.436612 .2274427
A9 | -.2443856 .1460974 -1.67 0.125 -.5699108 .0811397
A10 | -.1016249 .173713 -0.59 0.572 -.4886817 .2854318
A11 | -.0571411 .2020959 -0.28 0.783 -.5074388 .3931567
A12 | .0085637 .20028 0.04 0.967 -.4376881 .4548154
_cons | 478.7293 18.94364 25.27 0.000 436.5202 520.9383
------------------------------------------------------------------------------
Log-likelihood value = -85.6
Note: lags can also be
created in Stata using the
command:
. gen A1 = A[_n-1]
The following figure shows the corresponding OLS estimates of the
βi.
[Figure 4. Distributed Lag Coefficients (No Constraints)]
The estimator volatility, large standard errors, and small t-statistics
for the estimated OLS β's suggest a multicollinearity problem.
Neither the pattern nor the signs of the βi's are consistent with a
reasonable explanation of the impact of advertising on sales.
The most common approach for dealing with these problems is to
assume that the βi's follow a "reasonable" pattern which is described
by a smaller number of parameters. The associated model is estimated
and used in analyzing the impact of the variable in question.
Clearly, the advantages of this approach are conditional upon the
accuracy of the assumptions made about the βi's, and these assumptions
should be tested. The Koyck distributed lag and polynomial
distributed lag models will be applied.
KOYCK DISTRIBUTED LAGS:
If the model builder is willing to assume that the impact of the
independent variable (advertising) on the dependent variable (sales)
declines geometrically over time, the Koyck model can provide a
reasonable possibility. In this model the coefficients are assumed to
be of the form
$$\beta_i = \lambda^i \beta_0, \quad i = 1, 2, \ldots$$
This can be visually represented (for two different values of λ) as
[Figure: Koyck weights βi plotted against i for λ = 0.6 and λ = 0.9]
The Koyck assumption implies that
$$\frac{\partial S_t}{\partial A_{t-i}} = \beta_i = \lambda^i \beta_0, \quad i = 1, 2, \ldots,$$
i.e., a change of one unit of advertising will have an immediate
impact (β0) on sales and will continue to affect sales thereafter, but
at an exponentially declining rate. In other words, sales will be
influenced not only by current advertising, but by all past values of
advertising.
Rewriting the distributed lag model and substituting for the Koyck
coefficients yields
$$S_t = a + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \ldots + \varepsilon_t$$
$$\phantom{S_t} = a + \beta_0 A_t + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \ldots + \varepsilon_t.$$
Notice that by assuming that the coefficients follow a Koyck model,
only three coefficients (a, β0, and λ) need be estimated. This
representation can be written in a form which facilitates estimation
by replacing t by t - 1 and multiplying by λ to yield:
(ORIGINAL) $S_t = a + \beta_0 A_t + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \ldots + \varepsilon_t$
(MODIFIED) $\lambda S_{t-1} = a\lambda + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \ldots + \lambda\varepsilon_{t-1}.$
Subtracting the "modified representation" from the "original
representation" yields
$$S_t - \lambda S_{t-1} = a - a\lambda + \beta_0 A_t + \varepsilon_t - \lambda\varepsilon_{t-1}$$
or equivalently,
$$S_t = a(1 - \lambda) + \beta_0 A_t + \lambda S_{t-1} + \varepsilon_t - \lambda\varepsilon_{t-1}.$$
This is the form we have previously discussed, which can be estimated
using least squares with the Stata commands
tsset t
gen S1 = S[_n-1]
reg S A S1
with the following Stata output:
Source | SS df MS Number of obs = 35
-------------+------------------------------ F( 2, 32) = 77.63
Model | 21128.4531 2 10564.2265 Prob > F = 0.0000
Residual | 4354.68977 32 136.084055 R-squared = 0.8291
-------------+------------------------------ Adj R-squared = 0.8184
Total | 25483.1429 34 749.504202 Root MSE = 11.666
------------------------------------------------------------------------------
S | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
A | .3732443 .0621284 6.01 0.000 .2466929 .4997957
S1 | .1628443 .128893 1.26 0.216 -.0997022 .4253907
_cons | 407.1455 63.71989 6.39 0.000 277.3523 536.9387
------------------------------------------------------------------------------
The estimated intercept in this model corresponds to a(1 - λ);
hence,
$$a = 407.145/(1 - .1628) = 486.32.$$
The distributed lag coefficients can be easily recovered from the
equation
$$\beta_i = \beta_0 \lambda^i = (.3732)(.1628)^i;$$
therefore, the immediate impact of a one-dollar increase in
advertising is estimated to be β0 = .3732, with subsequent increases
in sales estimated to be (.0608, .0099, .0016, .0003, .0000) for the first
through the fifth periods. The long-run impact of a one-dollar
increase in advertising is obtained by adding the immediate impact,
the impact after one period, the impact after two periods, and so on:
$$\beta_0 + \beta_0\lambda + \beta_0\lambda^2 + \cdots = \frac{\beta_0}{1 - \lambda} = \frac{.3732}{1 - .1628} = .446 \quad \text{(total long-run impact)}.$$
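The recovery arithmetic above can be reproduced in a few lines of Python. The sketch uses the unrounded coefficients from the Stata output, so its results may differ from the rounded figures in the text in the last reported digit.

```python
# Unrounded estimates from the Stata output for  S = c + beta0*A + lam*S1
intercept = 407.1455   # _cons, an estimate of a(1 - lambda)
beta0 = 0.3732443      # coefficient on A
lam = 0.1628443        # coefficient on S1 (lagged sales)

a = intercept / (1 - lam)                      # sales level with no advertising
weights = [beta0 * lam ** i for i in range(6)] # beta_i = beta0 * lambda**i, i = 0..5
long_run = beta0 / (1 - lam)                   # sum of the geometric series of weights

print(f"a = {a:.2f}")
print("beta_i:", [round(w, 4) for w in weights])
print(f"long-run impact = {long_run:.3f}")
```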
Several comments need to be made. First, it is very important to test
for autocorrelation. The least squares estimators will be biased and
inconsistent if the model contains lagged dependent variables and
autocorrelated random disturbances. Estimation techniques have been
developed which yield consistent estimators in this case, but will not
be discussed here. Least squares applied to an equation with a lagged
dependent variable and uncorrelated errors will yield biased, but
consistent, estimators. Secondly, if it is felt that the assumption that
the impact of the independent variable begins declining immediately is
too restrictive, this can be relaxed. The Koyck procedure can be
modified to correspond to declining weights after an arbitrary
transition period.
POLYNOMIAL DISTRIBUTED LAGS:
As indicated earlier, polynomial distributed lag models provide
one of the most common approaches to distributed lag models. The
basic idea is to approximate the desired form for the βi's with a
polynomial which is described by a smaller number of parameters than
the original βi's in the model. In practice, p is rarely chosen
to be larger than two or three, i.e., the βi's follow a quadratic
or cubic form. As an example, if p = 2, the βi's are completely
described by three parameters (a0, a1, a2) in the equation:
$$\beta_i = a_0 + a_1 i + a_2 i^2.$$
Consequently, the model
$$S_t = a + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \ldots + \beta_s A_{t-s} + \varepsilon_t$$
only involves the parameters (a, a0, a1, a2) regardless of the
number of lags (s) included in the equation. Once a0, a1, a2
are estimated, the corresponding estimates of βi can be obtained
from
$$\beta_i = a_0 + a_1 i + a_2 i^2,$$
i.e.,
β0 = a0
β1 = a0 + a1 + a2
β2 = a0 + 2a1 + 4a2, etc.
Also note that specifying the βi's to be quadratic allows
considerable flexibility.
[Figure 5. Quadratic Distributed Lags (four panels, each plotting βi against i)]
Stata Example
As an example of estimating polynomial distributed lag
coefficients, we estimate the distributed lag impact of
advertising on sales using polynomial distributed lags with the
following Stata commands (where s = 12 and p = 2):
* z0 = A + A[_n-1] + A[_n-2] + ... + A[_n-12]
gen z0 = A
forvalues j = 1/12 {
    replace z0 = z0 + A[_n-`j']
}
* zk = 1^k*A[_n-1] + 2^k*A[_n-2] + ... + 12^k*A[_n-12]
* the index k runs up to the order of the polynomial (p)
forvalues k = 1/2 {
    gen z`k' = 0
    forvalues j = 1/12 {
        replace z`k' = z`k' + A[_n-`j']*`j'^`k'
    }
}
* regress S on the p+1 transformed variables
reg S z0 z1 z2
estat ic
* recover the betas from the coefficients of the zi's
* (beta0 is the same as a0, the coefficient of z0)
forvalues i = 0/12 {
    scalar b`i' = _b[z0] + _b[z1]*`i' + _b[z2]*`i'^2
    display "beta`i' = " b`i'
}
. reg s z0 z1 z2
Source SS df MS Number of obs = 24
------------------------------------------ F( 3, 20) = 14.37
Model 6688.46 3 2229.49 Prob > F = 0.0000
Residual 3102.04 20 155.10 R-squared = 0.6832
------------------------------------------- Adj R-squared = 0.6356
Total 9790.5 23 425.67 Root MSE = 12.454
------------------------------------------------------------------------------
s Coef. Std. Err. t P>|t|
-----------------------------------------------------------------------------
z0 .2366588 .1137905 2.08 0.051
z1 -.0611558 .0432326 -1.41 0.173
z2 .0032403 .0032659 0.99 0.333
_cons | 484.40 15.95 30.36 0.000
------------------------------------------------------------------------------
. estat ic
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
----------------------------------------------------------------------------
. | 24 -106.1879 -92.39565 4 192.7913 197.5035
The polynomial distributed lag coefficients can then be obtained from the equation
$$\beta_i = a_0 + a_1 i + a_2 i^2 = .2367 - .0612\,i + .0032\,i^2.$$
The resulting coefficients (computed with the unrounded estimates) are given below:
 i     βi
 0    .237
 1    .179
 2    .127
 3    .082
 4    .044
 5    .012
 6   -.014
 7   -.033
 8   -.045
 9   -.051
10   -.051
11   -.044
12   -.031
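The table can be regenerated from the unrounded coefficients on z0, z1, and z2 in the Stata output. The short Python sketch below evaluates $\beta_i = a_0 + a_1 i + a_2 i^2$ for i = 0, ..., 12. It shows the weights staying positive through i = 5 and turning negative from i = 6.

```python
# Unrounded coefficients on z0, z1, z2 from the Stata output above
a0, a1, a2 = 0.2366588, -0.0611558, 0.0032403

betas = [a0 + a1 * i + a2 * i ** 2 for i in range(13)]   # beta_i for i = 0, ..., 12
for i, b in enumerate(betas):
    print(f"{i:2d}  {b:7.3f}")
```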
The βi's (polynomial distributed lag model) can be illustrated as in
Fig. 6.
[Figure 6. Polynomial Distributed Lag Coefficients (βi plotted against i)]
The results from these three techniques (OLS, Koyck, PDL) are summarized
in Figure 7.
[Figure 7. Alternative Estimates of Distributed Lag Effects: OLS distributed lag, Koyck distributed lag, and polynomial distributed lag (βi plotted against i)]
Note that the distributed lag coefficients associated with the
Koyck and polynomial models decline, at different rates. The
polynomial distributed lag model suggests that the impact of
advertising isn't statistically significant beyond three or four
months. The estimated weights from the Koyck model "die out" even
more quickly. This is in sharp contrast to the weights which were
estimated without any constraints (OLS). The advantage of the
alternatives to unconstrained estimation should be apparent. The
related literature contains a discussion of many alternatives. The
methodology is similar to that already discussed: (1) specify a
"form for the βi's" which reduces the number of parameters to be
estimated; (2) estimate these new parameters and obtain the
corresponding β's.
The reader may want to gain experience by estimating some
alternative specifications. It would be instructive to consider the
sensitivity of the polynomial distributed lag βi's to the number of lags,
the degree of the underlying polynomial, and assumptions about end
points. The reader might also demonstrate that if we assume the
effect of advertising doesn't begin to decay exponentially until
period two (rather than in the first period), the relevant model can
be written as
$$S_t = a(1 - \lambda) + \lambda S_{t-1} + \beta_0 A_t + (\beta_1 - \lambda\beta_0) A_{t-1} + \varepsilon_t - \lambda\varepsilon_{t-1}$$
where $\beta_i = \lambda^{i-1}\beta_1$ for i = 1, 2, .... Estimate this model and compare
the results with those obtained using the Koyck model. The consistency
of the polynomial distributed lag model specification with the unconstrained
estimates can be easily tested using a likelihood ratio test.
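The Python sketch below illustrates that likelihood ratio test using the two log-likelihood values reported above. The first is -85.6 for the unconstrained OLS regression with 14 coefficients. The second is -92.39565 from `estat ic` for the quadratic PDL regression with 4 coefficients. Both regressions use the same 24 observations.

The quadratic PDL imposes 14 - 4 = 10 restrictions. LR = 2(ℓ_U - ℓ_R) is therefore compared with a χ² distribution with 10 degrees of freedom, whose 5% critical value is 18.307. The p-value uses the closed-form χ² upper-tail probability for even degrees of freedom, which avoids needing a statistics library. The OLS log-likelihood is rounded in the output, so the last digits are approximate.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); closed form valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

ll_unrestricted = -85.6       # OLS with intercept + 13 lag coefficients (14 parameters)
ll_restricted = -92.39565     # quadratic PDL: intercept + z0, z1, z2 (4 parameters)
lr = 2 * (ll_unrestricted - ll_restricted)
df = 14 - 4
p_value = chi2_sf_even_df(lr, df)
print(f"LR = {lr:.3f}, df = {df}, p-value = {p_value:.3f}")
```

An LR statistic below 18.307 (equivalently, a p-value above .05) means the quadratic restriction is not rejected at the 5% level.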
2. Lagged Dependent Variables - Autoregressive model
Autoregressive models include lagged values of dependent variables,
can be viewed as being dynamic models, and link different time
periods. We first interpret and summarize the statistical properties
of OLS estimators of autoregressive models. The coefficients in these
models have important "dynamic" interpretations concerning comparative
static results. Finally, we show that the famous partial adjustment and adaptive
expectations models can be expressed as autoregressive models.
a. Interpreting the coefficients in autoregressive models. A model
is said to be dynamic if values of the dependent variable from the
current and previous time periods are included in the same
equation. The inclusion of lagged dependent variables presents
several problems to the econometrician. In order to discuss some
of these problems, consider the following autoregressive model:
$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1} + \varepsilon_t$$
where $Y_t$ and $I_t$ denote some aggregate measures of production and
investment.
(1) Properties of estimators and statistical inference
If the $\varepsilon_t$'s are independent of each other (i.e., A.4 holds), then the least squares estimators of $\alpha, \beta, \gamma$ ($\hat\alpha, \hat\beta, \hat\gamma$) will be biased, but consistent; whereas, if the $\varepsilon_t$ are serially correlated, $\hat\alpha, \hat\beta, \hat\gamma$ will be biased and inconsistent. In neither case will the usual t and F statistics be exact; with serially correlated errors they are not even asymptotically valid (more on this in another section). The properties of least squares estimators can be compactly summarized as in the following table:

Properties of Least Squares

                                  Residuals uncorrelated     Residuals correlated
  No lagged dependent variable    unbiased                   unbiased
                                  consistent                 consistent
                                  efficient                  not efficient
  Lagged dependent variable       biased                     biased
                                  consistent                 inconsistent
                                  not efficient              not efficient
Thus it is important to test for autocorrelation. The D.W. statistic can be used for models without lagged dependent variables, and Durbin's h test or the Breusch-Godfrey test can be used for autoregressive models. (See the discussion of autocorrelation in section IV of the notes.)
(2) Interpretation of coefficients
For notational simplicity delete $\varepsilon_t$ from the previous equation and consider
$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1}.$$
The derivative
$$\frac{\partial Y_t}{\partial I_t} = \beta$$
is referred to as the impact multiplier for this model and is not what is generally referred to as "the investment multiplier." The impact multiplier measures the change in $Y_t$ during the same period as $I_t$ changes.
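We note that since
$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1}$$
it follows that
$$Y_{t-1} = \alpha + \beta I_{t-1} + \gamma Y_{t-2};$$
hence,
$$Y_t = \alpha + \beta I_t + \gamma(\alpha + \beta I_{t-1} + \gamma Y_{t-2}) = \alpha(1+\gamma) + \beta[I_t + \gamma I_{t-1}] + \gamma^2 Y_{t-2}.$$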
Continuing this process (and assuming $|\gamma| < 1$, so that the remaining term $\gamma^k Y_{t-k}$ vanishes as $k$ grows) we obtain
$$Y_t = \alpha(1 + \gamma + \gamma^2 + \cdots) + \beta[I_t + \gamma I_{t-1} + \gamma^2 I_{t-2} + \cdots] = \frac{\alpha}{1-\gamma} + \beta I_t + \beta\gamma I_{t-1} + \beta\gamma^2 I_{t-2} + \beta\gamma^3 I_{t-3} + \cdots$$
What total effect will a change in $I_t$ have on $Y_t, Y_{t+1}, \ldots$? When $\Delta I_t = 1$,
$$\Delta Y_t = \beta, \qquad \Delta Y_{t+1} = \beta\gamma, \qquad \Delta Y_{t+2} = \beta\gamma^2, \qquad \ldots$$
$$\text{Total impact} = \beta(1 + \gamma + \gamma^2 + \cdots) = \frac{\beta}{1-\gamma}.$$
The two-period cumulative multiplier is given by $\beta + \beta\gamma$, the three-period by $\beta + \beta\gamma + \beta\gamma^2$, and so on.
The long-run investment multiplier is given by $\beta/(1-\gamma)$. The long-run multiplier can be interpreted in two ways: (1) the cumulative (over time) change in Y corresponding to a one-time increase in investment expenditure; or (2) the increase in long-run equilibrium Y corresponding to a sustained increase in investment expenditure. These two interpretations are represented in the following figure.
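[Figure: Impact of change in investment. Two panels plot $Y_t$ against t: "One period change" ($\Delta I = 1$ in a single period) and "Sustained change" ($\Delta I = 1$ in every period from t on). In both panels the long-run change is $\Delta Y = \beta/(1-\gamma)$.]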
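The multipliers above can be tabulated directly. This is a minimal Python sketch (function names are illustrative, not from the notes) that lists the period-by-period effects $\beta, \beta\gamma, \beta\gamma^2, \ldots$, their running sums, and the long-run multiplier $\beta/(1-\gamma)$; a small simulation of the error-free model checks both interpretations of the long-run multiplier, assuming $|\gamma| < 1$:

```python
def dynamic_multipliers(beta: float, gamma: float, horizon: int):
    """Responses of Y to a one-time unit change in I in Y_t = alpha + beta*I_t + gamma*Y_{t-1}.

    Returns (period-by-period effects, cumulative multipliers, long-run multiplier).
    """
    if abs(gamma) >= 1:
        raise ValueError("the long-run multiplier requires |gamma| < 1")
    period = [beta * gamma ** j for j in range(horizon)]   # beta, beta*gamma, beta*gamma^2, ...
    cumulative = []
    total = 0.0
    for effect in period:
        total += effect
        cumulative.append(total)                           # beta, beta + beta*gamma, ...
    return period, cumulative, beta / (1.0 - gamma)


def simulate(alpha: float, beta: float, gamma: float, investment, y0: float):
    """Iterate Y_t = alpha + beta*I_t + gamma*Y_{t-1} (no error term) starting from Y_0 = y0."""
    path, y = [], y0
    for i_t in investment:
        y = alpha + beta * i_t + gamma * y
        path.append(y)
    return path
```

For example, `dynamic_multipliers(0.5, 0.6, 3)` gives period effects of about 0.5, 0.3, 0.18, cumulative multipliers of about 0.5, 0.8, 0.98, and a long-run multiplier of about 1.25. Starting `simulate` at the equilibrium $\alpha/(1-\gamma)$ and comparing a one-time or a sustained unit increase in I with a baseline of no change reproduces interpretations (1) and (2).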
b. Some common autoregressive models
(1) Partial adjustment model
Optimal: The optimal value of $y_t$, $y_t^*$, is a function of $x_t$:
$$y_t^* = \alpha + \beta x_t + u_t$$
Adjustment mechanism:
$$y_t - y_{t-1} = \gamma(y_t^* - y_{t-1}), \qquad 0 < \gamma \le 1$$
Note: (1) $\gamma = 1$ corresponds to complete adjustment.
(2) This adjustment mechanism is consistent with the minimization of costs, $c_t$, where
$$c_t = \alpha(y_t - y_t^*)^2 + \beta(y_t - y_{t-1})^2$$
(the first term is the cost of being out of equilibrium and the second the cost of change; here $\alpha$ and $\beta$ are cost weights, not the regression coefficients), where $y_{t-1}$ and $y_t^*$ are given.
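A short derivation of note (2), assuming $\alpha > 0$ and $\beta \ge 0$ are the weights on the two cost terms (they are not the parameters of the optimal-value equation). Setting the derivative of $c_t$ with respect to $y_t$ to zero gives:

```latex
\begin{aligned}
\frac{\partial c_t}{\partial y_t} &= 2\alpha\,(y_t - y_t^*) + 2\beta\,(y_t - y_{t-1}) = 0
\quad\Longrightarrow\quad y_t = \frac{\alpha\, y_t^* + \beta\, y_{t-1}}{\alpha + \beta},\\
y_t - y_{t-1} &= \frac{\alpha}{\alpha + \beta}\,\bigl(y_t^* - y_{t-1}\bigr),
\qquad\text{so}\qquad \gamma = \frac{\alpha}{\alpha + \beta}.
\end{aligned}
```

Since $\partial^2 c_t/\partial y_t^2 = 2(\alpha + \beta) > 0$, this is a minimum. The implied $\gamma$ lies in $(0, 1]$, and $\gamma = 1$ (complete adjustment) exactly when the cost of change $\beta$ is zero.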
Combining the basic equation and adjustment mechanism yields
$$y_t = \alpha\gamma + \beta\gamma x_t + (1 - \gamma)y_{t-1} + \gamma u_t,$$
which can be estimated using OLS.
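Because the reduced form is linear in $y_{t-1}$, the structural parameters can be recovered from the OLS coefficients: $\gamma = 1 -$ (coefficient on $y_{t-1}$), $\alpha =$ intercept$/\gamma$, and $\beta =$ (coefficient on $x_t$)$/\gamma$. A minimal sketch, assuming the $u_t$ are serially independent (Assumption I below) so that OLS is consistent; the helper names are hypothetical:

```python
import numpy as np


def simulate_partial_adjustment(alpha, beta, gamma, x, u=None, y0=0.0):
    """Generate y_t = y_{t-1} + gamma*(y*_t - y_{t-1}) with y*_t = alpha + beta*x_t + u_t."""
    x = np.asarray(x, dtype=float)
    u = np.zeros_like(x) if u is None else np.asarray(u, dtype=float)
    y = np.empty_like(x)
    prev = y0
    for t in range(len(x)):
        target = alpha + beta * x[t] + u[t]      # optimal value y*_t
        prev = prev + gamma * (target - prev)   # move a fraction gamma of the way to y*_t
        y[t] = prev
    return y


def partial_adjustment_ols(y, x):
    """OLS of y_t on (1, x_t, y_{t-1}); map the reduced-form coefficients
    (alpha*gamma, beta*gamma, 1 - gamma) back to (alpha, beta, gamma)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones(len(y) - 1), x[1:], y[:-1]])
    a, b, c = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
    gamma = 1.0 - c
    return {"alpha": a / gamma, "beta": b / gamma, "gamma": gamma}


rng = np.random.default_rng(0)
x_demo = rng.normal(size=500)
y_demo = simulate_partial_adjustment(2.0, 0.5, 0.3, x_demo, u=rng.normal(scale=0.5, size=500))
print(partial_adjustment_ols(y_demo, x_demo))   # estimates should be near alpha=2, beta=0.5, gamma=0.3
```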
(2) Adaptive Expectations Model. This model relaxes the assumption that the dependent variable depends only on the current level of the independent variable. Let $x_t^*$ denote the "expected" level of $x_t$ and assume the dependent variable immediately adjusts to $x_t^*$.
Basic Relationship:
$$y_t = \alpha + \beta x_t^* + u_t$$
Adjustment Mechanism:
$$x_t^* - x_{t-1}^* = \delta(x_t - x_{t-1}^*), \qquad 0 < \delta \le 1$$
$\delta = 1$ corresponds to complete adjustment.
Combining these expressions yields
$$y_t = \alpha\delta + \beta\delta x_t + (1 - \delta)y_{t-1} + (u_t - (1 - \delta)u_{t-1}).$$
Note the similarities and differences between the forms for the Koyck, partial adjustment, and adaptive expectations models.
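A quick numerical check of the combination step, written as a Python sketch with hypothetical helper names: it generates $x_t^*$ and $y_t$ from the two structural equations and confirms that they satisfy the reduced form exactly. Note that the composite error $u_t - (1-\delta)u_{t-1}$ shares $u_{t-1}$ with the regressor $y_{t-1}$, which is why OLS applied to this form is not consistent (compare Assumption II below).

```python
import numpy as np


def simulate_adaptive_expectations(alpha, beta, delta, x, u, xstar0=0.0):
    """Structural form: x*_t = x*_{t-1} + delta*(x_t - x*_{t-1}),  y_t = alpha + beta*x*_t + u_t."""
    x = np.asarray(x, dtype=float)
    xstar = np.empty_like(x)
    prev = xstar0
    for t in range(x.size):
        prev = prev + delta * (x[t] - prev)   # revise the expectation by a fraction delta of the error
        xstar[t] = prev
    return alpha + beta * xstar + np.asarray(u, dtype=float), xstar


def reduced_form_rhs(alpha, beta, delta, x, y, u):
    """alpha*delta + beta*delta*x_t + (1-delta)*y_{t-1} + (u_t - (1-delta)*u_{t-1}) for t = 1..T-1."""
    x, y, u = (np.asarray(a, dtype=float) for a in (x, y, u))
    return (alpha * delta + beta * delta * x[1:] + (1 - delta) * y[:-1]
            + (u[1:] - (1 - delta) * u[:-1]))


rng = np.random.default_rng(0)
x, u = rng.normal(size=50), rng.normal(size=50)
y, _ = simulate_adaptive_expectations(1.0, 2.0, 0.4, x, u)
print(np.allclose(y[1:], reduced_form_rhs(1.0, 2.0, 0.4, x, y, u)))   # True
```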
(3) Partial Adjustment and Adaptive Expectations Model
Basic Relationship:
$$y_t^* = \alpha + \beta x_t^*$$
($y_t^*$ is the optimal value of $y_t$ and $x_t^*$ the expected value of $x_t$.)
Adjustment Mechanisms:
$$y_t - y_{t-1} = \gamma(y_t^* - y_{t-1}) + u_t, \qquad 0 < \gamma \le 1$$
$$x_t^* - x_{t-1}^* = \delta(x_t - x_{t-1}^*), \qquad 0 < \delta \le 1$$
Combining these expressions yields
$$y_t = \alpha\gamma\delta + \beta\gamma\delta x_t + [(1 - \delta) + (1 - \gamma)]y_{t-1} - (1 - \delta)(1 - \gamma)y_{t-2} + (u_t - (1 - \delta)u_{t-1})$$
c. Estimation of Autoregressive models
Consider the model
$$y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 x_t + \varepsilon_t$$
with the following assumptions for the error term.
Assumption I. $\varepsilon_t \sim NID(0, \sigma^2)$, where NID stands for independently and identically distributed as $N(0, \sigma^2)$.
Assumption II. $\varepsilon_t = u_t - \lambda u_{t-1}$ (Koyck), where
  a. $u_t \sim NID(0, \sigma_u^2)$, or
  b. $u_t = \rho u_{t-1} + \eta_t$, $|\rho| < 1$, $\eta_t \sim NID(0, \sigma_\eta^2)$.
Assumption III. $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$, $u_t \sim NID(0, \sigma_u^2)$.
(1) Under Assumption I, the least squares estimators of $\beta = (\beta_1, \beta_2, \beta_3)$ will be biased, but consistent.
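A small Monte Carlo experiment illustrates this claim. The sketch below (parameter values, sample sizes, and the burn-in length are arbitrary choices, not taken from the notes) averages the OLS estimates of $\beta_2$, the coefficient of $y_{t-1}$, across replications under Assumption I: the average should fall noticeably below the true value 0.5 when n = 20 and be much closer to it when n = 1000.

```python
import numpy as np


def mc_mean_estimates(n, reps, b=(1.0, 0.5, 1.0), sigma=1.0, burn=50, seed=0):
    """Average OLS estimates of (b1, b2, b3) in y_t = b1 + b2*y_{t-1} + b3*x_t + e_t,
    with e_t ~ NID(0, sigma^2) and x_t ~ NID(0, 1) drawn afresh in every replication."""
    rng = np.random.default_rng(seed)
    b1, b2, b3 = b
    estimates = np.empty((reps, 3))
    for r in range(reps):
        x = rng.normal(size=n + burn)
        e = rng.normal(scale=sigma, size=n + burn)
        y = np.zeros(n + burn)
        for t in range(1, n + burn):
            y[t] = b1 + b2 * y[t - 1] + b3 * x[t] + e[t]
        y, x = y[burn:], x[burn:]          # drop the burn-in so y starts near its stationary distribution
        Z = np.column_stack([np.ones(n - 1), y[:-1], x[1:]])
        estimates[r] = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
    return estimates.mean(axis=0)


small = mc_mean_estimates(n=20, reps=1000, seed=1)
large = mc_mean_estimates(n=1000, reps=200, seed=2)
print("average bias of b2, n = 20:  ", small[1] - 0.5)
print("average bias of b2, n = 1000:", large[1] - 0.5)
```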
(a) Remember that OLS estimators are unbiased and consistent in the presence of autocorrelation (provided there are no lagged dependent variables), but are no longer minimum variance estimators.
(b) The presence of lagged dependent variables results in least squares estimators which are biased, but are still consistent.
(c) The presence of both autocorrelation and lagged dependent variables implies that least squares estimators will be biased and inconsistent. This situation arises with Assumptions II and III. Hence, estimators other than least squares need to be developed for the case of lagged dependent variables and autocorrelation.
(d) The inclusion of lagged dependent variables biases the value of the Durbin-Watson statistic towards 2, and therefore the standard interpretation of D.W. is not valid.
The h-test has been proposed as a test for autocorrelation in this case:
$$h = \hat\rho\left[\frac{n}{1 - n\,\widehat{\operatorname{Var}}(\text{est. coef. of } y_{t-1})}\right]^{1/2}$$
The asymptotic distribution of h is
$$h \sim N(0, 1).$$
There are two main problems with this test:
(i) The h test is not valid if $n\,\widehat{\operatorname{Var}}(\hat\beta_2) \ge 1$, since the square root is then undefined.
(ii) N(0, 1) seems to yield a poor fit to the distribution of h for frequently encountered sample sizes. Some have argued that the use of $d_u$ and $4 - d_u$ to define critical regions appears to provide more accurate results, where $d_u$ corresponds to the upper limit for a Durbin-Watson test statistic, which will be discussed later.
[Figure: number line marking $d_u$, 2, and $4 - d_u$.]
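The h statistic is simple to compute from standard regression output. A minimal sketch, assuming $\hat\rho$ is approximated by $1 - d/2$ from the Durbin-Watson statistic d (a common choice; $\hat\rho$ can also be estimated from the residuals directly); the function name is hypothetical:

```python
import math


def durbin_h(dw: float, n: int, var_coef_ylag: float) -> float:
    """Durbin's h = rho_hat * sqrt(n / (1 - n * Var(coef of y_{t-1}))).

    rho_hat is approximated by 1 - dw/2; var_coef_ylag is the squared standard
    error of the estimated coefficient of y_{t-1}.
    """
    rho_hat = 1.0 - dw / 2.0
    denom = 1.0 - n * var_coef_ylag
    if denom <= 0:
        raise ValueError("h is not defined when n * Var(coef of y_{t-1}) >= 1")
    return rho_hat * math.sqrt(n / denom)
```

For example, with d = 1.6, n = 50 and a standard error of about 0.063 on the coefficient of $y_{t-1}$ (variance 0.004), $h = 0.2\sqrt{62.5} \approx 1.58$, which would be compared with the N(0, 1) critical value 1.96.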
Other tests for the presence of autocorrelation in a model with lagged dependent variables are available. For example, the Breusch-Godfrey and Ljung-Box tests can be modified to apply to autoregressive models. The Breusch-Godfrey test can be applied by regressing the OLS residuals $e_t$ on the regressors of the model (including the lagged y's) and on the lagged $e_t$'s implied by the assumed error process (the number of autoregressive or moving average error terms), and testing for the collective explanatory power of the coefficients of the lagged residuals using an F-test.
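A minimal NumPy sketch of the Breusch-Godfrey procedure just described, assuming the pre-sample lagged residuals are set to zero (one common convention) and that X already contains the constant and the lagged y's. The function name is hypothetical; it returns the F statistic with its degrees of freedom and the LM form $nR^2$ of the auxiliary regression, but no p-values:

```python
import numpy as np


def breusch_godfrey(y, X, p):
    """Breusch-Godfrey test for serial correlation of order up to p.

    X must contain a constant and every regressor of the model, including any
    lagged y's. Returns (F, df1, df2, LM) with LM = n * R^2 of the auxiliary regression.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]                 # OLS residuals
    lagged = [np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, p + 1)]
    Z = np.column_stack([X] + lagged)                                 # auxiliary regressors
    v = e - Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    ssr_u = v @ v
    ssr_r = e @ e      # e is orthogonal to X, so regressing e on X alone leaves it unchanged
    df1, df2 = p, n - k - p
    F = ((ssr_r - ssr_u) / df1) / (ssr_u / df2)
    lm = n * (1.0 - ssr_u / ssr_r)    # R^2 uses e'e as total SS because e has mean zero
    return F, df1, df2, lm
```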
A brief treatment of estimation in the case of II or III is reported in the appendix.
D. Causality or Exogeneity
The existence of a relationship does not imply that either variable causes the other variable. There is an extensive literature on what it means for X to cause Y or for X to be exogenous to Y. A related concept is Granger causality. X is said not to Granger-cause Y if the conditional distribution of Y, given lagged Y and lagged X, is equal to the conditional distribution of Y given lagged Y alone. Alternatively, lagged X's do not help explain current levels of Y. A test of whether X Granger-causes Y can be performed as follows:
(1) Estimate the following model:
$$y_t = a + b_1 y_{t-1} + \cdots + b_p y_{t-p} + c_1 x_{t-1} + \cdots + c_p x_{t-p} + \varepsilon_t.$$
(2) Test the joint hypothesis $H_0: c_1 = \cdots = c_p = 0$ (X does not Granger-cause Y) using an F test. A "large" F statistic provides evidence that X Granger-causes Y.
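The two steps above can be sketched in NumPy as follows. The function name is hypothetical, the same number of lags p is used for y and x as in the model above, and the F statistic should be compared with an F(p, n - 2p - 1) critical value, where n is the number of usable observations:

```python
import numpy as np


def _ssr(Z, y):
    """Sum of squared OLS residuals from regressing y on Z."""
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(resid @ resid)


def granger_f(y, x, p):
    """F test of H0: c_1 = ... = c_p = 0 in
    y_t = a + b_1 y_{t-1} + ... + b_p y_{t-p} + c_1 x_{t-1} + ... + c_p x_{t-p} + e_t."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y)
    ylags = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)])   # column j-1 holds y_{t-j}
    xlags = np.column_stack([x[p - j:T - j] for j in range(1, p + 1)])   # column j-1 holds x_{t-j}
    yt = y[p:]
    n = len(yt)
    const = np.ones((n, 1))
    Zr = np.hstack([const, ylags])            # restricted: lagged y only
    Zu = np.hstack([const, ylags, xlags])     # unrestricted: add lagged x
    ssr_r, ssr_u = _ssr(Zr, yt), _ssr(Zu, yt)
    df2 = n - Zu.shape[1]
    return ((ssr_r - ssr_u) / p) / (ssr_u / df2), p, df2
```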
APPENDIX-- PDL MODELS
1. "A Few Details for the Almon Distributed Lag."
Consider the problem of estimating an Almon distributed lag model with p = 2 and s = 3, so we have a 2nd degree polynomial with 3 lags. The $\beta_i$'s can be expressed in terms of the $a_j$'s (recall: $\beta_i = a_0 + a_1 i + a_2 i^2$) as
$$\beta_0 = a_0, \qquad \beta_1 = a_0 + a_1 + a_2, \qquad \beta_2 = a_0 + 2a_1 + 4a_2, \qquad \beta_3 = a_0 + 3a_1 + 9a_2.$$
Substituting these expressions into the original distributed lag model for $\beta_i$ yields:
$$\begin{aligned}
y_t &= \alpha + a_0 x_t + (a_0 + a_1 + a_2)x_{t-1} + (a_0 + 2a_1 + 4a_2)x_{t-2} + (a_0 + 3a_1 + 9a_2)x_{t-3} + u_t\\
&= \alpha + a_0(x_t + x_{t-1} + x_{t-2} + x_{t-3}) + a_1(x_{t-1} + 2x_{t-2} + 3x_{t-3}) + a_2(x_{t-1} + 4x_{t-2} + 9x_{t-3}) + u_t
\end{aligned}$$
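For a more general case, assume p = 3 and s = 10.
s = 10: $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \cdots + \beta_{10} x_{t-10} + u_t$
p = 3: $\beta_i = a_0 + a_1 i + a_2 i^2 + a_3 i^3 = \sum_{j=0}^{3} a_j i^j$
$$\beta_0 = a_0, \quad \beta_1 = a_0 + a_1 + a_2 + a_3 = \sum_j a_j, \quad \beta_2 = a_0 + 2a_1 + 2^2 a_2 + 2^3 a_3 = \sum_j a_j 2^j, \quad \ldots, \quad \beta_{10} = a_0 + 10a_1 + 10^2 a_2 + 10^3 a_3 = \sum_j a_j 10^j.$$
Again, after substituting for $\beta_i$, we obtain
$$y_t = \delta + a_0 x_t + \Big(\sum_j a_j\Big)x_{t-1} + \Big(\sum_j a_j 2^j\Big)x_{t-2} + \cdots + \Big(\sum_j a_j 10^j\Big)x_{t-10} + u_t.$$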
Rearranging terms we obtain
$$\begin{aligned}
y_t &= \delta + a_0(x_t + x_{t-1} + \cdots + x_{t-10}) + a_1(x_{t-1} + 2x_{t-2} + \cdots + 10x_{t-10})\\
&\quad + a_2(x_{t-1} + 2^2 x_{t-2} + \cdots + 10^2 x_{t-10}) + a_3(x_{t-1} + 2^3 x_{t-2} + \cdots + 10^3 x_{t-10}) + u_t\\
&= \delta + a_0 \sum_{i=0}^{10} x_{t-i} + a_1 \sum_{i=1}^{10} i\,x_{t-i} + a_2 \sum_{i=1}^{10} i^2 x_{t-i} + a_3 \sum_{i=1}^{10} i^3 x_{t-i} + u_t.
\end{aligned}$$
Defining $z_{tj} = \sum_{i=0}^{10} i^j x_{t-i}$ (with $0^0 = 1$), we can estimate the $a_j$ (and hence the $\beta_i$) by obtaining estimates of
$$y_t = \delta + a_0 z_{t0} + a_1 z_{t1} + a_2 z_{t2} + a_3 z_{t3} + u_t.$$
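The variance-covariance matrix of the estimated coefficients is
$$\operatorname{Var}\begin{pmatrix}\hat\delta\\ \hat a_0\\ \vdots\\ \hat a_3\end{pmatrix} = \sigma_u^2 (Z'Z)^{-1},$$
where Z is the matrix whose rows are $(1, z_{t0}, z_{t1}, z_{t2}, z_{t3})$. Now since
$$\begin{pmatrix}\delta\\ \beta_0\\ \beta_1\\ \beta_2\\ \beta_3\\ \vdots\\ \beta_{10}\end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
0 & 1 & 1 & 1 & 1\\
0 & 1 & 2 & 2^2 & 2^3\\
0 & 1 & 3 & 3^2 & 3^3\\
\vdots & \vdots & \vdots & \vdots & \vdots\\
0 & 1 & 10 & 10^2 & 10^3
\end{pmatrix}
\begin{pmatrix}\delta\\ a_0\\ a_1\\ a_2\\ a_3\end{pmatrix}
= C\begin{pmatrix}\delta\\ a_0\\ \vdots\\ a_3\end{pmatrix},$$
then
$$\operatorname{Var}\begin{pmatrix}\hat\delta\\ \hat\beta_0\\ \vdots\\ \hat\beta_{10}\end{pmatrix} = \sigma_u^2\, C(Z'Z)^{-1}C'.$$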
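The construction above translates into a few lines of NumPy. This sketch (hypothetical function names) builds $z_{tj} = \sum_i i^j x_{t-i}$, estimates $\delta$ and the $a_j$ by OLS, and maps them to the $\beta_i$ and their covariance matrix with the matrix C. It handles general s and p rather than only s = 10, p = 3, and imposes no end-point restrictions:

```python
import numpy as np


def almon_design(x, s, p):
    """Return (Zlag, H): Zlag[:, j] = z_{tj} = sum_{i=0}^{s} i^j x_{t-i} for t = s..T-1, and H[i, j] = i^j."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    lags = np.column_stack([x[s - i:T - i] for i in range(s + 1)])   # column i holds x_{t-i}
    H = np.vander(np.arange(s + 1), p + 1, increasing=True)          # i^0, i^1, ..., i^p (0^0 = 1)
    return lags @ H, H


def almon_fit(y, x, s, p):
    """OLS on (1, z_{t0}, ..., z_{tp}); return (delta, beta_0..beta_s) and their covariance."""
    Zlag, H = almon_design(x, s, p)
    yt = np.asarray(y, dtype=float)[s:]
    Z = np.column_stack([np.ones(len(yt)), Zlag])
    coef = np.linalg.lstsq(Z, yt, rcond=None)[0]                     # (delta, a_0, ..., a_p)
    resid = yt - Z @ coef
    sigma2 = resid @ resid / (len(yt) - Z.shape[1])                  # estimate of sigma_u^2
    cov_a = sigma2 * np.linalg.inv(Z.T @ Z)
    C = np.zeros((s + 2, p + 2))                                     # maps (delta, a) to (delta, beta)
    C[0, 0] = 1.0
    C[1:, 1:] = H
    return C @ coef, C @ cov_a @ C.T
```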
PROBLEM SET 4.3: LAGGED VARIABLES
Applied problems
1. Replicate the results in the applications of OLS, Koyck, and PDL models to estimate the relationship between sales and advertising expenditures reported in the notes. The data are available in file hw3_3_table1.txt.
In particular,
(a) estimate
$$S_t = a + \beta_0 A_t + \cdots + \beta_{12} A_{t-12} + \varepsilon_t$$
using (1) OLS, (2) Koyck lags (report λ, α, $\beta_0$), (3) polynomial distributed lags, order = 2.
(b) Compare the distributed lag coefficients with OLS.
(c) Test the PDL specification against the OLS using a Chow and LR test.
(d) Re-estimate the model using a polynomial distributed lag with order = 3 and test whether the differences between p = 2 and p = 3 are statistically significant.
(e) (Bonus) Estimate a modified Koyck model which declines geometrically after the first lag.
Hint: replicate the commands contained in the PDL section of the class notes. The TA will be a great resource.
(JM III-C)
Table 1
Sales and Advertising
(A dash marks a lag that falls before the first observation.)

  t    St    At  At-1  At-2  At-3  At-4 At-12
  1   521    73     -     -     -     -     -
  2   515    94    73     -     -     -     -
  3   533    88    94    73     -     -     -
  4   531   103    88    94    73     -     -
  5   544   104   103    88    94    73     -
  6   528    73   104   103    88    94     -
  7   537   121    73   104   103    88     -
  8   541   134   121    73   104   103     -
  9   531   102   134   121    73   104     -
 10   535    79   102   134   121    73     -
 11   527   119    79   102   134   121     -
 12   517   118   119    79   102   134     -
 13   547   145   118   119    79   102    73
 14   560   128   145   118   119    79    94
 15   557   145   128   145   118   119    88
 16   548   191   145   128   145   118   103
 17   543   159   191   145   128   145   104
 18   580   169   159   191   145   128    73
 19   564   162   169   159   191   145   121
 20   581   181   162   169   159   191   134
 21   557   170   181   162   169   159   102
 22   575   183   170   181   162   169    79
 23   585   205   183   170   181   162   119
 24   568   185   205   183   170   181   118
 25   569   200   185   205   183   170   145
 26   551   173   200   185   205   183   128
 27   586   243   173   200   185   205   145
 28   581   215   243   173   200   185   191
 29   559   210   215   243   173   200   159
 30   594   229   210   215   243   173   169
 31   593   227   229   210   215   243   162
 32   579   249   227   229   210   215   181
 33   609   265   249   227   229   210   170
 34   602   257   265   249   227   229   183
 35   617   253   257   265   249   227   205
 36   601   239   253   257   265   249   185
2. In Example 11.4 (Wooldridge p. 389) one might expect the expected value of the return at time t to be a quadratic function of $return_{t-1}$. To check this possibility, use the data in NYSE.RAW to estimate
$$return_t = \beta_0 + \beta_1 return_{t-1} + \beta_2 return_{t-1}^2 + u_t$$
(a) Report the results in standard form.
(b) State and test the null hypothesis that $E(return_t \mid return_{t-1})$ does not depend on $return_{t-1}$. (Hint: There are two restrictions to test here.)
(c) Drop $return_{t-1}^2$ from the model, but add the interaction term $return_{t-1} \cdot return_{t-2}$. Now test the efficient markets hypothesis ($\beta_1 = \beta_2 = 0$).
(d) What do you conclude about predicting weekly stock returns based on past stock returns? (Wooldridge C11.3)
1I V
James B. McDonald Brigham Young University 7/12/2010
V. Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
A. Introductory Comments, B. Nonnormality of errors, C. Nonzero mean of errors, D.
Generalized Regression Model, E. Heteroskedasticity, F. Autocorrelation, G. Panel Data, H.
Stochastic X’s, I. Measurement Error, J. Specification Error
A. Introductory Comments
The Classical Normal Linear Regression Model is defined by:
$$y = X\beta + \varepsilon$$
where (A.1) ε is distributed normally
(A.2) $E(\varepsilon_t) = 0$ for all t
(A.3) $\operatorname{Var}(\varepsilon_t) = \sigma^2$ for all t
(A.4) $\operatorname{Cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \ne s$
(A.5) The X's are nonstochastic and $\lim_{n\to\infty} (X'X)/n = \Sigma_X$ is nonsingular.
Recall that assumptions (A.1)-(A.4) can be written more compactly as
$$\varepsilon \sim N[0, \Sigma = \sigma^2 I].$$
In section (II') we demonstrated that under assumptions (A.1)-(A.5) the least squares estimator ($\hat\beta$), the maximum likelihood estimator ($\overset{\Delta}{\beta}$), and the best linear unbiased estimator ($\tilde\beta$) are identical, i.e.,
$$\hat\beta = \tilde\beta = \overset{\Delta}{\beta} = (X'X)^{-1}X'y \quad\text{and}\quad \hat\beta \sim N[\beta;\ \sigma^2(X'X)^{-1}].$$
Additionally, we proved that the least squares estimator $\hat\beta$ (hence $\tilde\beta$ and $\overset{\Delta}{\beta}$) is
•unbiased
•minimum variance of all unbiased estimators
•consistent
•asymptotically efficient.
In this section we will demonstrate that the statistical properties of $\hat\beta$ are crucially dependent upon the validity of assumptions (A.1)-(A.5). The associated discussion will proceed by dropping one assumption at a time and considering the consequences. First, we will drop (A.1) and then (A.2). This will be followed by considering the generalized regression model, which includes heteroskedasticity (violation of (A.3)), autocorrelation (violation of (A.4)), and the classical normal linear regression model as special cases. In Sections H, I, and J we will consider the implications of violating (A.5), the existence of measurement error, and the presence of specification error (guessing the wrong model).
B. The Random Disturbances are not distributed normally, but (A.2)-(A.5) are valid.
An inspection of the derivation of the least squares estimator $\hat\beta$ reveals that the deduction is independent of any of the assumptions (A.1)-(A.5), apart from requiring X'X to be invertible; hence,
$$\hat\beta = (X'X)^{-1}X'y$$
is still the correct formula for the least squares estimator of β in the model
$$y = X\beta + \varepsilon$$
regardless of the assumptions about the distribution of ε. However, it should be mentioned that the statistical properties of $\hat\beta$ are very sensitive to the assumptions about the distribution of ε.
Similarly, we note that the BLUE of β is invariant with respect to the assumptions about the underlying probability density function of ε as long as (A.2)-(A.5) are valid. In this case we can conclude that
$$\hat\beta = \tilde\beta = (X'X)^{-1}X'y$$
and both $\hat\beta$ and $\tilde\beta$ will be
• unbiased
• minimum variance of all linear unbiased estimators (not necessarily of all unbiased estimators, since the Cramér-Rao lower bound depends upon the density of the residuals)
• consistent
• but standard t and F tests and confidence intervals are not necessarily valid for nonnormally distributed residuals.
The distribution of $\hat\beta$ will depend on the distribution of ε, which determines the distribution of y ($y = X\beta + \varepsilon$) and hence the distribution of $\hat\beta$ and $\tilde\beta$ ($\hat\beta = \tilde\beta = (X'X)^{-1}X'y$).
Let's consider the MLE of β. Recall that the first step in the derivation of the MLE of β is to define the likelihood function, for independent and identically distributed observations,
$$L = f(y_1; \beta)\cdots f(y_n; \beta),$$
which requires a knowledge of the distribution of the random disturbances and could not be defined otherwise. MLE are generally (asymptotically) efficient. Least squares estimators will be efficient if $f(y; \beta)$ is the normal density. However, least squares need not be efficient if the residuals are not distributed normally. For example, if ε is distributed as a Laplace with A.2-A.5 holding, OLS will be consistent and BLUE, but not efficient.
Consider the case in which the density function of the random disturbances is the Laplace or double exponential, defined by
$$f(\varepsilon; \lambda) = \frac{e^{-|\varepsilon|/\lambda}}{2\lambda}, \qquad -\infty < \varepsilon < \infty,$$
whose variance is $\sigma^2 = 2\lambda^2$, and which can be graphically depicted as
[Figure: the Laplace density $f(\varepsilon_t)$.]
This density has thicker tails than the normal and is more peaked at 0.
The associated likelihood function is defined by
$$L = f(y_1; \beta, \lambda)\cdots f(y_n; \beta, \lambda) = \frac{e^{-|y_1 - X_1\beta|/\lambda}}{2\lambda}\cdots\frac{e^{-|y_n - X_n\beta|/\lambda}}{2\lambda}$$
where $X_t = (1, x_{t2}, \ldots, x_{tk})$ and $\beta' = (\beta_1, \ldots, \beta_k)$. The log likelihood function is given by
$$\ell = \ln L = -\sum_{t=1}^{n} |y_t - X_t\beta|/\lambda - n\ln(2\lambda).$$
The MLE of β in this case will minimize the sum of the absolute values of the errors
$$\sum_t |y_t - X_t\beta|$$
and is sometimes called the "least lines," minimum absolute deviations (MAD), least absolute deviation (LAD), or least absolute error (LAE) estimator; whereas, the least squares estimator of β minimizes the sum of squared errors
$$\sum_t (y_t - X_t\beta)^2$$
and will not be the MLE $\overset{\Delta}{\beta}$ in this case. For the linear regression model with Laplace error terms, $\overset{\Delta}{\beta}$ (LAD) will be unbiased, consistent, and asymptotically efficient. The following table compares and contrasts the relative performance of OLS and LAD estimators for the two different error distributions, the normal and Laplace.
Variance-covariance matrices of the OLS and LAD estimators (for LAD these are asymptotic; σ² = Var(ε_t))

  Estimator \ error distribution    Normal                Laplace
  OLS                               σ²(X'X)⁻¹             σ²(X'X)⁻¹
  LAD                               (π/2)σ²(X'X)⁻¹        (σ²/2)(X'X)⁻¹

From this table we can see that the variance of LAD estimators is π/2 ≈ 1.57 times that of the corresponding OLS estimators for normal errors, but is half the OLS variance for Laplace errors. Recall that the Laplace pdf has thicker tails than the normal; hence, in the presence of outliers LAD may be preferred to OLS. LAD estimators can be obtained using the Stata command
  qreg y x1 x2 ...
where x1 x2 ... stand for the regressors.
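Stata's qreg computes LAD estimates directly. Purely to illustrate what is being minimized, here is a Python sketch that approximates the LAD estimator by iteratively reweighted least squares (IRLS) with weights $1/|e_t|$. This is a swapped-in textbook technique, not the algorithm qreg uses; the function name is hypothetical, and the small floor `eps` on $|e_t|$ is an assumption needed to avoid division by zero:

```python
import numpy as np


def lad_irls(X, y, n_iter=200, eps=1e-8, tol=1e-10):
    """Approximate argmin_b sum_t |y_t - X_t b| by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.linalg.lstsq(X, y, rcond=None)[0]               # start from OLS
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ b), eps)       # weighted SSE then matches sum |e_t| at b
        sw = np.sqrt(w)
        b_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b
```

With only a constant as regressor the LAD estimate is the sample median, so a single outlier barely moves it, whereas the OLS estimate (the mean) is pulled toward the outlier.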
The exercise set considers a generalized error distribution (GED), which includes both the normal and the double exponential or Laplace as special cases. Consequently, least squares and LAD estimators are special cases of MLE of the GED distribution.
In the past, the functional form of the distribution of the residuals has rarely been investigated. This is changing and could be investigated by comparing the distribution of the estimated residuals $\hat\varepsilon_t$ with the normal.
Various tests have been proposed to investigate the validity of the normality assumption. These tests take different forms. One class of tests is based on examining the skewness or kurtosis of the distribution of the estimated residuals.
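The skewness coefficient
$$\gamma_1 = \frac{E(\varepsilon^3)}{[E(\varepsilon^2)]^{3/2}}$$
can be estimated by
$$\hat\gamma_1 = \frac{\sum_{t=1}^{n} \hat\varepsilon_t^3/n}{\left(\sum_{t=1}^{n} \hat\varepsilon_t^2/n\right)^{3/2}}$$
and, for normally distributed residuals, has an asymptotic distribution
$$N(0, 6/n).$$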
Similarly, the excess kurtosis coefficient
$$\gamma_2 = \frac{E(\varepsilon^4)}{[E(\varepsilon^2)]^2} - 3$$
can be estimated by
$$\hat\gamma_2 = \frac{\sum_t e_t^4/n}{\left(\sum_t e_t^2/n\right)^2} - 3$$
and has an asymptotic distribution
$$N(0, 24/n)$$
for normally distributed residuals. These two results provide the basis for constructing "t-type" tests of whether the sample skewness or kurtosis is consistent with the assumption of normally distributed residuals.
The Jarque-Bera test provides a joint test that the residual distribution is symmetric with a kurtosis of three. The test statistic is defined by
$$JB = n\left[\frac{(\text{skewness})^2}{6} + \frac{(\text{excess kurtosis})^2}{24}\right]$$
and has an asymptotic chi-square distribution with two degrees of freedom. The distribution of JB follows from its being equal to the sum of squares of two asymptotically independent standard normal variables.
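The sample skewness, excess kurtosis, their "t-type" ratios, and the Jarque-Bera statistic follow directly from the formulas above. A minimal sketch with a hypothetical function name, assuming the residuals have mean zero (as OLS residuals do when the model has an intercept); p-values are left to normal and chi-square tables:

```python
import numpy as np


def normality_stats(e):
    """Skewness, excess kurtosis, their t-type ratios and the Jarque-Bera statistic for residuals e."""
    e = np.asarray(e, dtype=float)
    n = e.size
    m2 = np.sum(e ** 2) / n
    skew = (np.sum(e ** 3) / n) / m2 ** 1.5
    exkurt = (np.sum(e ** 4) / n) / m2 ** 2 - 3.0
    z_skew = skew / np.sqrt(6.0 / n)       # approximately N(0, 1) under normality
    z_kurt = exkurt / np.sqrt(24.0 / n)    # approximately N(0, 1) under normality
    jb = n * (skew ** 2 / 6.0 + exkurt ** 2 / 24.0)   # equals z_skew^2 + z_kurt^2
    return {"skewness": skew, "excess_kurtosis": exkurt,
            "z_skew": z_skew, "z_kurt": z_kurt, "JB": jb}
```

A JB value above 5.99, the 5% critical value of the chi-square distribution with two degrees of freedom, would lead to rejecting normality at the 5% level.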
Chi-square goodness of fit tests have also been proposed, which are based upon comparing the histogram of estimated residuals with the normal distribution.
These test statistics and others are available as output in programs such as Stata, SAS, or SHAZAM. The Stata commands are given below.
To test for statistically significant departures of skewness and kurtosis from the normal, the commands are (x1 x2 ... stand for the regressors):
  regress y x1 x2
  predict resid, residuals
  summarize resid, detail
  sktest resid
The output from sktest also includes the calculation of a Jarque-Bera-like test, along with the associated p-values. The exact test statistics differ from those outlined above, but are similar in structure and test the same hypotheses (D'Agostino, Belanger, and D'Agostino, American Statistician, 1990, pp. 316-321).
To perform a Chi-square test in Stata, you must first install the "csgof" command by typing
  findit csgof
and then installing the command and help files.
The Kolmogorov-Smirnov test is based upon the distribution of the maximum vertical distance between the cumulative histogram and the cumulative distribution of the hypothesized distribution. James Ramsey's program SEA (Specification Error Analysis) enables one to perform such a test. This can also be performed in Stata using the command "ksmirnov".
An alternative approach is to consider general distribution functions which include many of the common alternative specifications, such as the normal, as special cases. The first problem in the problem set illustrates this approach. Five other distributions which might also be considered are the generalized t, skewed generalized t, t, EGB2, and inverse hyperbolic sine distributions. Estimation procedures exist which perform well for non-normal distributions. Some of these are referred to as robust, M, semiparametric, or partially adaptive estimators, which accommodate very flexible underlying distributions. Kernel estimators provide another approach to this problem; they are nonparametric in that they are independent of a distributional assumption. Using some of these alternative estimators, the hypothesis of normally distributed residuals can also be tested using the LR, Wald, or Rao (Lagrange multiplier) tests.
C. ε ~ N(µ, Σ = σ²I), i.e., drop (A.2)
The least squares estimator of β is given by
$$\hat\beta = (X'X)^{-1}X'y.$$
The expected value of $\hat\beta$ is given as follows:
$$\begin{aligned}
E(\hat\beta) &= (X'X)^{-1}X'E(y) = (X'X)^{-1}X'(X\beta + E(\varepsilon))\\
&= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\boldsymbol{\mu} = \beta + (X'X)^{-1}X'\boldsymbol{\mu},
\end{aligned}$$
with the second term representing the bias, which appears to suggest that all of the least squares estimators in the vector $\hat\beta$ are biased. However, if $E(\varepsilon_t) = \mu$ for all t, then
$$\boldsymbol{\mu} = \begin{pmatrix}\mu\\ \vdots\\ \mu\end{pmatrix} = \mu\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix},$$
and, because the column of ones is the first column of X, it can be shown that
$$(X'X)^{-1}X'\boldsymbol{\mu} = \mu(X'X)^{-1}X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} = \begin{pmatrix}\mu\\ 0\\ \vdots\\ 0\end{pmatrix}$$
and only the estimator of the intercept is biased. If an error distribution has a nonzero mean, this gets included in the intercept term, and separate estimates of $\beta_1$ and µ can't be obtained.
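A numerical check of this result, written as a Python sketch with made-up regressors: adding a constant µ to every error changes only the OLS intercept, by exactly µ, while an error mean that moves with a regressor (here $\mu_t = 0.1t$, an arbitrary choice) shifts that regressor's slope instead.

```python
import numpy as np

n = 30
t = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), t, np.sin(t)])   # first column is the constant
XtX = X.T @ X

# Bias term (X'X)^{-1} X' mu for a constant error mean mu = 3
mu = 3.0
bias_const = np.linalg.solve(XtX, X.T @ (mu * np.ones(n)))

# Bias term for a non-constant error mean mu_t = 0.1 * t
bias_trend = np.linalg.solve(XtX, X.T @ (0.1 * t))

# The same effect seen through OLS on simulated data (same zero-mean draw v in both fits)
rng = np.random.default_rng(1)
beta = np.array([1.0, 0.5, -2.0])
v = rng.normal(size=n)
b_zero_mean = np.linalg.lstsq(X, X @ beta + v, rcond=None)[0]
b_shifted = np.linalg.lstsq(X, X @ beta + v + mu, rcond=None)[0]

print(bias_const)                  # (mu, 0, 0) up to rounding
print(bias_trend)                  # (0, 0.1, 0) up to rounding
print(b_shifted - b_zero_mean)     # equals bias_const
```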
More general violations of (A.2), such as a non-zero, non-constant mean, can lead to biased estimators of the intercept and slope coefficients.
[Figure: the regression line $\beta_1 + \beta_2 X_t$ and the error mean µ.]
D. Generalized Normal Linear Regression Model
1. Introduction
In many economic applications either (A.3) or (A.4) is violated, i.e.,
Heteroskedasticity: $\operatorname{Var}(\varepsilon_t) = \sigma_t^2$ is not the same for all t
Autocorrelation: $\operatorname{Cov}(\varepsilon_t, \varepsilon_s) \ne 0$ for some $t \ne s$
For situations in which either or both autocorrelation and heteroskedasticity exist,
$$\operatorname{Var}(\varepsilon) = \Sigma \ne \sigma^2 I,$$
the model can be written more generally as
$$y = X\beta + \varepsilon$$
(A.1)-(A.4): $\varepsilon \sim N(0, \Sigma)$
(A.5): Same as before
This model is referred to as the generalized normal linear regression model and includes the classical normal linear regression model as a special case, i.e., when
$$\Sigma = \sigma^2 I.$$
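The unknown parameters in the generalized regression model are the β's, $\beta' = (\beta_1, \ldots, \beta_k)$, and the $n(n+1)/2 = n + n(n-1)/2$ independent parameters in the symmetric matrix Σ. In general it is not possible to estimate Σ unless some simplifying assumptions are made. For example, with the case of heteroskedasticity alone
$$\Sigma = \begin{pmatrix}\operatorname{Var}(\varepsilon_1) & & 0\\ & \ddots & \\ 0 & & \operatorname{Var}(\varepsilon_n)\end{pmatrix},$$
or for autocorrelation alone
$$\Sigma = \begin{pmatrix}\sigma^2 & \operatorname{Cov}(\varepsilon_1, \varepsilon_2) & \cdots & \operatorname{Cov}(\varepsilon_1, \varepsilon_n)\\ \vdots & \ddots & & \vdots\\ \operatorname{Cov}(\varepsilon_n, \varepsilon_1) & \cdots & & \sigma^2\end{pmatrix},$$
and for the classical normal linear regression model
$$\Sigma = \begin{pmatrix}\sigma^2 & & 0\\ & \ddots & \\ 0 & & \sigma^2\end{pmatrix} = \sigma^2 I.$$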
2. Estimators of β
a. Least squares estimation
$$SSE = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta$$
$$\frac{\partial SSE}{\partial \beta} = -2X'y + 2X'X\beta$$
Setting this derivative equal to zero and solving yields:
$$2X'X\hat\beta = 2X'y \quad\Longrightarrow\quad \hat\beta = (X'X)^{-1}X'y$$
b. Maximum likelihood estimation
$$L(y; \beta) = \frac{e^{-(1/2)(y - X\beta)'\Sigma^{-1}(y - X\beta)}}{(2\pi)^{n/2}|\Sigma|^{1/2}}$$
$$\begin{aligned}
\ell = \ln L &= -(n/2)\ln(2\pi) - (1/2)\ln|\Sigma| - (1/2)(y - X\beta)'\Sigma^{-1}(y - X\beta)\\
&= -(n/2)\ln(2\pi) - (1/2)\ln|\Sigma| - (1/2)(y'\Sigma^{-1}y - 2\beta'X'\Sigma^{-1}y + \beta'X'\Sigma^{-1}X\beta)
\end{aligned}$$
$$\frac{d\ell}{d\beta} = -(1/2)(-2X'\Sigma^{-1}y + 2X'\Sigma^{-1}X\beta)$$
Setting this derivative equal to 0 and solving implies
$$(X'\Sigma^{-1}X)\overset{\Delta}{\beta} = X'\Sigma^{-1}y,$$
which are referred to as the modified normal equations. The solution of these equations,
$$\overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y,$$
is the maximum likelihood estimator of β.
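c. Best linear unbiased estimator
Linearity condition: $\tilde\beta = Ay$, where A is a k × n matrix of unknown constants.
Unbiasedness condition: Select A so that $E(\tilde\beta) = \beta$, which requires $E(\tilde\beta) = AE(y) = AX\beta = \beta \Rightarrow AX = I$.
Minimum variance condition: Select A so that $E(\tilde\beta) = \beta$ and $\operatorname{Var}(\tilde\beta)$ is a minimum. Let $\operatorname{Var}(\tilde\beta_k) = a_k'\Sigma a_k$, where $a_k'$ is the kth row of the matrix A. The minimization problem is to
$$\min_{a_k}\ a_k'\Sigma a_k \quad\text{s.t.}\quad X'a_k = i_k,$$
where $i_k$ is the kth column of the identity matrix. Dropping the subscript k, form the Lagrangian
$$\ell = a'\Sigma a + \lambda'(X'a - i).$$
The first-order conditions are
$$\frac{\partial \ell}{\partial a} = 2\Sigma a + X\lambda = 0 \quad\Longrightarrow\quad a = -\tfrac{1}{2}\Sigma^{-1}X\lambda,$$
$$\frac{\partial \ell}{\partial \lambda} = X'a - i = 0 \quad\text{(for all rows together, } X'A' = I \text{, i.e., } AX = I\text{)}.$$
Now from $X'a = i$, we substitute for a and have:
$$-\tfrac{1}{2}X'\Sigma^{-1}X\lambda = i \;\Longrightarrow\; \lambda = -2(X'\Sigma^{-1}X)^{-1}i \;\Longrightarrow\; a = \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}i,$$
so $a' = i'(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$. Stacking the k rows gives
$$A = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} \quad\text{and}\quad \tilde\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y.$$
We observe that the BLUE and MLE of β are identical, but different from the least squares estimator of β.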
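3. Distribution of $\hat\beta$, $\tilde\beta$, and $\overset{\Delta}{\beta}$.
For the Classical Normal Linear Regression Model ($\varepsilon \sim N(0, \sigma^2 I)$)
$$\hat\beta = \tilde\beta = \overset{\Delta}{\beta} = (X'X)^{-1}X'y \sim N(\beta;\ \sigma^2(X'X)^{-1}).$$
For the Generalized Regression Model ($\varepsilon \sim N(0, \Sigma)$) we have
$$\hat\beta = (X'X)^{-1}X'y = A_1 y$$
and
$$\tilde\beta = \overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y = A_2 y.$$
Making use of the useful theorem
If $y \sim N[\mu_y; \Sigma_y]$, then $z = Ay \sim N[\mu_z = A\mu_y;\ \Sigma_z = A\Sigma_y A']$,
we obtain
$$\hat\beta \sim N[A_1 X\beta;\ A_1\Sigma A_1'] \sim N[\beta;\ (X'X)^{-1}X'\Sigma X(X'X)^{-1}]$$
$$\tilde\beta = \overset{\Delta}{\beta} \sim N[A_2 X\beta;\ A_2\Sigma A_2'] \sim N[\beta;\ (X'\Sigma^{-1}X)^{-1}].$$
Note that $\hat\beta$, $\tilde\beta$, and $\overset{\Delta}{\beta}$ are unbiased estimators of β, but
$$\operatorname{Var}(\hat\beta_i) \ge \operatorname{Var}(\tilde\beta_i) = \operatorname{Var}(\overset{\Delta}{\beta}_i).$$
Also note that for the case $\Sigma = \sigma^2 I$, these results include the following as a special case:
$$\hat\beta = \tilde\beta = \overset{\Delta}{\beta} \sim N[\beta, \sigma^2(X'X)^{-1}].$$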
4. Consequences of using least squares formulas when Var(ε) = Σ ≠ σ²I
$$\hat\beta = (X'X)^{-1}X'y \quad\text{and}\quad \operatorname{Var}(\hat\beta) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \ne \sigma^2(X'X)^{-1}$$
a. $\hat\beta$ is an unbiased and consistent estimator of β.
b. $\hat\beta$ is not efficient: $\operatorname{Var}(\hat\beta_i) \ge \operatorname{Var}(\tilde\beta_i)$.
c. The use of $\sigma^2(X'X)^{-1}$ will frequently result in serious underestimates of $\operatorname{Var}(\hat\beta)$. Associated forms of t and F statistics are no longer valid. However, robust measures of the actual standard errors can be used to construct "t-statistics" which are asymptotically valid.
d. Predictions of $y_t$ based on OLS will yield larger sampling variation than could be obtained using alternative techniques. See the next section for more detail.
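A short NumPy sketch of the comparison in items (b) and (c), assuming a known diagonal Σ (heteroskedasticity with $\operatorname{Var}(\varepsilon_t)$ proportional to $x_t^2$, an arbitrary choice) and taking the naive σ² as the average error variance tr(Σ)/n. It also shows the form of White's heteroskedasticity-consistent (HC0) estimator, which replaces each $\sigma_t^2$ in $X'\Sigma X$ with the squared OLS residual $e_t^2$. Function names are hypothetical:

```python
import numpy as np


def ols_variances(X, Sigma):
    """Return (true Var(b_OLS), naive sigma^2 (X'X)^{-1}) for a known Sigma.

    The naive version uses sigma^2 = tr(Sigma)/n, the average error variance.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
    naive = np.trace(Sigma) / X.shape[0] * XtX_inv
    return sandwich, naive


def white_hc0(X, e):
    """White (HC0) estimate: (X'X)^{-1} [sum_t e_t^2 x_t x_t'] (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (e ** 2)[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv


n = 20
x = np.arange(1, n + 1, dtype=float)
X = np.column_stack([np.ones(n), x])
Sigma_h = np.diag(x ** 2)                   # Var(eps_t) proportional to x_t^2
sandwich_h, naive_h = ols_variances(X, Sigma_h)
print(sandwich_h[1, 1], naive_h[1, 1])      # the naive formula understates the slope variance here
```

In practice Σ is unknown, and `white_hc0` would be evaluated at the OLS residuals; plugging in $e_t^2 = \sigma_t^2$ reproduces the true sandwich exactly, which shows what the estimator is approximating.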
5. Predictions in the generalized regression model:
Goldberger (JASA, 1962) demonstrated that the best linear unbiased predictor of y_t in period n + h, h periods in the future, is given by
$$\hat y_n(h) = \hat y_{n+h} = X_{n+h}\overset{\Delta}{\beta} + W'\Sigma^{-1}e$$
where
$$\overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y,\qquad e = y - X\overset{\Delta}{\beta},\qquad W = E(\varepsilon_{n+h}\,\varepsilon),$$
so that W is the n × 1 vector of covariances between the future disturbance ε_{n+h} and the sample disturbances ε_1, ..., ε_n. Therefore predictions based on OLS or MLE alone ($X_{n+h}\hat\beta$ or $X_{n+h}\overset{\Delta}{\beta}$) may have sampling variances which are larger than could be obtained using the Goldberger technique.
Note:
a. If the ε's are uncorrelated, then
$$W = E\left[\varepsilon_{n+h}\begin{pmatrix}\varepsilon_1\\ \vdots\\ \varepsilon_n\end{pmatrix}\right] = 0$$
and the best linear unbiased predictor of y_t in period n + h is
$$\hat y_{n+h} = X_{n+h}\overset{\Delta}{\beta}.$$
b. If there is correlation between the random disturbances, then the best linear unbiased predictor may differ from our BLUE of the deterministic component X_{n+h}β. The adjustment W'Σ^{-1}e would "correct" for the existence of correlation between the random disturbances.
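For AR(1) disturbances (treated in section F) the Goldberger correction takes a simple form: W is ρ^h times the last column of Σ, so W'Σ^{-1}e reduces to ρ^h e_n. The sketch below (Python with NumPy; the simulated data, ρ = 0.7, h = 3, and the function names are hypothetical, and ρ is assumed known) computes the predictor directly from the formula above and checks that identity.

```python
import numpy as np

def ar1_cov(n_total, rho, sigma_u=1.0):
    """Covariance of a stationary AR(1) process: sigma_u^2/(1-rho^2) * rho^|i-j|."""
    idx = np.arange(n_total)
    return sigma_u**2 / (1 - rho**2) * rho ** np.abs(idx[:, None] - idx[None, :])

def goldberger_predict(X, y, x_future, Sigma, W):
    """BLUP: x_future @ beta_gls + W' Sigma^{-1} e (Goldberger, 1962)."""
    Si_X = np.linalg.solve(Sigma, X)                  # Sigma^{-1} X
    beta = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)    # (X'S^-1X)^-1 X'S^-1 y
    e = y - X @ beta
    correction = W @ np.linalg.solve(Sigma, e)        # W' Sigma^{-1} e
    return x_future @ beta + correction, beta, e, correction

rng = np.random.default_rng(0)
n, h, rho = 40, 3, 0.7
full = ar1_cov(n + h, rho)            # covariance of eps_1, ..., eps_{n+h}
Sigma = full[:n, :n]
W = full[:n, n + h - 1]               # Cov(eps_t, eps_{n+h}), t = 1, ..., n
x = np.arange(1.0, n + h + 1)
X_all = np.column_stack([np.ones(n + h), x])
eps = np.linalg.cholesky(full) @ rng.standard_normal(n + h)
y = X_all[:n] @ np.array([1.0, 0.5]) + eps[:n]

y_hat, beta, e, correction = goldberger_predict(X_all[:n], y, X_all[n + h - 1], Sigma, W)
print(y_hat, correction, rho**h * e[-1])
```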
6. Alternative methods of obtaining BLUE or MLE of β by transforming data or using Generalized Least Squares (GLS).
The discussion in this section provides motivation for the way MLE can be performed in regression programs. Consider the generalized regression model:
$$y = X\beta + \varepsilon,\qquad \varepsilon \sim N(0, \Sigma)$$
Transform the model (and data) by premultiplying by a transformation matrix T, i.e.,
$$[Ty] = [TX]\beta + [T\varepsilon]$$
If we select a transformation matrix T such that
$$T\varepsilon \sim N(0,\ T\Sigma T' = \sigma^2I),$$
then the transformed error terms Tε satisfy (A.1)-(A.4), and it follows from TΣT' = σ²I that
$$\Sigma = \sigma^2T^{-1}(T')^{-1}\quad\text{or}\quad\Sigma^{-1} = \sigma^{-2}T'T.$$
Applying least squares to the transformed data, we obtain
$$\hat\beta_T = [(TX)'TX]^{-1}[(TX)'Ty] = (X'T'TX)^{-1}X'T'Ty,$$
which yields the maximum likelihood estimator of β: substituting T'T = σ²Σ^{-1} (the σ² cancels),
$$\hat\beta_T = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y.$$
In other words, applying least squares to an appropriately transformed regression model will yield MLE of β. These estimators are sometimes referred to as generalized least squares (GLS) estimators of β.
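One convenient choice of T (an assumption made here for illustration; the notes only require TΣT' = σ²I) comes from the Cholesky factorization Σ = LL': taking T = L^{-1} gives TΣT' = I. The sketch below (Python with NumPy, hypothetical positive definite Σ and simulated data) checks that OLS on (TX, Ty) reproduces (X'Σ^{-1}X)^{-1}X'Σ^{-1}y.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)          # hypothetical positive definite covariance
y = X @ np.array([1.0, -2.0, 0.5]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

L = np.linalg.cholesky(Sigma)            # Sigma = L L'
T = np.linalg.inv(L)                     # then T Sigma T' = I (here sigma^2 = 1)

# OLS applied to the transformed data [Ty] = [TX] beta + [T eps]
beta_T, *_ = np.linalg.lstsq(T @ X, T @ y, rcond=None)

# GLS formula (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(beta_T, beta_gls)
```

Any other T satisfying TΣT' = σ²I would give the same coefficients; the Cholesky factor is simply easy to compute.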
7. Robust estimates of the standard errors of the OLS estimator
As we noted earlier, if Σ ≠ σ²I,
$$\mathrm{Var}(\hat\beta_{OLS}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \ne \sigma^2(X'X)^{-1},$$
and the OLS "standard errors" reported by most computer programs, the square roots of the diagonal elements of s²(X'X)^{-1}, will be inappropriate for constructing t-statistics. White (1980, Econometrica, pp. 817-838) and Newey-West (1987, Econometrica, pp. 703-708) outline how to obtain consistent estimators of the correct Var($\hat\beta_{OLS}$) for the cases of heteroskedasticity and autocorrelation, respectively. These procedures are programmed into many econometric packages.
In Stata
. for heteroskedasticity: reg dep_var rhs_vars, robust
or
. for autocorrelation: newey dep_var rhs_vars, lag(#)
where # is the maximum number of lags to consider in the autocorrelation structure. Typing lag(0) is the same as using the "reg ..., robust" command above.
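A minimal sketch of White's heteroskedasticity-consistent covariance, (X'X)^{-1}(Σ_t e_t² x_t'x_t)(X'X)^{-1}, alongside the conventional s²(X'X)^{-1} (Python with NumPy; the function name and simulated data are hypothetical). To our understanding, Stata's ", robust" option also multiplies by the small-sample factor n/(n - k), the variant often labeled HC1, so the sketch reports that version separately.

```python
import numpy as np

def ols_white(X, y):
    """OLS coefficients with conventional, White (HC0), and HC1 standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (n - k)
    se_conv = np.sqrt(np.diag(s2 * XtX_inv))       # s^2 (X'X)^{-1}
    meat = X.T @ (X * (e**2)[:, None])             # sum_t e_t^2 x_t' x_t
    cov_hc0 = XtX_inv @ meat @ XtX_inv             # White (1980) sandwich
    se_hc0 = np.sqrt(np.diag(cov_hc0))
    se_hc1 = se_hc0 * np.sqrt(n / (n - k))         # degrees-of-freedom adjusted
    return beta, e, meat, se_conv, se_hc0, se_hc1

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + x * rng.normal(size=n)         # error s.d. proportional to x
beta, e, meat, se_conv, se_hc0, se_hc1 = ols_white(X, y)
print("conventional:", se_conv, " White:", se_hc0, " HC1:", se_hc1)
```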
E. Heteroskedasticity (Violation of (A.3))
1. Introduction
In certain applications the researcher may find that the assumption
$$\mathrm{Var}(y_t) = \mathrm{Var}(\varepsilon_t) = \sigma^2 \text{ for all } t$$
appears to be inconsistent with the data and model under consideration. This problem can arise in a number of contexts. For example, if the data are obtained by combining cross-sectional and time series data where different sample sizes are involved, one might expect the averages (or totals) associated with the largest sample size to have a different variance than observations associated with the smallest sample size. Another example of heteroskedasticity might arise in an analysis of expenditure patterns (C_t) corresponding to different income levels (Y_t) in budget studies.
[Figure: consumption C_t plotted against income Y_t around the line β1 + β2Y_t (intercept β1, slope β2).]
In this example we note that there appears to be greater variation in consumption levels associated with higher income levels than with lower income levels. This might arise because individuals with higher incomes can make more discretionary purchases than those with lower incomes, who spend most of their income on necessities. This situation could be modeled as
$$C_t = \beta_1 + \beta_2Y_t + \varepsilon_t$$
(A.1), (A.2), (A.3)': ε_t ~ N(0, σ_t²)
(A.4) Cov(ε_t, ε_s) = 0, t ≠ s
(A.5) Same as before.
More generally the heteroskedastic model can be written as
$$y = X\beta + \varepsilon$$
(A.1)' ε ~ N[0, Σ]
(A.5) The X's are nonstochastic and $\lim_{n\to\infty} n^{-1}X'X$ is nonsingular,
where
$$\Sigma = \begin{pmatrix}\sigma_1^2 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_n^2\end{pmatrix}.$$
As noted in the previous section, if Σ ≠ σ²I (any of the variances are unequal), least squares estimators will not be equal to the MLE or BLUE of β. Least squares estimators will still be unbiased and consistent, but they will be neither minimum variance nor asymptotically efficient, and the standard statistical tests based on least squares are invalid. For this reason it is important to test for the existence of heteroskedasticity.
2. Tests for Heteroskedasticity
The basic idea behind all of these tests is to determine whether there appears to be any systematic behavior in the variances of the errors. The first test, the Goldfeld-Quandt test, groups the data and tests for equality of the variances of the different groups. Many of the other tests use the squared OLS residual (e_t²) as a proxy for σ_t² and search for systematic relationships between e_t² and other variables.
a. Goldfeld-Quandt Test
The null hypothesis to be investigated is
$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2$$
A common test for heteroskedasticity is the Goldfeld-Quandt test.
(1) Order the observations by the variable thought to be related to the variance, and divide the data into three groups of roughly equal sizes (n1 + n2 + n3 = n).
(2) Run separate regressions on groups I and III. Let s_I² and s_III² represent the corresponding estimators of σ².
(3) Under the null hypothesis of homoskedasticity,
$$\frac{s_{III}^2}{s_I^2} \sim F(n_3 - k,\ n_1 - k)$$
*Place the larger s² in the numerator.
Under the null hypothesis one would expect s_III²/s_I² to be fairly close to one, and large differences from one would provide the basis for rejecting the null hypothesis. This is an exact test. A disadvantage of the test arises in cases in which many regressors are involved and a natural ordering for forming the three groups may not be obvious.
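The Goldfeld-Quandt steps can be sketched as follows (Python with NumPy; the function names, the choice n1 = n3 = ⌊n/3⌋, and the simulated data are illustrative assumptions). The function returns the variance ratio, larger over smaller, with its degrees of freedom; the critical value still has to be taken from an F table.

```python
import numpy as np

def s2(y, X):
    """Unbiased residual variance e'e / (n - k) from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e / (len(y) - X.shape[1])

def goldfeld_quandt(y, X, sort_col):
    """Order by X[:, sort_col], fit groups I and III, return (F, (df_num, df_den))."""
    order = np.argsort(X[:, sort_col])
    y, X = y[order], X[order]
    n, k = X.shape
    m = n // 3                                   # n1 = n3 = m; middle group dropped
    s2_I, s2_III = s2(y[:m], X[:m]), s2(y[-m:], X[-m:])
    F = max(s2_I, s2_III) / min(s2_I, s2_III)    # larger s^2 in the numerator
    return F, (m - k, m - k)

rng = np.random.default_rng(3)
n = 90
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 1.0 * x + x * rng.normal(size=n)       # hypothetical: error s.d. rises with x
F, df = goldfeld_quandt(y, X, sort_col=1)
print(F, df)
```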
[Figure: F(n3 - k, n1 - k) distribution for the Goldfeld-Quandt statistic, with "Fail to Reject H0" and "Reject H0" regions.]
b. The Park test (Glejser test) can be thought of as being based upon using |e_t| as a proxy for σ_t and then investigating relationships of the form
$$|e_t| = f(X_t) \quad\text{or}\quad e_t^2 = g(X_t).$$
Various forms for the functions f( ) and g( ) have been considered. The null hypothesis of homoskedasticity is tested by investigating whether the X's in f(X_t) or g(X_t) have any collective explanatory power. Statistically significant explanatory power of X_t would provide the basis for rejecting the assumption of homoskedasticity. The exact validity of F tests is questionable, their use being based on asymptotic considerations. Recall that the e_t's are correlated even if the ε_t's are uncorrelated.
c. The White test [Econometrica, 1980, pp. 817-38]. Hal White suggests regressing e_t² on all of the explanatory variables, their squares, and cross products and then testing for the collective explanatory power of the regressors. The rationale for this test is that the hypothesis σ_t² = f(X_t) is being investigated with e_t² as a proxy for σ_t², using a second-order Taylor series approximation for the function f(X_t). The null hypothesis of homoskedasticity would be consistent with a lack of statistical significance. White mentions the use of a Rao or Lagrangian multiplier test
$$LM = NR^2$$
which is asymptotically chi-square with degrees of freedom equal to the number of slope coefficients, (k + 2)(k - 1)/2 (with k counting the intercept), in the "e_t² auxiliary" regression equation.
Note: The R² in the LM test is the R² from the previously described "e_t² regression" equation. The White test can be performed by retrieving the estimated errors and regressing their squares on the variables, their squares, and cross-products.
Alternatively, the Stata command reg y x's, followed by whitetst on the next line, will automatically perform the White test.
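A sketch of White's LM = nR² statistic built from the auxiliary regression of e_t² on the regressors, their squares, and cross products (Python with NumPy; the function names and simulated data are hypothetical). With k = 3 coefficients including the intercept, the auxiliary regression has (k + 2)(k - 1)/2 = 5 slope coefficients.

```python
import itertools
import numpy as np

def r_squared(y, Z):
    """R^2 from an OLS regression of y on Z (Z includes a constant column)."""
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def white_test(y, X):
    """X has a leading column of ones. Returns (LM = n R^2, degrees of freedom)."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    V = X[:, 1:]                                     # non-constant regressors
    cols = [V[:, j] for j in range(k - 1)]           # levels
    cols += [V[:, i] * V[:, j]                       # squares and cross products
             for i, j in itertools.combinations_with_replacement(range(k - 1), 2)]
    Z = np.column_stack([np.ones(n)] + cols)
    return n * r_squared(e2, Z), Z.shape[1] - 1

rng = np.random.default_rng(4)
n = 500
x1, x2 = rng.uniform(1, 10, n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 - x2 + x1 * rng.normal(size=n)    # Var(eps_t) depends on x1
LM, df = white_test(y, X)
print(LM, df)   # compare with the chi-square(5) 5% critical value, 11.07
```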
d. The modified White test. For large k, the White test involves many regressors and a large number of degrees of freedom. To circumvent this problem, White proposed an alternative test based on estimating the model:
$$e_t^2 = \delta_0 + \delta_1\hat y_t + \delta_2\hat y_t^2 + \eta_t$$
where ŷ_t denotes the predicted y's from an initial OLS estimation of the original model. The corresponding LM test (NR²) is asymptotically distributed as χ²(2).
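The modified version replaces the many auxiliary regressors with ŷ_t and ŷ_t². A minimal sketch (Python with NumPy; the function name and simulated data are hypothetical):

```python
import numpy as np

def modified_white(y, X):
    """LM = n R^2 from regressing e_t^2 on (1, yhat_t, yhat_t^2); asymptotically chi2(2)."""
    n = len(y)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ b
    e2 = (y - yhat) ** 2
    Z = np.column_stack([np.ones(n), yhat, yhat**2])
    d, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    resid = e2 - Z @ d
    r2 = 1.0 - resid @ resid / np.sum((e2 - e2.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + x * rng.normal(size=n)   # error s.d. proportional to x
LM = modified_white(y, X)
print(LM)   # compare with the chi-square(2) 5% critical value, 5.99
```

Because the degrees of freedom stay at 2 regardless of k, this version remains practical with many regressors.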
e. Breusch-Pagan Test. This test is included in Stata. It is performed by regressing the squares of the estimated errors on the X's or other variables and testing for their collective explanatory power using an LM test or an F test. The Stata commands are:
reg y x
estat hettest (performs the regression $e_t^2 = \delta_0 + \delta_1\hat y_t + \eta_t$); add the option iid (reports an LM test statistic) or fstat (reports the F-statistic)
Alternatives or variations:
estat hettest x's, iid (or normal or fstat)
estat hettest, rhs
estat hettest x's x^2's cross-products, iid (or fstat)
estat hettest yhat yhat^2, fstat (or iid)
where the squares, cross-products, yhat, and yhat^2 stand for variables that must first be created with gen, and the LM or F-tests can be used to test
$$H_0: \sigma_t^2 = \sigma^2 \quad(\text{homoskedasticity}).$$
3. Estimation
a. Viewed as applying OLS to an appropriately transformed model (Stata)
For applications in which the random disturbances are characterized by heteroskedasticity, BLUE and MLE of β will be unbiased, consistent, and have smaller variances than least squares estimators. In section (IV.D.6) we demonstrated that if a matrix T can be found such that
$$\mathrm{Var}(T\varepsilon) = \sigma^2I \quad(\text{or } \Sigma^{-1} = \sigma^{-2}T'T),$$
the MLE (and BLUE) of β can be obtained by transforming the data (model) from
$$y = X\beta + \varepsilon$$
to
$$Ty = TX\beta + T\varepsilon$$
and applying least squares to the transformed model.
Consider the model
$$y_t = X_t\beta + \varepsilon_t = \beta_1 + \beta_2x_{t2} + \cdots + \beta_kx_{tk} + \varepsilon_t$$
where ε_t ~ N(0, σ_t²).
We will consider the transformation from a slightly different perspective. The original model can be transformed to a form characterized by homoskedasticity by premultiplying the original formulation by σ/σ_t, where σ is an unknown constant, i.e.,
$$\frac{\sigma}{\sigma_t}y_t = \beta_1\frac{\sigma}{\sigma_t} + \beta_2\frac{\sigma}{\sigma_t}x_{t2} + \cdots + \beta_k\frac{\sigma}{\sigma_t}x_{tk} + \frac{\sigma}{\sigma_t}\varepsilon_t.$$
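Note that the variance of the transformed random disturbance is given by
$$\mathrm{Var}\left(\frac{\sigma\varepsilon_t}{\sigma_t}\right) = \frac{\sigma^2}{\sigma_t^2}\mathrm{Var}(\varepsilon_t) = \frac{\sigma^2}{\sigma_t^2}\sigma_t^2 = \sigma^2$$
and the errors in the transformed regression, σε_t/σ_t, satisfy the assumptions (A.1)-(A.4).
The corresponding transformation matrix is given by
$$T = \sigma\begin{pmatrix}1/\sigma_1 & 0 & \cdots & 0\\ 0 & 1/\sigma_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1/\sigma_n\end{pmatrix}$$
Note that:
$$T\Sigma T' = \sigma^2\begin{pmatrix}1/\sigma_1 & & \\ & \ddots & \\ & & 1/\sigma_n\end{pmatrix}\begin{pmatrix}\sigma_1^2 & & \\ & \ddots & \\ & & \sigma_n^2\end{pmatrix}\begin{pmatrix}1/\sigma_1 & & \\ & \ddots & \\ & & 1/\sigma_n\end{pmatrix} = \sigma^2I$$
and the transformed data matrices are given by: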
$$y^* = Ty = \sigma\begin{pmatrix}y_1/\sigma_1\\ \vdots\\ y_n/\sigma_n\end{pmatrix},\qquad X^* = TX = \sigma\begin{pmatrix}1/\sigma_1 & x_{12}/\sigma_1 & \cdots & x_{1k}/\sigma_1\\ \vdots & \vdots & & \vdots\\ 1/\sigma_n & x_{n2}/\sigma_n & \cdots & x_{nk}/\sigma_n\end{pmatrix}.$$
An application of least squares to the transformed data will yield MLE and BLUE of β. It can be verified that T'T = σ²Σ^{-1}.
Note:
In the GLS estimator the multiplicative constant in the transformation matrix is arbitrary and will cancel out. In summary, if the original model is y = Xβ + ε and we apply OLS to the transformed model, we obtain
$$\hat\beta_T = (X'T'TX)^{-1}X'T'Ty = (X'\sigma^2\Sigma^{-1}X)^{-1}X'\sigma^2\Sigma^{-1}y = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y = \overset{\Delta}{\beta} = \tilde\beta.$$
Thus when choosing a T matrix for data transformation, the unknown constant σ need not be specified.
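The claim that the constant σ cancels is easy to check numerically. In the sketch below (Python with NumPy; hypothetical data with σ_t = x_t and a hypothetical helper name), each row of y and X is multiplied by c/σ_t for two different values of c; both give the same coefficients as the GLS formula with Σ = diag(σ_t²).

```python
import numpy as np

def wls_by_transform(X, y, sigma_t, c=1.0):
    """OLS on rows scaled by c / sigma_t, i.e. on Ty and TX with T = c * diag(1/sigma_t)."""
    w = c / sigma_t
    beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta

rng = np.random.default_rng(6)
n = 50
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
sigma_t = x                                     # hypothetical: sigma_t proportional to x_t
y = 3.0 + 2.0 * x + sigma_t * rng.normal(size=n)

b_c1 = wls_by_transform(X, y, sigma_t, c=1.0)
b_c7 = wls_by_transform(X, y, sigma_t, c=7.0)

Si = np.diag(1.0 / sigma_t**2)                  # Sigma^{-1} for Sigma = diag(sigma_t^2)
b_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(b_c1, b_c7, b_gls)
```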
b. Estimation using Stata:
The command
vwls y X's, sd(sig)
where sig is a variable containing σ_t, will perform the previously described estimation and yield MLE. The main problem is to determine what the σ_t should be.
4. Nature of Heteroskedasticity (σ_t's) and estimation
The problem of estimating the σ_t still remains, and there is no general solution which will work in all cases.
a. Sometimes σ_t can be deduced from the model
(1) y_t = at + η_t
t = number of tosses of a coin
y_t = number of heads in t tosses
E(y_t) = at
Var(η_t) = npq = t(1/2)(1 - 1/2) = t/4 = σ_t² (with n = t trials and p = q = 1/2)
Stata commands for MLE are:
gen sig = t^.5
vwls y t, sd(sig) noconstant
(The model has no intercept, hence noconstant. Rescaling sig by a constant leaves the estimate of a unchanged, but vwls treats sig as the true σ_t when computing standard errors, so sig = t^.5/2 matches Var(η_t) = t/4 exactly.)
The least squares estimator of a is given by â = Σ t y_t / Σ t², and the MLE of a is Σ y_t / Σ t = total number of heads / total number of tosses.
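A small simulation of the coin-tossing example (Python with NumPy; hypothetical data: y_t heads in t = 1, ..., 50 tosses of a fair coin) computes both estimators of a and confirms that weighted least squares with σ_t proportional to √t reproduces the MLE, total heads over total tosses.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(1, 51)
y = rng.binomial(t, 0.5)                 # heads in t tosses

a_ols = np.sum(t * y) / np.sum(t**2)     # least squares, no intercept
a_mle = y.sum() / t.sum()                # total heads / total tosses

# Weighted least squares: divide y_t and t by sqrt(t) (the factor 1/2 cancels)
w = 1.0 / np.sqrt(t)
a_wls = np.sum((t * w) * (y * w)) / np.sum((t * w) ** 2)
print(a_ols, a_mle, a_wls)
```

Algebraically, Σ(t/√t)(y_t/√t) / Σ(t/√t)² = Σy_t / Σt, which is why the last two numbers agree.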
(2) Combination of time series and cross-sectional data
(y_t, X_t) is a time series obtained by taking averages of cross-sectional samples of size n_t.
Let y_t = a + bx_t + ε_t be the model; then an assumption which might be "reasonable" is
Var(y_t) = Var(ε_t) = σ²/n_t
The corresponding Stata commands for MLE are
gen sig = 1/n_t^.5
vwls y x, sd(sig)
where n_t is a variable containing the cross-sectional sample sizes.
b. Sometimes the researcher can analyze the behavior of the residuals and look for trends
Try σ_t² = σ²x_t or σ_t² = σ²x_t².
If σ_t² = σ²x_t, then use the Stata commands
gen sig=x^.5
vwls y x, sd(sig)
Similarly, if σ_t² = σ²x_t², then use the Stata commands
gen sig=x
vwls y x, sd(sig)
c. An example of Feasible GLS with multiple regressors (Wooldridge).
Consider the model y_t = X_tβ + ε_t with
$$\sigma_t^2 = \mathrm{Var}(\varepsilon_t \mid X_t) = e^{X_t\delta}.$$
Estimated or feasible GLS (an estimated version of the BLUE) of the unknown coefficients in the original regression model can be obtained as follows:
(1) Regress y on the X's to obtain the estimated residuals (e).
reg y X's
(2) Regress the natural logarithm of the squared OLS residuals on the X's and save the predicted values ($X_t\hat\delta$).
predict e, resid
gen Le2=ln(e*e)
reg Le2 X's
predict xdelta, xb
gen sig=(exp(xdelta))^.5
(3) Use the calculated weights, $\hat\sigma_t = (e^{X_t\hat\delta})^{.5}$, to perform weighted least squares.
vwls y X's, sd(sig)
Alternative assumptions about the nature of heteroskedasticity could be used in this procedure.
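Steps (1) to (3) can be sketched as follows (Python with NumPy; the function name, δ = (-1, 1), and the simulated data are hypothetical, and np.linalg.lstsq stands in for Stata's reg and vwls). A constant shift in the ln(e²) regression only rescales every σ̂_t by the same factor, which cancels as noted earlier, so the intercept of that regression need not be corrected.

```python
import numpy as np

def fgls_exponential(X, y):
    """Feasible GLS assuming Var(eps_t | X_t) = exp(X_t delta)."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)             # step (1): OLS
    e = y - X @ b_ols
    delta, *_ = np.linalg.lstsq(X, np.log(e**2), rcond=None)  # step (2): ln(e^2) on X
    sig = np.sqrt(np.exp(X @ delta))                          # sigma_t hat
    w = 1.0 / sig
    b_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)  # step (3): WLS
    return b_ols, b_fgls, sig

rng = np.random.default_rng(8)
n = 2000
x = rng.uniform(0, 3, n)
X = np.column_stack([np.ones(n), x])
sd = np.exp(0.5 * (-1.0 + 1.0 * x))      # true sigma_t with delta = (-1, 1)
y = 1.0 + 2.0 * x + sd * rng.normal(size=n)
b_ols, b_fgls, sig = fgls_exponential(X, y)
print(b_ols, b_fgls)
```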
5. Predictions
The best linear unbiased predictors will be given by
$$\hat Y_{n+h} = \hat Y_n(h) = X_{n+h}\overset{\Delta}{\beta}$$
(see notes, section D.5), since the disturbances are uncorrelated and hence W = 0.
F. Autocorrelation (Violation of A.4)
1. Introduction
One of the most common violations of (A.1)-(A.5) with time series data is the presence of autocorrelated random disturbances in regression models. Autocorrelated random disturbances refers to the problem in which the error terms are not statistically independent. When working with time series data, you should be aware of the possibility of what is known as the spurious regression problem. This problem can arise when the dependent variable (y) and one or more of the explanatory variables (say X) both exhibit trending behavior. In this situation, regressing y on X may suggest a statistically significant relationship between y and X when they are unrelated (a spurious regression) and only appear related because of a shared trending behavior. One approach to circumventing this situation is to include "t" in the set of regressors, e.g.,
$$y_t = \beta_1 + \beta_2X_t + \beta_3t + \varepsilon_t.$$
If this is the correct model and the variable t is deleted from the equation, the resultant estimators of β1 and β2 will be biased. The OLS estimate of β2 is the same as would arise from regressing the residuals from a regression of y on t on the residuals obtained from regressing X on t.
Time series regressions in Stata require the user to designate that the series is a time series by including a command of the form tsset t, where t is a time variable which indexes the data. This can be created with the command gen t=_n.
The case of positive autocorrelation might be depicted as follows:
[Figure: observations scattered around the regression line β1 + β2X_t under positive autocorrelation.]
Note that positive random disturbances tend to be followed by positive random disturbances and negative random disturbances tend to be followed by negative random disturbances. Thus, we are faced with a situation in which the non-diagonal elements of
$$\Sigma = \begin{pmatrix}\mathrm{Var}(\varepsilon_1) & \mathrm{Cov}(\varepsilon_1,\varepsilon_2) & \cdots & \mathrm{Cov}(\varepsilon_1,\varepsilon_n)\\ \mathrm{Cov}(\varepsilon_2,\varepsilon_1) & \mathrm{Var}(\varepsilon_2) & \cdots & \mathrm{Cov}(\varepsilon_2,\varepsilon_n)\\ \vdots & \vdots & \ddots & \vdots\\ \mathrm{Cov}(\varepsilon_n,\varepsilon_1) & \mathrm{Cov}(\varepsilon_n,\varepsilon_2) & \cdots & \mathrm{Var}(\varepsilon_n)\end{pmatrix}$$
are nonzero; therefore Σ ≠ σ²I, and the least squares estimators of β again will not equal the MLE or BLUE of β and are therefore not minimum variance estimators.
Possible causes of autocorrelated random disturbances might include deleting a relevant variable or selecting the incorrect functional form; alternatively, the model may be correctly specified but the error terms are correlated. The matrix Σ contains
$$\frac{n(n+1)}{2} = \frac{n(n-1)}{2} + n$$
distinct elements. In the context of the generalized regression model, we lack sufficient data to obtain separate independent estimates for each of the Cov(ε_i, ε_j). In order to circumvent this problem we frequently assume that the ε_t's are related in such a manner that fewer parameters describe the process. One such model which provides an accurate approximation in many cases is the first order autoregressive process
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t$$
where the u_t are assumed to be independently and identically distributed as N(0, σ_u²) and |ρ| < 1. Note that the u_t satisfy assumptions (A.1)-(A.4). Based upon this formulation it can be shown that
• E(ε_t) = 0
• $\mathrm{Var}(\varepsilon_t) = \mathrm{Var}(\varepsilon_{t-1}) = \sigma_\varepsilon^2 = \sigma_u^2/(1-\rho^2)$
• $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-s}) = \rho^s\sigma_\varepsilon^2$, which for s ≥ 1 equals 0 if and only if ρ = 0
• $\mathrm{Corr}(\varepsilon_t, \varepsilon_{t-s}) = \rho^s$
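Note:
$$\begin{aligned}
\varepsilon_t &= \rho\varepsilon_{t-1} + u_t = \rho(\rho\varepsilon_{t-2} + u_{t-1}) + u_t = \rho^2\varepsilon_{t-2} + \rho u_{t-1} + u_t\\
&= u_t + \rho u_{t-1} + \rho^2u_{t-2} + \cdots = \sum_{r=0}^{\infty}\rho^ru_{t-r}
\end{aligned}$$
⇒ E(ε_t) = 0 since E(u_{t-r}) = 0 for all t and r.
$$E(\varepsilon_t^2) = E(u_t^2) + \rho^2E(u_{t-1}^2) + \rho^4E(u_{t-2}^2) + \cdots = \sigma_u^2(1 + \rho^2 + \rho^4 + \cdots) = \sigma_u^2/(1-\rho^2)$$
(the cross-product terms have zero expectation because the u's are independent with mean zero).
$$\begin{aligned}
E(\varepsilon_t\varepsilon_{t-s}) &= E[(u_t + \rho u_{t-1} + \rho^2u_{t-2} + \cdots)(u_{t-s} + \rho u_{t-s-1} + \rho^2u_{t-s-2} + \cdots)]\\
&= E\{[u_t + \rho u_{t-1} + \cdots + \rho^{s-1}u_{t-s+1} + \rho^s(u_{t-s} + \rho u_{t-s-1} + \cdots)](u_{t-s} + \rho u_{t-s-1} + \cdots)\}\\
&= \rho^sE[(u_{t-s} + \rho u_{t-s-1} + \cdots)^2] = \rho^sE(\varepsilon_{t-s}^2)\\
&= \rho^s\sigma_\varepsilon^2 = \rho^s\sigma_u^2/(1-\rho^2).
\end{aligned}$$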
We observe that the random disturbances ε_t are characterized by constant variance (homoskedasticity) but are uncorrelated if and only if ρ = 0, in which case ε_t = u_t and assumptions (A.1)-(A.4) are satisfied. We also note that since
$$\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-1}) = E(\varepsilon_t\varepsilon_{t-1}) = \rho\sigma_\varepsilon^2,$$
we expect a general pattern of positive random disturbances to be followed by positive random disturbances and negative values to be followed by negative values if ρ > 0. However, if ρ < 0, we would generally expect the signs of the random disturbances to alternate.
Based upon the assumption that the process ε_t is a first order process, we can write the associated variance covariance matrix as
$$\Sigma = \frac{\sigma_u^2}{1-\rho^2}\begin{pmatrix}1 & \rho & \rho^2 & \cdots & \rho^{n-1}\\ \rho & 1 & \rho & \cdots & \rho^{n-2}\\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1\end{pmatrix}.$$
Σ is now completely characterized by the two parameters ρ and $\sigma_\varepsilon^2 = \sigma_u^2/(1-\rho^2)$, and the estimation problem is considerably simplified.
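The AR(1) moments can be checked by simulation. The sketch below (Python with NumPy; ρ = 0.6, σ_u = 1, and the function names are arbitrary choices for illustration) builds Σ from the formula above and compares the sample variance and lag-s autocorrelations of a long simulated series with σ_u²/(1 - ρ²) and ρ^s.

```python
import numpy as np

def ar1_sigma(n, rho, sigma_u=1.0):
    """Sigma = sigma_u^2/(1 - rho^2) * [rho^|i-j|] for a stationary AR(1) process."""
    idx = np.arange(n)
    return sigma_u**2 / (1 - rho**2) * rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_ar1(n, rho, sigma_u, rng):
    """eps_t = rho * eps_{t-1} + u_t, started from the stationary distribution."""
    eps = np.empty(n)
    eps[0] = rng.normal(scale=sigma_u / np.sqrt(1 - rho**2))
    u = rng.normal(scale=sigma_u, size=n)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]
    return eps

def sample_autocorr(x, s):
    """Lag-s sample autocorrelation (one point of the sample correlogram)."""
    x = x - x.mean()
    return (x[s:] @ x[:-s]) / (x @ x)

rho, sigma_u = 0.6, 1.0
Sigma = ar1_sigma(5, rho, sigma_u)
eps = simulate_ar1(100_000, rho, sigma_u, np.random.default_rng(9))
print(Sigma[0, 0], eps.var())              # both near 1/(1 - 0.36) = 1.5625
print([round(sample_autocorr(eps, s), 3) for s in (1, 2, 3)], [rho**s for s in (1, 2, 3)])
```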
A plot of corr(ε_t, ε_{t-s}) for different values of s is referred to as the correlogram of the process ε_t. If the sample correlogram (graph of estimated correlation coefficients) appears as
[Figure: sample correlogram with heights ρ at s = 1 and ρ² at s = 2.]
we would interpret this evidence as being consistent with the assumption of a first-order autoregressive process with a positive ρ. The sample correlogram can be generated with the Stata commands:
reg y x's
predict e, res
ac e, lags(# of lags)
We have shown that within the context of a first-order autoregressive model Σ = σ²I if and only if ρ = 0. It therefore becomes important to test the hypothesis that ρ = 0.
A more general model for the disturbances is an autoregressive moving average (ARMA(p, q)) process defined by
$$\varepsilon_t - \phi_1\varepsilon_{t-1} - \cdots - \phi_p\varepsilon_{t-p} = u_t - \theta_1u_{t-1} - \cdots - \theta_qu_{t-q}.$$
This model will be studied in more detail in another section. Note that this specification includes the first order autoregressive process as the following special case:
$$\text{ARMA}(p = 1, q = 0):\quad \varepsilon_t - \phi_1\varepsilon_{t-1} = u_t.$$
2. Tests for autocorrelation.
a. The right hand side variables are exogenous
There are numerous tests for the presence of autocorrelation where the right hand side variables are exogenous. Among these are (1) the Durbin-Watson test, (2) tests structured in terms of an estimator of the correlation between ε_t and ε_{t-1}, (3) the Theil-Nagar test, (4) the Von Neumann ratio, (5) the Breusch-Godfrey test, (6) the Ljung-Box test, and (7) a test for the number of sign changes in the estimated random disturbances (Runs test). Of these tests, the Durbin-Watson test statistic is probably the most widely used.
(1) Durbin-Watson test
The Durbin-Watson test statistic is defined by
$$D.W. = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2}$$
where e_t denotes the least squares estimator of the random disturbance ε_t. This expression can be written in a useful alternative form by noting that
$$\begin{aligned}
\sum_{t=2}^{n}(e_t - e_{t-1})^2 &= \sum_{t=2}^{n}e_t^2 - 2\sum_{t=2}^{n}e_te_{t-1} + \sum_{t=2}^{n}e_{t-1}^2\\
&= \sum_{t=1}^{n}e_t^2 + \sum_{t=1}^{n}e_t^2 - 2\sum_{t=2}^{n}e_te_{t-1} - e_1^2 - e_n^2\\
&= 2\sum_{t=1}^{n}e_t^2 - 2\sum_{t=2}^{n}e_te_{t-1} - e_1^2 - e_n^2;
\end{aligned}$$
hence,
$$D.W. = \frac{2\sum_{t=1}^{n}e_t^2 - 2\sum_{t=2}^{n}e_te_{t-1} - e_1^2 - e_n^2}{\sum_{t=1}^{n}e_t^2} = 2(1-\hat\rho) - \frac{e_1^2 + e_n^2}{\sum_{t=1}^{n}e_t^2},\qquad\text{where } \hat\rho = \frac{\sum_{t=2}^{n}e_te_{t-1}}{\sum_{t=1}^{n}e_t^2},$$
so that D.W. ≈ 2(1 - ρ̂), with ρ̂ denoting an estimator of ρ, the correlation between ε_{t-1} and ε_t.
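A minimal sketch (Python with NumPy; the residual vector is hypothetical, generated with positive autocorrelation) that computes D.W. and ρ̂ and confirms the exact identity D.W. = 2(1 - ρ̂) - (e_1² + e_n²)/Σe_t² derived above. The identity is pure algebra, so it holds for any vector e.

```python
import numpy as np

def durbin_watson(e):
    """D.W. = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    return np.sum(np.diff(e) ** 2) / np.sum(e**2)

def rho_hat(e):
    """rho_hat = sum_{t=2}^n e_t e_{t-1} / sum_{t=1}^n e_t^2."""
    return (e[1:] @ e[:-1]) / (e @ e)

rng = np.random.default_rng(10)
n = 100
u = rng.normal(size=n)
e = np.empty(n)
e[0] = u[0]
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + u[t]   # hypothetical positively autocorrelated residuals

dw, r = durbin_watson(e), rho_hat(e)
end_term = (e[0] ** 2 + e[-1] ** 2) / np.sum(e**2)
print(dw, 2 * (1 - r), 2 * (1 - r) - end_term)
```

For example, the alternating vector (1, -1, 1, -1) gives D.W. = 12/4 = 3, far above 2, as expected for negative autocorrelation.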
From this expression we note that if ρ = 0, we would expect ρ̂ to be "close" to zero and the value of D.W. to be close to two. Since the distribution of D.W. depends upon the X data, the associated critical values would be data dependent. Some econometric programs use the data and calculate exact p-values. To circumvent this problem, Durbin and Watson derived the distributions of two statistics, L and U, which do not depend on the particular X data and which bound D.W.: L < D.W. < U. Tabulated critical values for the D.W. are based on L and U; hence, the reported critical regions for the hypothesis ρ = 0 based on D.W. (derived from those for the bounds) may appear somewhat peculiar, as illustrated by the following figure.
The values of d_L and d_U define the critical region and are tabulated in many texts according to the critical level (α level), sample size (n), and number of nonintercept (slope) coefficients in the model (k'). The tables have been extended to cover additional sample sizes and numbers of explanatory variables by Savin and White [Econometrica, 1977].
The null hypothesis H0: ρ = 0 is rejected if
D.W. < d_L or D.W. > 4 - d_L.
We fail to reject the hypothesis if
d_U < D.W. < 4 - d_U,
and the test is inconclusive if
d_L < D.W. < d_U or 4 - d_U < D.W. < 4 - d_L.
This test is not strictly appropriate for models that include lagged dependent variables (see Durbin, Econometrica, 1970). The tabulated bounds do not take account of the particular explanatory variables, which results in the existence of an "uncertain region." The Stata commands to calculate the D.W. statistic are:
o reg lhs_var rhs_vars
o estat dwatson (performs a Durbin-Watson test for serial correlation)
The related Breusch-Godfrey test is obtained with
o estat bgodfrey or
o estat bgodfrey, lags(1/4)
An exact D.W. test which takes account of the X's and does not involve an "uncertain" region is available in some computer programs. The Shazam command to calculate the exact D.W. is OLS y x's, DWPVALUE.
(2) Wooldridge's t-test
Wooldridge's test of H0: ρ = 0, no autocorrelation, is based on testing whether lagged OLS errors have statistically significant explanatory power for current errors. Thus, the regression commands could be
reg y x's
predict e, resid
reg e l.e
and a t or F statistic is used to test for statistical significance, recognizing that their validity is based on asymptotic distributions. This approach would not be valid for the hypothesis H0: ρ = 1 because the corresponding t-statistic is not distributed as a t-statistic. A Dickey-Fuller test could be used for this hypothesis.
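b. Tests in the presence of lagged dependent variables
(1) Durbin's h-test, defined by
$$h = \left(1 - \frac{DW}{2}\right)\sqrt{\frac{n}{1 - n\,s^2_{\text{lagged\_y coefficient}}}} \sim N[0,1],$$
where s²_{lagged_y coefficient} is the estimated variance of the coefficient of the lagged dependent variable, can be used to test for the presence of autocorrelation in an autoregressive model with one lagged dependent variable. The N[0,1] distribution is asymptotic, and h is defined only when n·s² < 1.
Durbin's alternative test, a regression-based version that remains usable when n·s² ≥ 1, can be performed in Stata with the following command after the "reg" command:
. estat durbinalt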
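Durbin's h is simple arithmetic once D.W., n, and the estimated variance of the coefficient on y_{t-1} are available. A minimal sketch (plain Python; the function name and the input values are hypothetical):

```python
import math

def durbin_h(dw, n, var_lagged_coef):
    """Durbin's h = (1 - DW/2) * sqrt(n / (1 - n * s^2)), asymptotically N(0, 1).

    var_lagged_coef is the estimated variance of the coefficient on y_{t-1}.
    """
    ns2 = n * var_lagged_coef
    if ns2 >= 1:
        raise ValueError("h is undefined when n * s^2 >= 1; use Durbin's alternative test")
    return (1 - dw / 2) * math.sqrt(n / (1 - ns2))

# Hypothetical values: DW = 1.6, n = 100, s^2 = 0.001
h = durbin_h(1.6, 100, 0.001)
print(round(h, 3))   # 2.108, beyond 1.96, so H0: rho = 0 is rejected at the 5% level
```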
(2) The Breusch-Godfrey and Ljung-Box tests can be modified to apply to autoregressive models. For example, the Breusch-Godfrey test can be applied by regressing the OLS e_t's on the explanatory variables, including the lagged y's, and on the lagged e_t's implied by the model (as many lagged errors as the order of the autoregressive or moving average error process) and testing for the collective explanatory power of the coefficients of the lagged errors using an F-test.
3. Estimation
For applications in which the hypothesis of no autocorrelation is rejected, we may want to obtain maximum likelihood estimators of the vector β. These can be obtained by proceeding in the same manner as in the case of heteroskedasticity, i.e., we will attempt to transform the model so that the transformed random disturbances satisfy (A.1)-(A.4) and then apply least squares.
Consider the model
$$y_t = X_t\beta + \varepsilon_t = \beta_1 + \beta_2x_{t2} + \cdots + \beta_kx_{tk} + \varepsilon_t$$
where
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t,\qquad t = 1, 2, \ldots, n.$$
Replacing t in the expression for y_t by t - 1 and multiplying by ρ, we obtain
$$\rho y_{t-1} = \rho X_{t-1}\beta + \rho\varepsilon_{t-1} = \beta_1\rho + \beta_2\rho x_{t-1,2} + \cdots + \beta_k\rho x_{t-1,k} + \rho\varepsilon_{t-1}.$$
Subtracting ρy_{t-1} from y_t yields
$$y_t - \rho y_{t-1} = \beta_1(1-\rho) + \beta_2(x_{t2} - \rho x_{t-1,2}) + \cdots + \beta_k(x_{tk} - \rho x_{t-1,k}) + \varepsilon_t - \rho\varepsilon_{t-1}$$
or
$$y_t^* = \beta_1(1-\rho) + \beta_2x_{t2}^* + \cdots + \beta_kx_{tk}^* + u_t,\qquad t = 2, \ldots, n$$
where
$$y_t^* = y_t - \rho y_{t-1},\qquad x_{ti}^* = x_{ti} - \rho x_{t-1,i},\qquad t = 2, \ldots, n,\ i = 2, \ldots, k.$$
Note that we have (n - 1) observations on y_t* and x_ti*. The random disturbance term associated with the transformed equation satisfies (A.1)-(A.4). The transformed data matrices are given by
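$$\underset{(n-1)\times 1}{y^*} = \begin{pmatrix}y_2 - \rho y_1\\ y_3 - \rho y_2\\ \vdots\\ y_n - \rho y_{n-1}\end{pmatrix} = \underset{(n-1)\times n}{\begin{pmatrix}-\rho & 1 & 0 & \cdots & 0 & 0\\ 0 & -\rho & 1 & \cdots & 0 & 0\\ \vdots & & & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & -\rho & 1\end{pmatrix}}\underset{n\times 1}{\begin{pmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{pmatrix}} = T_1y$$
and
$$X^* = \begin{pmatrix}1-\rho & x_{22} - \rho x_{12} & \cdots & x_{2k} - \rho x_{1k}\\ 1-\rho & x_{32} - \rho x_{22} & \cdots & x_{3k} - \rho x_{2k}\\ \vdots & \vdots & & \vdots\\ 1-\rho & x_{n2} - \rho x_{n-1,2} & \cdots & x_{nk} - \rho x_{n-1,k}\end{pmatrix} = T_1X$$
A common technique of estimation is then based upon applying least squares to
$$y^* = X^*\beta + u$$
or
$$y_t - \rho y_{t-1} = \beta_1(1-\rho) + \beta_2x_{t2}^* + \cdots + \beta_kx_{tk}^* + u_t,\qquad t = 2, \ldots, n.$$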
Several comments need to be made about this approach. First, ρ is generally not known and estimates of ρ will need to be used. Also note that the intercept in the transformed equation is β1(1 - ρ), rather than β1; hence, if the transformed equation is estimated with a regular intercept, the final estimate of the intercept must be divided by 1 - ρ in order to recover an estimate of β1. Finally, we need to mention that even if ρ is known this estimator of β will not be identically equal to the MLE of β because n - 1 observations are used rather than n observations, i.e., we are not using all of the sample information in the estimation. This last problem can be corrected and the MLE of β can be obtained by noting that
$$\sqrt{1-\rho^2}\,y_1 = \sqrt{1-\rho^2}\,X_1\beta + \sqrt{1-\rho^2}\,\varepsilon_1 = \beta_1\sqrt{1-\rho^2} + \beta_2\sqrt{1-\rho^2}\,x_{12} + \cdots + \beta_k\sqrt{1-\rho^2}\,x_{1k} + \sqrt{1-\rho^2}\,\varepsilon_1$$
where
$$\sqrt{1-\rho^2}\,\varepsilon_1 \sim N[0,\ (1-\rho^2)\sigma_\varepsilon^2 = \sigma_u^2],$$
and then applying least squares to the transformed equation
$$y^{**} = X^{**}\beta + \varepsilon^*$$
where
$$y^{**} = \begin{pmatrix}\sqrt{1-\rho^2}\,y_1\\ y_2 - \rho y_1\\ y_3 - \rho y_2\\ \vdots\\ y_n - \rho y_{n-1}\end{pmatrix} = T_2y,$$
$$X^{**} = \begin{pmatrix}\sqrt{1-\rho^2} & \sqrt{1-\rho^2}\,x_{12} & \cdots & \sqrt{1-\rho^2}\,x_{1k}\\ 1-\rho & x_{22} - \rho x_{12} & \cdots & x_{2k} - \rho x_{1k}\\ \vdots & \vdots & & \vdots\\ 1-\rho & x_{n2} - \rho x_{n-1,2} & \cdots & x_{nk} - \rho x_{n-1,k}\end{pmatrix} = T_2X = \begin{pmatrix}\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0\\ -\rho & 1 & 0 & \cdots & 0\\ 0 & -\rho & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & -\rho & 1\end{pmatrix}X.$$
The transformation matrices T1 and T2 are related by
$$T_2 = \begin{bmatrix}(\sqrt{1-\rho^2},\ 0,\ \ldots,\ 0)\\ T_1\end{bmatrix}.$$
Note: (1) T2 is n × n whereas T1 is (n - 1) × n; hence, y** is n × 1 and y* is (n - 1) × 1.
(2) If all n observations are used, then a program must be used which suppresses estimation of an intercept (the transformed constant column is included as a regressor instead). This is because the first column of X** contains different elements.
(3) If only the last n - 1 observations are used, then a regression program which estimates an intercept can be used, and the estimate of β1 can be recovered by dividing the estimated intercept by 1 - ρ.
( 4) I n cases i n whi ch ρ i s known t he above pr ocedur es ar e
r el at i vel y s t r ai ght f or war d. When ρ i s not known al t er nat i ve
t echni ques have been devel oped. A common t echni que can be
out l i ned as f ol l ows:
( a) Es t i mat e y = Xβ + ε us i ng OLS t o obt ai n y = Xβ + e.
Obt ai n an es t i mat e of ρ us i ng t he e vect or .
e
)e e(
= ˆ2t
n
1=t
1t-t
n
2=t
∑
∑ •
ρ
( b) Tr ansf or m t he dat a us i ng ρ i ns t ead of ρ. T1
or T2
can be
used. St at a al l ows t he use of T1
or T2.
( c) Appl y l eas t squar es t o t he t r ansf or med dat a. The
associ at ed es t i mat or s ar e r ef er r ed t o as t wo s t age
es t i mat or s . ( Don' t conf use t hese es t i mat or s wi t h t wo
s t age l eas t squar es whi ch wi l l be di scussed l at er ) .
(d) Maximum likelihood estimators can be obtained by using the estimate of β determined in the last step, β*: calculate the associated error terms $e^* = y - X\beta^*$; calculate a new estimate of ρ in terms of $e^*$; transform the data (y, X); reestimate β; and repeat this process until convergence is achieved.
This process, while conceptually simple, would be tedious to perform by hand. The Stata, TSP, SAS, and SHAZAM programs have been written to perform this iterative estimation procedure automatically.
The Stata "MLE" estimation can be performed as follows:
• tsset time_var (where time_var is the name of the "time" variable)
• prais depvar rhs_vars (performs iterative "MLE" using T2, assuming an AR(1) model)
• prais depvar rhs_vars, corc (performs iterative "MLE" using T1, assuming an AR(1) model)
• prais depvar rhs_vars, twostep (stops the Prais estimation after the first step)
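The iteration in steps (a) through (d) can be sketched in Python for a single regressor, $y_t = \beta_1 + \beta_2 x_t + \varepsilon_t$, using the $T_1$ (Cochrane-Orcutt) transformation. This is only an illustration under stated assumptions: there is one regressor, the data are simulated, and the tolerance and iteration cap are arbitrary. It is not a reproduction of what `prais` computes internally. All function names, including `simulate_ar1`, are hypothetical.

```python
import math
import random

def ols_line(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def rho_hat(e):
    """Step (a): sum_{t=2}^n e_t e_{t-1} / sum_{t=1}^n e_t^2."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    b1, b2 = ols_line(x, y)                        # step (a): OLS on the raw data
    rho = None
    for _ in range(max_iter):
        e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
        new_rho = rho_hat(e)                       # estimate rho from current residuals
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]   # step (b): T1
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b2 = ols_line(xs, ys)              # step (c): OLS on transformed data
        b1 = a_star / (1.0 - new_rho)              # recover beta_1 from the intercept
        if rho is not None and abs(new_rho - rho) < tol:
            break                                  # step (d): stop once rho settles
        rho = new_rho
    return b1, b2, new_rho

def simulate_ar1(n, b1, b2, rho, seed=0):
    """Simulate y_t = b1 + b2 x_t + eps_t with eps_t = rho eps_{t-1} + u_t."""
    rng = random.Random(seed)
    eps = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - rho ** 2)   # stationary start
    x, y = [], []
    for _ in range(n):
        xt = rng.gauss(0.0, 1.0)
        x.append(xt)
        y.append(b1 + b2 * xt + eps)
        eps = rho * eps + rng.gauss(0.0, 1.0)
    return x, y

x, y = simulate_ar1(2000, b1=1.0, b2=2.0, rho=0.7)
print(cochrane_orcutt(x, y))
```

Dropping the first observation keeps the intercept column constant at $1-\hat\rho$, so an ordinary regression with an intercept can be used. The intercept is then divided by $1-\hat\rho$ to recover $\beta_1$.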
4. Unit roots and the Dickey-Fuller test
In our discussion of estimating regression models with autocorrelated disturbances, we noted that the transformed regression model with an AR(1) error,

$$y_t - \rho y_{t-1} = (X_t - \rho X_{t-1})\beta + u_t,$$

was characterized by uncorrelated errors. Note that this model simplifies to the regular regression model when ρ = 0, with OLS yielding efficient estimators.
In the previous section we discussed several tests of the hypothesis $H_0: \rho = 0$ and how MLE can be obtained when the null hypothesis is rejected.
Another hypothesis of interest is $H_0: \rho = 1$, which checks for what are referred to as unit roots. Note that in this case the transformed equation becomes

$$y_t - y_{t-1} = (X_t - X_{t-1})\beta + u_t,$$

with the corresponding estimation involving regressing changes in y on changes in x. Regular t-tests can't be used to test for unit roots; the Dickey-Fuller test is designed for this case. Simple Dickey-Fuller tests can be performed by estimating the following equations and testing for statistical significance of the estimated θ:
$$y_t - y_{t-1} = \alpha + (\rho - 1)\,y_{t-1} + u_t = \alpha + \theta\,y_{t-1} + u_t \quad\text{or}$$

$$y_t - y_{t-1} = \alpha + \delta t + \theta\,y_{t-1} + u_t,$$

where $\theta = \rho - 1$.
The null hypothesis $H_0: \rho = 1$ (equivalently θ = 0) is rejected if $\hat\theta$'s t-statistic is less than the critical values reported in the following tables for the two equations, respectively:

Without time trend:
Significance level    1%      2.5%    5%      10%
Critical value       -3.43   -3.12   -2.86   -2.57

With time trend:
Significance level    1%      2.5%    5%      10%
Critical value       -3.96   -3.66   -3.41   -3.12
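The simple (no-trend) Dickey-Fuller regression can be sketched in Python. The code regresses $\Delta y_t$ on a constant and $y_{t-1}$ and returns the usual t-ratio for $\hat\theta$. That t-ratio is compared with the no-trend table above, not with normal or t critical values. The simulated series and the function names are illustrative assumptions. In practice a packaged routine would normally be used.

```python
import math
import random

def df_tstat(y):
    """t-ratio for theta in  y_t - y_{t-1} = alpha + theta * y_{t-1} + u_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    n = len(dy)
    mx, my = sum(ylag) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in ylag)
    theta = sum((a - mx) * (b - my) for a, b in zip(ylag, dy)) / sxx
    alpha = my - theta * mx
    ssr = sum((b - alpha - theta * a) ** 2 for a, b in zip(ylag, dy))
    se_theta = math.sqrt(ssr / (n - 2) / sxx)
    return theta / se_theta

CRIT_5PCT = -2.86   # 5% critical value from the no-trend table

def ar1_series(n, rho, seed):
    """Simulate y_t = rho * y_{t-1} + u_t starting from y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y

stationary = ar1_series(500, 0.5, seed=1)
random_walk = ar1_series(500, 1.0, seed=2)
for name, series in [("rho = 0.5", stationary), ("rho = 1", random_walk)]:
    t = df_tstat(series)
    print(name, round(t, 2), "reject unit root" if t < CRIT_5PCT else "do not reject")
```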
5. Predictions
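The expression obtained by Goldberger for the best linear unbiased predictor in the case of AR(1) error terms is

$$\hat y_{n+h} = X_{n+h}\tilde\beta + W'\Sigma^{-1}e$$

where $\tilde\beta$ is the GLS estimator, e is the corresponding residual vector,

$$W = E(\varepsilon_{n+h}\,\varepsilon) = \frac{\sigma_u^2}{1-\rho^2}\begin{pmatrix}\rho^{n+h-1}\\ \rho^{n+h-2}\\ \vdots\\ \rho^{h}\end{pmatrix}, \qquad \Sigma^{-1} = \frac{1}{\sigma_u^2}\begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0\\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & \cdots & -\rho & 1+\rho^2 & -\rho\\ 0 & 0 & \cdots & 0 & -\rho & 1 \end{pmatrix}.$$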
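Therefore, since $W'\Sigma^{-1} = (0, \ldots, 0, \rho^h)$,

$$\hat y_{n+h} = X_{n+h}\tilde\beta + \rho^h e_n.$$

This might graphically be depicted as:

[Figure: the line $X_t\tilde\beta$ plotted against $X_t$, showing the last residual $e_n$ at $X_n$ and the predicted value at $X_{n+1}$, which adds the adjustment $\hat e_{n+1} = \hat\rho\,\hat e_n$ to $X_{n+1}\tilde\beta$.]

Note that as we attempt to forecast further into the future, the adjustment factor $\rho^h e_n$ approaches zero, and $\hat y_{n+h}$ approaches $X_{n+h}\tilde\beta$ as $h \to \infty$.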
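The step from Goldberger's formula to $\hat y_{n+h} = X_{n+h}\tilde\beta + \rho^h e_n$ rests on $W'\Sigma^{-1} = (0, \ldots, 0, \rho^h)$. The short Python check below builds W and $\Sigma^{-1}$ from the expressions above for a small n and confirms that only the last weight survives. The values of n, h, ρ, and $\sigma_u^2$ are arbitrary illustrations, and the function names are hypothetical.

```python
def sigma_inverse(n, rho, s2u):
    """Sigma^{-1} for AR(1) errors: tridiagonal, 1 in the two corners,
    1 + rho^2 elsewhere on the diagonal, -rho off the diagonal, all / s2u."""
    m = [[0.0] * n for _ in range(n)]
    for t in range(n):
        m[t][t] = 1.0 if t in (0, n - 1) else 1.0 + rho ** 2
        if t > 0:
            m[t][t - 1] = m[t - 1][t] = -rho
    return [[v / s2u for v in row] for row in m]

def w_vector(n, h, rho, s2u):
    """W_t = E(eps_{n+h} eps_t) = s2u / (1 - rho^2) * rho^(n + h - t), t = 1..n."""
    return [s2u / (1.0 - rho ** 2) * rho ** (n + h - t) for t in range(1, n + 1)]

def blup_weights(n, h, rho, s2u=1.0):
    """Row vector W' Sigma^{-1}."""
    w, s_inv = w_vector(n, h, rho, s2u), sigma_inverse(n, rho, s2u)
    return [sum(w[s] * s_inv[s][t] for s in range(n)) for t in range(n)]

def forecast(x_future_beta, e_n, rho, h):
    """y-hat_{n+h} = X_{n+h} beta-tilde + rho^h e_n."""
    return x_future_beta + rho ** h * e_n

print([round(v, 10) for v in blup_weights(n=6, h=2, rho=0.7, s2u=1.5)])
# The adjustment rho^h e_n dies out as the horizon h grows:
print([round(forecast(10.0, 2.0, 0.7, h), 4) for h in (1, 2, 5, 20)])
```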
IV. G. Panel Data: an introduction
Panel data refers to observational data on individuals (i = 1, 2, ..., m) over time (t = 1, 2, ..., $T_i$) (two dimensions) and might be denoted as $(Y_{it})$. The panel data set is referred to as balanced if every individual is observed for every point in time, $T_1 = T_2 = \cdots = T_m = T$. Otherwise, the panel data set is referred to as unbalanced. Observations for a given individual over time are time series, whereas cross-sectional data are observations for different individuals at a given point in time. In many applications, the data are for short periods of time but include many individuals.
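The balanced/unbalanced distinction can be checked mechanically. The Python sketch below is minimal and assumes the panel is stored as a list of (individual, time) pairs. The function names are hypothetical.

```python
from collections import defaultdict

def periods_by_individual(obs):
    """Map each individual i to the set of time periods t at which it is observed."""
    periods = defaultdict(set)
    for i, t in obs:
        periods[i].add(t)
    return dict(periods)

def is_balanced(obs):
    """Balanced: every individual is observed at every time period in the data."""
    periods = periods_by_individual(obs)
    all_times = set().union(*periods.values())
    return all(ts == all_times for ts in periods.values())

balanced = [(i, t) for i in (1, 2, 3) for t in (1, 2, 3, 4)]
unbalanced = balanced[:-1]          # individual 3 is missing period 4
print(is_balanced(balanced), is_balanced(unbalanced))
print({i: len(ts) for i, ts in periods_by_individual(unbalanced).items()})  # the T_i
```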
1. OLS and GLS (generalized least squares)
Models for panel data take a number of different forms. Perhaps the simplest representation is given by

$$Y_{it} = X_{it}\beta + \varepsilon_{it} \qquad (1)$$

where $X_{it}$ denotes a 1×k vector of observations on k exogenous variables for the $i$th individual at the $t$th time period, and where the marginal impact of the X's on Y is assumed constant over individuals and time (including the intercept). This specification is sometimes called the pooled model. Let the model be rewritten in matrix form as
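$$\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_m\end{pmatrix} = \begin{pmatrix} X_1\\ X_2\\ \vdots\\ X_m\end{pmatrix}\beta + \begin{pmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_m\end{pmatrix}$$

or

$$Y = X\beta + \varepsilon.$$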
OLS estimates of β, $\hat\beta = (X'X)^{-1}X'Y$, can be obtained with the command
reg y x’s   or
reg y x’s, vce(vcetype)
where vcetype is robust, bootstrap, or jackknife.
Recall that in the presence of heteroskedasticity and/or autocorrelation, GLS (generalized least squares) can provide more efficient estimators than OLS. The formulas for the GLS estimators and the corresponding variance-covariance matrix are given by

$$\tilde\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y, \qquad \operatorname{Var}(\tilde\beta) = (X'\Omega^{-1}X)^{-1},$$

where $\operatorname{Var}(\varepsilon) = \Omega = \Sigma_{m\times m} \otimes I_{T\times T}$, $T \ge m$ (a balanced panel with $T_i = T$).
In order to obtain GLS (generalized least squares) estimators, simplifying assumptions about the variance of ε, Ω, need to be made, and the nature of the longitudinal/panel data must be provided to Stata with the xtset command as follows:
xtset panel_var   or
xtset panel_var time_var
This indicates that panel data are being used. Here panel_var denotes the individual identification code or group variable, and time_var is an index representing the time variable that defines the panels being used. This is similar to using "tsset time_variable" to alert Stata that time series are being used.
The xtgls command can be used to obtain various generalized least squares estimators of β, depending on the form of the variance-covariance matrix of the error term.
If there is heteroskedasticity across panels,

$$\Omega = \begin{pmatrix} \sigma_1^2 I & 0 & \cdots & 0\\ 0 & \sigma_2^2 I & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_m^2 I\end{pmatrix},$$

the corresponding GLS estimators can be obtained using the command
xtgls y x’s, panels(hetero)
If there is correlation across panels (cross-sectional correlation) of the form

$$\Omega = \begin{pmatrix} \sigma_1^2 I & \sigma_{1,2} I & \cdots & \sigma_{1,m} I\\ \sigma_{2,1} I & \sigma_2^2 I & \cdots & \sigma_{2,m} I\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{m,1} I & \sigma_{m,2} I & \cdots & \sigma_m^2 I\end{pmatrix},$$

the GLS estimator is obtained with the command (this can only be applied to balanced panels)
xtgls y x’s, panels(correlated)
The command
xtgls y x’s, igls
iterates the generalized least squares procedure until convergence is obtained.
Stata allows for autocorrelation within the panels. The Stata manual (Longitudinal/Panel Data, version 10, p. 150) states that three options are allowed: "corr(independent) or no autocorrelation, corr(ar1) (serial correlation where the correlation parameter is common for all panels), or corr(psar1) (serial correlation where the correlation parameter is unique for each panel)." A couple of observations are in order: (1) xtgls y x’s, panels(iid) corr(independent) is equivalent to regress y x’s; (2) when corr(ar1) or corr(psar1) is specified, the iterated GLS estimator does not converge to the MLE.
Some examples and variations include:
xtgls y x’s, panels(hetero)
xtgls y x’s, panels(correlated)
xtgls y x’s, panels(correlated) igls
xtgls y x’s, panels(hetero) corr(ar1)
xtgls y x’s, panels(iid) corr(psar1)
Testing for heteroskedasticity.
A likelihood ratio test for heteroskedasticity across panels can be performed by comparing the log-likelihood values of the MLE of the regression model with and without heteroskedasticity, as follows:
xtgls y x’s, igls panels(hetero)
estimates store hetero
xtgls y x’s
local df = e(N_g) - 1      (the number of panels or groups minus 1)
lrtest hetero . , df(`df')
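The lrtest step compares $2(\ln L_{hetero} - \ln L_{homo})$ with a $\chi^2$ distribution with m - 1 degrees of freedom. A minimal Python sketch of that final calculation follows. The log-likelihood values in the usage line are made-up placeholders, not results from any data set. The $\chi^2$ tail probability is computed with the standard series for the regularized incomplete gamma function instead of a statistics library. The function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= z / (a + k)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def lr_test(ll_unrestricted, ll_restricted, df):
    """LR = 2 (lnL_unrestricted - lnL_restricted), chi-square with df degrees of freedom."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, chi2_sf(stat, df)

# Hypothetical values: m = 5 panels, so df = m - 1 = 4.
stat, p = lr_test(ll_unrestricted=-480.2, ll_restricted=-495.7, df=4)
print(round(stat, 2), p)
```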
Testing for autocorrelation.
Wooldridge (Econometric Analysis of Cross Section and Panel Data, 2002, pp. 282-283) outlines a test for autocorrelation in panel-data models. David Drukker has written a downloadable program to perform this test.
findit xtserial
net sj 3-2 st0039 (or click on st0039)
net install st0039 (or click on click here to install)
xtserial y x’s
The underlying null hypothesis is no autocorrelation, so a significant value of the
test statistic provides evidence of autocorrelation.
2. Fixed and random effects specifications
The fixed and random effects representations are a little different from the form just considered in that they allow panels to have different intercepts. In particular, they can be represented as:

$$Y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it} \qquad (2)$$

where the marginal impacts of changes in the X's are still assumed to be constant across individuals, i.e., the β's are the same for each individual. The only difference in the relationship across individuals (e.g., firms) is in the intercept term. In fixed effects (fe) models the $\alpha_i$ are unknown constants, and in random effects (re) models the $\alpha_i$ are random. OLS can be used to estimate the unknown parameters in the fixed effects form, with binary variables being added to the set of exogenous variables to denote the individual.
Stata uses a slight variation on this formulation in estimation,

$$\alpha_i = \alpha + \nu_i,$$

where the $\nu_i$ are estimated such that $\sum_i \nu_i = 0$; hence,

$$Y_{it} = \alpha + X_{it}\beta + \nu_i + \varepsilon_{it}. \qquad (3)$$
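Consider taking the following averages of (3):

$$\bar y_i = \alpha + \bar x_i\beta + \nu_i + \bar\varepsilon_i \qquad (4)\ \text{(average over } t \text{ for individual } i)$$

$$\bar{\bar y} = \alpha + \bar{\bar x}\beta + \bar\nu + \bar{\bar\varepsilon} \qquad (5)\ \text{(average over } i \text{ and } t)$$

Combining equations (3) and (4), and (3), (4), and (5), respectively, enables us to write

$$y_{it} - \bar y_i = (x_{it} - \bar x_i)\beta + (\varepsilon_{it} - \bar\varepsilon_i) \qquad (6)$$

$$y_{it} - \bar y_i + \bar{\bar y} = \alpha + (x_{it} - \bar x_i + \bar{\bar x})\beta + \bar\nu + (\varepsilon_{it} - \bar\varepsilon_i + \bar{\bar\varepsilon}) \qquad (7)$$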
Stata's fixed effects (within) estimation procedure, xtreg y x’s, fe, corresponds to estimating β in equation (6) or equation (7), since adding in the overall mean of y has no impact on the estimates of β.
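The within estimator from equation (6) and least squares with a dummy variable for each individual (LSDV, discussed below) give the same slope. The Python sketch below checks this numerically on a small made-up panel with one regressor. The small Gaussian-elimination solver stands in for a regression package, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """OLS coefficients (X'X)^{-1} X'y, with X given as a list of rows."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def within_slope(ids, x, y):
    """Equation (6): regress y_it - ybar_i on x_it - xbar_i (no intercept)."""
    groups = {}
    for i, xi, yi in zip(ids, x, y):
        groups.setdefault(i, []).append((xi, yi))
    mean = {i: (sum(a for a, _ in v) / len(v), sum(b for _, b in v) / len(v))
            for i, v in groups.items()}
    dx = [xi - mean[i][0] for i, xi in zip(ids, x)]
    dy = [yi - mean[i][1] for i, yi in zip(ids, y)]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

def lsdv_slope(ids, x, y):
    """Regress y on x and one dummy per individual (no common intercept)."""
    labels = sorted(set(ids))
    X = [[xi] + [1.0 if i == g else 0.0 for g in labels] for i, xi in zip(ids, x)]
    return ols(X, y)[0]

# A small made-up balanced panel: m = 3 individuals, T = 4 periods.
ids = [1] * 4 + [2] * 4 + [3] * 4
x = [1, 2, 3, 4, 2, 3, 5, 6, 0, 1, 1, 3]
y = [3.1, 4.9, 7.2, 8.8, 10.1, 12.2, 15.8, 18.1, -1.0, 1.2, 0.8, 5.1]
print(within_slope(ids, x, y), lsdv_slope(ids, x, y))
```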
Three $R^2$'s are reported:
Within: the $R^2$ from the mean-deviation regression, equation (6)
Between: $R^2_{Between} = \operatorname{corr}^2(\bar x_i\hat\beta, \bar y_i)$, the $R^2$ from regressing $\bar y_i$ on $\bar x_i$
Overall: $R^2_{Overall} = \operatorname{corr}^2(x_{it}\hat\beta, y_{it})$, the $R^2$ from regressing $y_{it}$ on $X_{it}$ (pooled regression)
Least squares estimation with dummy variables (LSDV) for the different intercepts is equivalent to running a fixed effects regression. The hypothesis that there is no heterogeneity in the fixed effects, or that the group effects are all the same ($\nu_i = 0$ for all i), can be tested using a Chow test by comparing the pooled and LSDV regressions as follows:

$$F(m-1,\ mT-m-K) = \frac{(R^2_{LSDV} - R^2_{Pooled})/(m-1)}{(1-R^2_{LSDV})/(mT-m-K)},$$

where m = number of groups, T = length of time series, and K = number of regressors in $X_{it}$ excluding the constant.
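The Chow statistic needs only the two $R^2$ values and the degrees of freedom. The minimal Python sketch below assumes a balanced panel, and the $R^2$ values in the usage line are hypothetical. K is taken to be the number of slope coefficients, matching the residual degrees of freedom $mT - m - K$ of the LSDV regression. The function name is hypothetical.

```python
def chow_f(r2_lsdv, r2_pooled, m, T, K):
    """F statistic for H0: no individual effects, comparing pooled OLS with LSDV.

    m = number of groups, T = periods per group (balanced panel),
    K = number of slope coefficients. Returns (F, df1, df2).
    """
    df1, df2 = m - 1, m * T - m - K
    F = ((r2_lsdv - r2_pooled) / df1) / ((1.0 - r2_lsdv) / df2)
    return F, df1, df2

# Hypothetical R^2 values for illustration only.
print(chow_f(r2_lsdv=0.80, r2_pooled=0.60, m=10, T=5, K=2))
```

The result would then be compared with the critical value of an F distribution with (df1, df2) degrees of freedom.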
Stata's between effects estimators can be obtained by estimating equation (4) using the Stata command xtreg y x’s, be. The same three $R^2$'s reported with fixed effects estimation are reported for the between effects, with $R^2_{Between}$ corresponding to the fitted model for this estimation procedure.
In the random effects model the $\nu_i$ in the regression model

$$y_{it} = \alpha + X_{it}\beta + \nu_i + \varepsilon_{it}$$

are assumed to be distributed identically and independently with mean zero and constant variance $\sigma_\nu^2$. The term $(\nu_i + \varepsilon_{it})$ can be thought of as a composite error term with

$$\operatorname{Var}(\nu_i + \varepsilon_{it}) = \sigma_\nu^2 + \sigma_\varepsilon^2 \quad\text{and}\quad \operatorname{Var}(\nu_i\iota_T + \varepsilon_i) = \Sigma = \sigma_\varepsilon^2 I_T + \sigma_\nu^2\,\iota_T\iota_T', \qquad \Omega = I_m \otimes \Sigma,$$

where $\iota_T$ is a $T \times 1$ vector of ones. GLS is then applied to obtain the desired estimators using the command xtreg y x’s, re.
If the $\nu_i$ are uncorrelated with the explanatory variables, then random effects estimators will be efficient; otherwise they will be inconsistent.
The fixed effects estimator is appropriate whether the data are generated by a fixed effects model or a random effects model; it is merely less efficient than the random effects estimator if the data generating process is a random effects model. However, if the data generating process is a fixed effects model (individual effects correlated with the explanatory variables), random effects estimators will be inconsistent. A Hausman test can be used to test the null hypothesis that the data are generated by a random effects model. Under this null both estimators are consistent and the random effects estimator is efficient. A significant difference between the two sets of estimates is evidence against random effects and in favor of fixed effects.
In summary, the Stata commands for estimating fixed (within), between, and random effects models, respectively, are given by
xtset panel_var or xtset panel_var time_var
xtreg y x’s, fe
xtreg y x’s, be
xtreg y x’s, re
A Hausman test of the null hypothesis of random effects (against the alternative of fixed effects) can be performed using the commands:
xtreg y x’s, fe
est store fixed
xtreg y x’s, re
est store random
hausman fixed random
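For a single coefficient, the Hausman statistic reduces to $H = (b_{fe} - b_{re})^2 / (\operatorname{Var}(b_{fe}) - \operatorname{Var}(b_{re}))$, which is $\chi^2$ with one degree of freedom under the null. A large value signals systematic differences between the fixed and random effects estimates, which is evidence against the assumption that $\nu_i$ is uncorrelated with the regressors. The minimal Python sketch below covers only this scalar case, and the estimates in the usage line are hypothetical. Stata's hausman command handles the general matrix form.

```python
import math

def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Hausman statistic for one coefficient:
    H = (b_fe - b_re)^2 / (Var(b_fe) - Var(b_re)), chi-square with 1 df under H0."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # Can happen in finite samples; the simple formula is then not usable.
        raise ValueError("Var(b_fe) - Var(b_re) must be positive")
    h = (b_fe - b_re) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(h / 2.0))   # P(chi2_1 > h) = P(|N(0,1)| > sqrt(h))
    return h, p_value

# Hypothetical estimates, not from any data set.
print(hausman_scalar(b_fe=0.50, se_fe=0.05, b_re=0.40, se_re=0.03))
```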
Some comments:
(1) The command xtregar y x’s, re (or fe) can be used to estimate random or fixed effects models when the error term is characterized by a first-order autoregressive process.
(2) Numerous variations are possible, e.g., consider

$$Y_{it} = \alpha + X_{it}\beta + \nu_i + \gamma_t + \varepsilon_{it},$$

which allows for cross-sectional effects and time contrasts.
(3) xtsum [varlist] [if] [, i(varname_i)]: xtsum, a generalization of summarize, reports means and standard deviations for cross-sectional time-series (xt) data; it differs from summarize in that it decomposes the standard deviation into between and within components.
(4) A special issue of the Journal of Econometrics (140, 2007), edited by Baltagi, Kelejian, and Prucha, focuses on the analysis of spatially dependent data and discusses related issues of identification, estimation, and testing.
IV. H. Stochastic Independent Variables
1. Introductory Remarks:
While this assumption is listed last, it may be the most important of the underlying assumptions, because OLS estimators will be both biased and inconsistent if the explanatory variables are correlated with the error terms. Furthermore, this assumption will generally be violated if the specified model includes a right-hand-side dependent variable (endogenous regressor), which is quite common in economic modeling. In this section we will consider a simple macro model which includes an endogenous regressor, illustrate how consistent estimators can be obtained, and finally outline formally why a correlation between the X's and the errors leads to biased and inconsistent estimators.
2. A simple example
The case of endogenous regressors is a common example of stochastic regressors in economic models. For example, consider the simple macroeconomic structural model consisting of a consumption function and an accounting identity:
$$C_t = \alpha + \beta Y_t + \varepsilon_t$$

$$Y_t = C_t + Z_t$$
In this model, the two dependent variables are C and Y; thus Y is an endogenous regressor in the consumption function. The OLS estimators of the unknown parameters in the consumption function are given as follows:
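$$\hat\alpha = \bar C - \hat\beta\,\bar Y$$

$$\hat\beta = \frac{\sum_t (Y_t - \bar Y)(C_t - \bar C)}{\sum_t (Y_t - \bar Y)^2} = \frac{\widehat{\operatorname{Cov}}(Y, C)}{\widehat{\operatorname{Var}}(Y)}$$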
Solving the structural model for the reduced form gives

$$C_t = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta}Z_t + \frac{\varepsilon_t}{1-\beta}$$

$$Y_t = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta}Z_t + \frac{\varepsilon_t}{1-\beta}$$
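Note: $Y_t$ and $\varepsilon_t$ are not independent, since $\operatorname{cov}(Y_t, \varepsilon_t) = \sigma^2/(1-\beta)$. This can be seen by noting that, because $Z_t$ is independent of $\varepsilon_t$,

$$E\big[(Y_t - E(Y_t))(\varepsilon_t - E(\varepsilon_t))\big] = E\left[\frac{\varepsilon_t}{1-\beta}\,\varepsilon_t\right] = \frac{E(\varepsilon_t^2)}{1-\beta} = \frac{\sigma^2}{1-\beta} \neq 0.$$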
Furthermore, we can show that

$$\operatorname{plim}\hat\beta_{OLS} = \beta + \frac{(1-\beta)\,\sigma^2}{\sigma_Z^2 + \sigma^2}.$$
This is an example of the simultaneous equation problem, where least squares estimators are biased and inconsistent.
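A small simulation makes the inconsistency concrete. The Python sketch below generates data from the structural model through its reduced form. It assumes $Z_t \sim N(0, \sigma_Z^2)$ and $\varepsilon_t \sim N(0, \sigma^2)$ are independent, and the parameter values are arbitrary choices. It then compares the OLS slope with $\operatorname{plim}\hat\beta_{OLS}$ and with the instrumental variables slope $\widehat{\operatorname{Cov}}(Z, C)/\widehat{\operatorname{Cov}}(Z, Y)$, which is what two-stage least squares reduces to with Z as the single instrument. The function names are hypothetical.

```python
import random

def simulate_macro(n, alpha, beta, sigma_eps, sigma_z, seed=0):
    """Draw (C, Y, Z) from C = alpha + beta*Y + eps, Y = C + Z, via the reduced form."""
    rng = random.Random(seed)
    C, Y, Z = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, sigma_z)
        eps = rng.gauss(0.0, sigma_eps)
        y = (alpha + z + eps) / (1.0 - beta)   # reduced form for Y
        C.append(y - z)                        # identity: C = Y - Z
        Y.append(y)
        Z.append(z)
    return C, Y, Z

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

beta, s_eps, s_z = 0.8, 1.0, 1.0
C, Y, Z = simulate_macro(100_000, alpha=10.0, beta=beta, sigma_eps=s_eps, sigma_z=s_z)
beta_ols = cov(Y, C) / cov(Y, Y)          # OLS slope of C on Y
beta_iv = cov(Z, C) / cov(Z, Y)           # IV slope using Z as the instrument
plim_ols = beta + (1 - beta) * s_eps ** 2 / (s_z ** 2 + s_eps ** 2)
print(round(beta_ols, 3), round(plim_ols, 3), round(beta_iv, 3))
```

With these values $\operatorname{plim}\hat\beta_{OLS} = 0.8 + 0.2 \times 1/2 = 0.9$, so OLS overstates the marginal propensity to consume, while the IV slope settles near the true 0.8.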
3. Estimation, tests, and statistical inference
Several estimation approaches to circumventing this problem are available and will be discussed in more detail in another section. Two common approaches which yield consistent estimators are two-stage least squares and instrumental variables. The Stata format for the two-stage least squares estimator is
ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars
where lhs_dep_var denotes the left-hand-side dependent variable, rhs_dep_vars the right-hand-side dependent variables or endogenous regressors, and rhs_ind_vars the right-hand-side independent variables. The instrumental variables, or instruments, are variables which are assumed to be (1) correlated with the endogenous regressor(s) and (2) independent of the error term. There must be at least as many instruments as endogenous regressors.
An F- or t-test can be applied to a regression of the endogenous regressor(s) on the independent variables and instruments to test whether the instrumental variables are significantly correlated with the endogenous regressor. This can be performed with Stata's reg command as
reg rhs_dep_var instruments rhs_ind_vars
or by adding the option first to ivregress as
ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars, first
A comparison of the instrumental variables (2SLS) estimates and the OLS estimates obtained from the command reg lhs_dep_var rhs_vars provides the basis for testing whether the right-hand-side endogenous variable is correlated with the error term. These tests can be implemented using either a Hausman or a Wooldridge test, as follows:
Hausman test: Estimate the equation using OLS and 2SLS (alternatives can be used). Then check for statistical differences between the two estimators using a Hausman test.
reg lhs_dep_var rhs_vars
est store OLS
ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars
est store tsls
hausman tsls OLS
Wooldridge test: Regress the right-hand-side endogenous variables on all of the exogenous variables (those in the regression model and the instrumental variables) and save the corresponding residuals. Estimate the original regression model with the estimated residuals included as regressors. Test the statistical significance of the coefficients of the residuals. The estimated coefficients of the original variables should be identical to the 2SLS estimates.
Applying these methods to the simple consumption function can be accomplished with the Stata commands
reg c y                   OLS estimates of the consumption function
est store OLS
ivregress 2sls c (y=z)    2SLS estimates of the consumption function
est store tsls
hausman tsls OLS          performs a Hausman test
reg y z                   first-stage regression of y on the instrument z
predict e, resid          saves the first-stage residuals
reg c y e                 performs a Wooldridge test (significance of the coefficient of e)
The statistical significance of the coefficients of the residuals would be tested using a chi-square, F, or t-test. Note that these distributions are asymptotic and would not be expected to be exact for finite samples.
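The claim that the Wooldridge (control-function) regression reproduces the 2SLS coefficients can be checked numerically. The Python sketch below simulates the consumption model, assuming independent standard normal $Z_t$ and $\varepsilon_t$ and arbitrary parameter values. It computes the just-identified 2SLS slope $\widehat{\operatorname{Cov}}(Z, C)/\widehat{\operatorname{Cov}}(Z, Y)$, then runs OLS of C on a constant, Y, and the first-stage residual. It reports coefficients only. The significance test on the residual's coefficient would also need its standard error, which is omitted here. The helper names are hypothetical.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """OLS coefficients (X'X)^{-1} X'y, with X given as a list of rows."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(42)
alpha, beta, n = 10.0, 0.8, 5000
C, Y, Z = [], [], []
for _ in range(n):
    z, eps = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y = (alpha + z + eps) / (1.0 - beta)        # reduced form for Y
    C.append(y - z)                             # identity: C = Y - Z
    Y.append(y)
    Z.append(z)

# 2SLS with a single instrument reduces to Cov(Z, C) / Cov(Z, Y).
mz, my, mc = sum(Z) / n, sum(Y) / n, sum(C) / n
b_2sls = (sum((z - mz) * (c - mc) for z, c in zip(Z, C))
          / sum((z - mz) * (y - my) for z, y in zip(Z, Y)))

# First stage: regress Y on a constant and Z and keep the residuals v_hat.
g0, g1 = ols([[1.0, z] for z in Z], Y)
v_hat = [y - g0 - g1 * z for y, z in zip(Y, Z)]

# Control-function regression: C on a constant, Y, and v_hat.
a_cf, b_cf, rho_cf = ols([[1.0, y, v] for y, v in zip(Y, v_hat)], C)
print(round(b_2sls, 6), round(b_cf, 6), round(rho_cf, 3))
```

The coefficient on Y matches the 2SLS slope, while the clearly nonzero coefficient on `v_hat` is what the Wooldridge test picks up as evidence that Y is endogenous.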
4. Formal analysis
Assumption A.5 in the standard model states:
(a) $X_t$ is nonstochastic.
(b) Values of X are fixed in repeated samples.
(c) $\Sigma_{XX} = \lim_{n\to\infty} \frac{1}{n}(X'X)$ is finite and nonsingular.
Assumptions (a-b) are primarily of theoretical interest since, at least with economic data, we can rarely "draw" the same set of X's or select a predetermined value for X. These assumptions (A.5 a-c) provide a relatively simple basis to begin our analysis of regression theory. Assumption (c) is useful in proving consistency of L.S. estimators.
a. Case 1 of relaxing (A.5)
(A.5)' (a) $X_t$ is stochastic.
(b) $X_t$ and $\varepsilon_t$ are stochastically independent.
(c) $\Sigma_{XX} = \lim_{n\to\infty} \frac{1}{n}(X'X)$ is finite and nonsingular.

$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$$

$$E(\hat\beta) = \beta + E\big[(X'X)^{-1}X'\big]E(\varepsilon) = \beta;$$

therefore $\hat\beta$ is unbiased.

$$\operatorname{Var}(\hat\beta) = E(\hat\beta-\beta)(\hat\beta-\beta)' = E\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\big] = \sigma^2 E\big[(X'X)^{-1}\big]$$
Relaxing the assumption that X is nonstochastic and replacing it with the assumption that X is stochastic and independent of ε does not alter the desirable unbiasedness and consistency properties of OLS.
b. Case 2 of relaxing (A.5)
(A.5)'' (a) $X_t$ is stochastic.
(b) $X_t$ and $\varepsilon_t$ are stochastically dependent and $\operatorname{cov}(X_t, \varepsilon_t) \neq 0$.
(c) $\Sigma_{XX} = \lim_{n\to\infty} \frac{1}{n}(X'X)$ is finite and nonsingular.

$$E(\hat\beta) = \beta + E\big[(X'X)^{-1}X'\varepsilon\big] \neq \beta;$$

therefore, the least squares estimator is biased.

$$\operatorname{plim}\hat\beta = \beta + \operatorname{plim}\left(\frac{X'X}{n}\right)^{-1}\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = \beta + \Sigma_{XX}^{-1}\operatorname{Cov}(X, \varepsilon) \neq \beta;$$

therefore, it is inconsistent.
Thus, it is the correlation between the regressors and errors which leads to estimator bias and inconsistency.
IV. I. Errors of Measurement
An assumption which has been made in the development to this point is that the independent and dependent variables contained in our hypothesized formulations are measured without error. In many cases this is extremely unrealistic. If the independent and dependent variables are measured with error, then the least squares estimators need not possess the desirable statistical properties discussed earlier.
1. Theoretical Development
Assume that the relationship
(1) $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \Sigma = \sigma^2 I)$,
is hypothesized to hold, where y and X represent "true" values. Also assume that y and X are measured with error as $y^*$ and $X^*$, respectively, where
(2.a) $y^* = y + u$, $u \sim N(0, \Sigma_u)$
(2.b) $X^* = X + V$, $V \sim N(0, \Sigma_V)$
and the measurement errors u and V are independent.
Making use of (2) we can rewrite (1) in terms of observed variables, $y^*$ and $X^*$:

$$y^* - u = (X^* - V)\beta + \varepsilon$$

(3) $y^* = X^*\beta + \varepsilon + u - V\beta$
(3)' $y^* = X^*\beta + \eta$
where $\eta = \varepsilon + u - V\beta$.
Applying least squares techniques to (3)' yields
(4) $\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$
or
(4)' $\hat\beta = [X'X + V'X + X'V + V'V]^{-1}[X'y + X'u + V'y + V'u]$.
Is this estimator unbiased and consistent?
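From (4)' we can write

$$\hat\beta = \left[\frac{X'X}{n} + \frac{V'X}{n} + \frac{X'V}{n} + \frac{V'V}{n}\right]^{-1}\left[\frac{X'y}{n} + \frac{X'u}{n} + \frac{V'y}{n} + \frac{V'u}{n}\right]$$

and since $X'y = X'X\beta + X'\varepsilon$, we can use Slutsky's theorem to obtain

$$\operatorname{plim}_{n\to\infty}\hat\beta = \left(\Sigma_{XX} + \Sigma_{VX} + \Sigma_{XV} + \Sigma_{VV}\right)^{-1}\left(\Sigma_{XX}\beta + \Sigma_{X\varepsilon} + \Sigma_{Xu} + \Sigma_{Vy} + \Sigma_{Vu}\right) = (\Sigma_{XX} + \Sigma_{VV})^{-1}\Sigma_{XX}\beta,$$

since $\Sigma_{XV} = \Sigma_{VX} = 0$, $\Sigma_{Vy} = \Sigma_{Vu} = 0$, $\Sigma_{X\varepsilon} = 0$, and $\Sigma_{Xu} = 0$.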
Note: (1) As long as the independent variables are measured with error ($\Sigma_{VV} \neq 0$), the least squares estimator of β is inconsistent:

$$\operatorname{plim}\hat\beta = (\Sigma_{XX} + \Sigma_{VV})^{-1}\Sigma_{XX}\beta = \left(I + \Sigma_{XX}^{-1}\Sigma_{VV}\right)^{-1}\beta \neq \beta.$$
(2) If the dependent variable is measured with error, but the independent variables are "error free" ($\Sigma_{VV} = 0$), (3) can be rewritten as

$$y^* = X^*\beta + \underbrace{\varepsilon + u}_{\eta},$$

where η will satisfy (A.1)-(A.4) as long as ε and u do.
Note: In this case $\operatorname{plim}_{n\to\infty}\hat\beta = \beta$ and, given the X's, the associated $\hat\beta$ will be unbiased, minimum variance, efficient, asymptotically unbiased, consistent, and asymptotically efficient. It should be noted that the variance of $\eta_t$, $\sigma_\eta^2 = \sigma_\varepsilon^2 + \sigma_u^2$, will be larger than if the dependent variable were measured without error.
2. An Example. M. Friedman suggested that consumption and income can be partitioned
into "permanent" and "transitory" components as follows:
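$$c = c_p + c_T$$

$$y = y_p + y_T$$

He also suggests that the "permanent" consumption function is of the form

$$c_p = k\,y_p + \varepsilon$$

If the "permanent" marginal propensity to consume, k, is estimated using least squares applied to c and y data, we have an example of an errors-in-variables model. Our resultant estimate of k will, in the limit as n→∞, provide an underestimate of the "true" k:

$$\begin{aligned}\hat k = \frac{\sum cy}{\sum y^2} &= \frac{\sum (c_p + c_T)(y_p + y_T)}{\sum (y_p + y_T)^2} = \frac{\sum (k y_p + \varepsilon + c_T)(y_p + y_T)}{\sum (y_p + y_T)^2} \\ &= \frac{k\sum y_p^2 + k\sum y_p y_T + \sum \varepsilon y_p + \sum \varepsilon y_T + \sum c_T y_p + \sum c_T y_T}{\sum y_p^2 + 2\sum y_p y_T + \sum y_T^2}.\end{aligned}$$

Dividing numerator and denominator by n (variables measured as deviations from their means), and assuming that $y_T$, $c_T$, and ε are uncorrelated with each other and with $y_p$, gives

$$\operatorname{plim}_{n\to\infty}\hat k = \frac{k\,\sigma_{y_p}^2}{\sigma_{y_p}^2 + \sigma_{y_T}^2} < k,$$

where $\sigma_{y_p}^2$ and $\sigma_{y_T}^2$, respectively, denote $\operatorname{Var}(y_p)$ and $\operatorname{Var}(y_T)$.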
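The downward bias in Friedman's example can be reproduced by simulation. The Python sketch below assumes that $y_p$, $y_T$, $c_T$, and ε are independent, mean-zero normal variables, with standard deviations chosen arbitrarily for illustration. It estimates k by least squares through the origin and compares the result with $k\sigma_{y_p}^2/(\sigma_{y_p}^2 + \sigma_{y_T}^2)$.

```python
import random

rng = random.Random(7)
k, n = 0.9, 100_000
sd_yp, sd_yT, sd_cT, sd_eps = 2.0, 1.0, 0.5, 0.5
sum_cy = sum_yy = 0.0
for _ in range(n):
    yp, yT = rng.gauss(0.0, sd_yp), rng.gauss(0.0, sd_yT)
    cT, eps = rng.gauss(0.0, sd_cT), rng.gauss(0.0, sd_eps)
    c, y = k * yp + eps + cT, yp + yT            # observed consumption and income
    sum_cy += c * y
    sum_yy += y * y
k_hat = sum_cy / sum_yy                           # least squares through the origin
plim_k = k * sd_yp ** 2 / (sd_yp ** 2 + sd_yT ** 2)   # 0.9 * 4 / 5 = 0.72
print(round(k_hat, 3), round(plim_k, 3))
```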
3. Estimation ($\Sigma_{VV} \neq 0$)
a. Method of instrumental variables.
Select $z_{ti}$'s which are uncorrelated with the measurement errors and are correlated with the $x_{ti}$'s. For

$$y = X\beta + \varepsilon,$$

$$\hat\beta = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'y$$

will be a consistent estimate of β.
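With one regressor and one instrument, and with zero-mean data and no intercept, the formula above reduces to $\hat\beta = \sum z_t y_t / \sum z_t x_t^*$. The Python sketch below is an illustration under these assumptions. As the instrument it uses a second, independent noisy measurement of the true x, which is correlated with x but not with the measurement error. All variances and the seed are arbitrary choices. OLS on the error-ridden regressor converges to $\beta\sigma_x^2/(\sigma_x^2 + \sigma_v^2) = 1$, while the IV estimate recovers β = 2.

```python
import random

rng = random.Random(3)
beta, n = 2.0, 100_000
x_obs, z, y = [], [], []
for _ in range(n):
    x_true = rng.gauss(0.0, 1.0)
    x_obs.append(x_true + rng.gauss(0.0, 1.0))   # X* = X + V, error variance 1
    z.append(x_true + rng.gauss(0.0, 1.0))       # second, independent measurement
    y.append(beta * x_true + rng.gauss(0.0, 1.0))

b_ols = sum(a * b for a, b in zip(x_obs, y)) / sum(a * a for a in x_obs)
b_iv = sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x_obs))
print(round(b_ols, 3), round(b_iv, 3))
```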
IV. J. Specification Error
A specification error is said to have occurred whenever a regression equation or underlying assumption is incorrect. Specification errors can take many forms:
(1) deleting a "relevant" variable,
(2) including an "irrelevant" variable,
(3) using an incorrect functional form, or
(4) specifying an incorrect description of the population from which the disturbance was drawn.
For someone to claim that a specification error has been made carries with it some suggestion that the individual knows what the "true" model is like. Specification errors involving questions about functional form or the error distribution have already been discussed. We now consider the consequences of (1) deleting a relevant variable and (2) including an irrelevant variable.
1. Example. Deletion of "relevant" variables
True Model: $y_t = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_{k_1} x_{tk_1} + \cdots + \beta_k x_{tk} + \varepsilon_t$
(1) $y = X_I\beta_I + X_{II}\beta_{II} + \varepsilon$
Hypothesized Model:
(2) $y = X_I\beta_I + \eta$   [Note: $\eta = \varepsilon + X_{II}\beta_{II}$]
An application of least squares to (2) yields
(3) $\hat\beta_I = [X_I'X_I]^{-1}X_I'y$
Replacing y in (3) by (1) results in the following expression for $\hat\beta_I$:
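$$\begin{aligned}(4)\quad \hat\beta_I &= [X_I'X_I]^{-1}X_I'[X_I\beta_I + X_{II}\beta_{II} + \varepsilon] \\ &= [X_I'X_I]^{-1}X_I'X_I\beta_I + [X_I'X_I]^{-1}X_I'X_{II}\beta_{II} + [X_I'X_I]^{-1}X_I'\varepsilon \\ &= \beta_I + (X_I'X_I)^{-1}X_I'X_{II}\beta_{II} + [X_I'X_I]^{-1}X_I'\varepsilon\end{aligned}$$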
Analysis of the properties of the least squares estimator $\hat\beta_I$ in the misspecified model:
a. $E(\hat\beta_I) = \beta_I + E\big[(X_I'X_I)^{-1}X_I'X_{II}\big]\beta_{II} + E\big[(X_I'X_I)^{-1}X_I'\varepsilon\big]$
If $X_I$ and ε are independent, then

$$E(\hat\beta_I) = \beta_I + E\big[(X_I'X_I)^{-1}X_I'X_{II}\big]\beta_{II},$$

i.e., $\hat\beta_I$ is a biased estimator of $\beta_I$ if $X_I'X_{II}\beta_{II} \neq 0$.
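b.
$$\operatorname{plim}\hat\beta_I = \beta_I + \operatorname{plim}\left(\frac{X_I'X_I}{n}\right)^{-1}\operatorname{plim}\left(\frac{X_I'X_{II}}{n}\right)\beta_{II} + \operatorname{plim}\left(\frac{X_I'X_I}{n}\right)^{-1}\operatorname{plim}\left(\frac{X_I'\varepsilon}{n}\right)$$

$$= \beta_I + \Sigma_{X_IX_I}^{-1}\Sigma_{X_IX_{II}}\beta_{II} + \Sigma_{X_IX_I}^{-1}\Sigma_{X_I\varepsilon} \qquad (E.5)$$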
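Equation (E.5) can be illustrated with one included regressor. The Python sketch below simulates $y = 1 + 2x_1 + 3x_2 + \varepsilon$ and regresses y on a constant and $x_1$ only. The data-generating values are arbitrary assumptions, and $x_1$ and ε are independent so that $\Sigma_{X_I\varepsilon} = 0$. With $\operatorname{Var}(x_1) = 1$ and $\operatorname{Cov}(x_1, x_2) = \rho_{12}$, (E.5) predicts a probability limit of $2 + 3\rho_{12}$. The bias disappears when $x_1$ and $x_2$ are uncorrelated. The function name is hypothetical.

```python
import random

def short_regression_slope(rho12, n=100_000, seed=11):
    """Simulate y = 1 + 2 x1 + 3 x2 + eps and regress y on a constant and x1 only."""
    rng = random.Random(seed)
    x1s, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho12 * x1 + rng.gauss(0.0, 1.0)    # Cov(x1, x2) = rho12, Var(x1) = 1
        ys.append(1.0 + 2.0 * x1 + 3.0 * x2 + rng.gauss(0.0, 1.0))
        x1s.append(x1)
    m1, my = sum(x1s) / n, sum(ys) / n
    sxy = sum((a - m1) * (b - my) for a, b in zip(x1s, ys))
    sxx = sum((a - m1) ** 2 for a in x1s)
    return sxy / sxx

results = {rho12: short_regression_slope(rho12) for rho12 in (0.5, 0.0)}
for rho12, slope in results.items():
    # (E.5) with one included regressor: plim = 2 + [Cov(x1, x2) / Var(x1)] * 3
    print(rho12, round(slope, 3), 2.0 + 3.0 * rho12)
```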
To summarize, deleting a relevant variable results in an inconsistent estimator of $\beta_I$ unless
a) $\Sigma_{X_I\varepsilon} = 0$ (ε and $X_I$ are independent) and
b) $\Sigma_{X_IX_{II}} = 0$ ($X_I$ and $X_{II}$ are orthogonal).
2. Example. Including an irrelevant variable.
True Model: $y = X_I\beta_I + \eta$
Hypothesized Model: $y = X_I\beta_I + X_{II}\beta_{II} + \varepsilon = X\beta + \varepsilon$
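The least squares estimator of $\beta = \begin{pmatrix}\beta_I\\ \beta_{II}\end{pmatrix}$ is then given by

$$\hat\beta = \begin{pmatrix}\hat\beta_I\\ \hat\beta_{II}\end{pmatrix} = (X'X)^{-1}X'y = \begin{pmatrix} X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix} X_I'y\\ X_{II}'y\end{pmatrix}.$$

Taking expected values gives

$$\begin{aligned} E\begin{pmatrix}\hat\beta_I\\ \hat\beta_{II}\end{pmatrix} &= \begin{pmatrix} X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix} X_I'\\ X_{II}'\end{pmatrix}E(y) = \begin{pmatrix} X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix} X_I'\\ X_{II}'\end{pmatrix}\begin{pmatrix} X_I & X_{II}\end{pmatrix}\begin{pmatrix}\beta_I\\ 0\end{pmatrix} \\ &= \begin{pmatrix} X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix} X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}\begin{pmatrix}\beta_I\\ 0\end{pmatrix} = \begin{pmatrix}\beta_I\\ 0\end{pmatrix}.\end{aligned}$$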
Therefore, including irrelevant variables in a linear regression affects neither the unbiasedness nor the consistency of the least squares estimators.
The reason for the asymmetry of the results for the two cases of specification error just considered is that the hypothesized model includes the "true" model as a special case in the second example, but does not in the first example. It would then appear that it is better to err in the direction of including too many variables than to delete a relevant variable.
It should be mentioned that while the least squares estimator of $\beta_I$ in the second example is unbiased and consistent, the corresponding variance may be larger than that associated with estimating the "true" model using least squares.
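The closing remark (unbiased, but possibly with a larger variance) can be seen in a small Monte Carlo experiment. The Python sketch below holds $x_1$ and $x_2$ fixed in repeated samples, consistent with (A.5). It draws y from a true model that excludes $x_2$, where $x_2$ is deliberately correlated with $x_1$. It then compares the slope on $x_1$ from the correct (short) regression and from the overfitted (long) regression. The sample size, number of replications, correlation, and seed are arbitrary assumptions. Both averages should be close to 2, with the long regression showing the larger spread.

```python
import random
import statistics

def slopes(x1, x2, y):
    """Return (short, long): slope on x1 from y ~ 1 + x1 and from y ~ 1 + x1 + x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    short = s1y / s11
    long_ = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)
    return short, long_

rng = random.Random(5)
n, reps = 50, 2000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x1]   # irrelevant but correlated with x1
short_est, long_est = [], []
for _ in range(reps):                  # X fixed in repeated samples; only eps is redrawn
    y = [1.0 + 2.0 * a + rng.gauss(0.0, 1.0) for a in x1]   # true model excludes x2
    s_hat, l_hat = slopes(x1, x2, y)
    short_est.append(s_hat)
    long_est.append(l_hat)
print(statistics.mean(short_est), statistics.mean(long_est))
print(statistics.variance(short_est), statistics.variance(long_est))
```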
IV. K. PROBLEM SET 5
Violations of the Basic Assumptions
Theory
1. Distributional assumptions
a. Assume that the probability density function of the random disturbances $\varepsilon_t$ in a regression equation $Y_t = X_t\beta + \varepsilon_t$ is given by the generalized error distribution (GED):

$$GED(\varepsilon_t; \sigma, p) = \frac{e^{-(|\varepsilon_t|/\sigma)^p}}{2\sigma\,\Gamma(1 + 1/p)}$$

where Γ(·) is the gamma function.
(1) Obtain an expression for the likelihood function and also for the log likelihood function corresponding to the regression model with a GED error distribution.
(2) What would the MLE of β be if p in the GED is
(a) p = 1
(b) p = 2
Hint: You don't have to derive an equation for β; however, in maximizing the log-likelihood function over β for a given value of p, you should get β's you have seen before. What are they?
(3) Bonus: How could the parameter "p" be estimated?
b. For the data, HBJ.dat, estimate the model $Y_t = \alpha + \beta X_t + \varepsilon_t$ using OLS and LAE, and test the distributional assumption of normality; in particular:
(1) report the estimated intercept and slope using OLS and LAE;
(2) test the normality assumption using the estimated skewness and kurtosis with a "Z-statistic"; and
(3) test the normality assumption using the JB test.
(Hint: You can use the Stata command sktest for (2) and (3).)
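2. It was shown in (IV.C) that

$$E(\hat\beta) = \beta + (X'X)^{-1}X'\mu$$

where μ = E(ε). It was also mentioned that if $E(\varepsilon_t) = \mu$ for all t, then $E(\hat\beta_1) = \mu + \beta_1$ and $E(\hat\beta_i) = \beta_i$ for i = 2, 3, ..., K. Verify that this is true for the case K = 2. Hint:

$$\operatorname{Bias}(\hat\beta) = (X'X)^{-1}X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix}\mu$$

and

$$(X'X)^{-1} = \frac{1}{N\sum X_t^2 - N^2\bar X^2}\begin{pmatrix}\sum X_t^2 & -N\bar X\\ -N\bar X & N\end{pmatrix}, \qquad X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} = \begin{pmatrix}N\\ \sum X_t\end{pmatrix} = \begin{pmatrix}N\\ N\bar X\end{pmatrix}.$$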
3. Consider the special case of the generalized regression model where Σ = σ²I. For this case, demonstrate that
a. $\tilde\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$ simplifies to $\hat\beta = (X'X)^{-1}X'Y$,
b. $\operatorname{Var}(\hat\beta) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} = \sigma^2(X'X)^{-1}$, and
c. $\operatorname{Var}(\tilde\beta) = (X'\Sigma^{-1}X)^{-1} = \sigma^2(X'X)^{-1} = \operatorname{Var}(\hat\beta)$.
4. Heteroskedasticity
a. Using the HBJ data and the market model Y_t = α + βX_t + ε_t
(1) Test for heteroskedasticity using the following Stata commands:
. whitetst
. estat hettest x, iid     or     . estat hettest, rhs iid
. estat hettest x, fstat
(2) With the weights discussed in class, use variance weighted least squares (vwls) to estimate α and β. Turn in your computer commands and output along with your discussion of the results.
b. For the heteroskedastic case verify that
T'T = Σ^{-1}.
5. For the case of first order autocorrelation it can be shown that
$$\Sigma^{-1} = \frac{1}{\sigma_u^2}\begin{bmatrix}1 & -\rho & 0 & \cdots & 0 & 0\\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0\\ 0 & -\rho & 1+\rho^2 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho\\ 0 & 0 & 0 & \cdots & -\rho & 1\end{bmatrix}$$
Evaluate T1'T1 and T2'T2 and compare each result with Σ^{-1}, commenting on the relationship and explaining any differences. Refer to the class notes for the definitions of the transformation matrices T1 and T2. The Cochrane-Orcutt estimator corresponds to deleting the first observation whereas the Prais-Winsten (PW) estimator uses all observations.
Applied
6. Use the data in PHILLIPS.RAW to answer these questions.
a. Using the entire data set, estimate the static Phillips curve equation inf_t = β_0 + β_1 unem_t + ε_t by OLS and report the results in the usual form.
b. Obtain the OLS residuals, e_t, from part (a) and obtain ρ̂ from regressing e_t on e_{t-1}. Is there strong evidence of autocorrelation? Also test for the presence of autocorrelation using the DW test statistic.
c. Now estimate the static Phillips curve model by iterative Prais-Winsten. Compare the estimate of β_1 with that obtained in Table 12.2.
d. Rather than using Prais-Winsten, use iterative Cochrane-Orcutt. How similar are the final estimates of ρ? How similar are the PW and CO estimates of β_1? (Wooldridge, C.12.10)
7. Costs of Production
The following data correspond to output (Q) and total costs (C) of production.
Output   Total Costs ($)
1        193
2        226
3        240
4        244
5        257
6        260
7        274
8        297
9        350
10       420
a. Use OLS to estimate the parameters in the relationship
C_t = β_1 + β_2 Q_t + ε_t
b. Perform a test to see if the error terms are "correlated."
c. Indicate how you can obtain more appropriate estimators than OLS estimators of the linear equation in (a). Show your work and provide motivation for your approach. (Be careful!!!!!)
8. Panel data exercise
Consider the following data:
t    code  x    y    d1  d2  d3  d4
1    1     0    -5   1   0   0   0
2    1     8    23   1   0   0   0
3    1     14   44   1   0   0   0
4    2     10   29   0   1   0   0
5    2     16   26   0   1   0   0
6    3     4    17   0   0   1   0
7    3     11   17   0   0   1   0
8    3     5    31   0   0   1   0
9    4     18   50   0   0   0   1
10   4     5    26   0   0   0   1
11   4     2    17   0   0   0   1
Perform the following Stata commands and briefly explain the corresponding outputs.
xtset code
reg y x
reg y x d1 d2 d3
xtreg y x, fe
xtreg y x, be
xtreg y x, re
9. Consider the following model:
ln(wage_i) = β_1 + β_2 educ_i + ε_i
where wage and educ, respectively, denote the wage and education level (years) for the ith individual.
a. Under what conditions would you expect the OLS estimators of the β_i's to be unbiased and consistent? Defend your answer.
b. If you think that the wage rate has an impact on education as well as education impacting wages, will the OLS estimators be unbiased and consistent? Defend your answer.
c. If there is an endogeneity problem in the model, explain how you could obtain consistent coefficient estimators.
d. Using the mroz data (mroz.dta) estimate the given model using OLS and instrumental variables estimators (with mother's education as an instrument). Which estimate would you recommend? Use a Hausman test to support your answer.
James B. McDonald Brigham Young University 2/8/2010
VI. SIMULTANEOUS EQUATION MODELS
INTRODUCTION
There are several problems encountered with simultaneous equations models that are
not generally associated with single equation models. These include (1) the identification
problem, (2) inconsistency of ordinary least squares (OLS) estimators, (3) questions about the
interpretation of structural parameters, and (4) the validity of the OLS "t statistics" associated
with structural coefficients.
To introduce these problems, we review two important papers. The paper on identification
by E. J. Working [1927, QJE] is considered in the first section. The work of Haavelmo [1947,
JASA] dealing with alternative methods of estimating the marginal propensity to consume is
described in the second section. The third section contains a brief summary.
1. STRUCTURAL AND REDUCED FORM REPRESENTATIONS,
IDENTIFICATION, AND INTERPRETATIONS OF COEFFICIENTS
Consider the problem of estimating the impact of an increase in the price of crude oil upon
the equilibrium price and quantity of gasoline. The corresponding increase in the price of
gasoline will depend upon several factors including the slope of the demand curve.
This is illustrated by the following figure:
Figure 1
Assume that (Q_0, P_0) denotes the original equilibrium. Assume that the increase in the price of crude oil results in the supply curve shifting from S_1 to S_2. The associated change in P depends upon the relevant demand schedule, with the more inelastic schedule being associated with the larger price increase. This example clearly indicates the importance of estimating the slope of the demand schedule to make predictions about the impact of changes in factor prices upon the equilibrium price.
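A minimal numerical sketch of this point, in Python. The linear curves and every number below are hypothetical (they do not come from the notes): one supply curve and two demand curves that all pass through the same original equilibrium (P0, Q0) = (10, 50), with the supply curve shifted to the left by 14 units of Q at every price.

```python
def equilibrium(a, b, c, d, shift=0.0):
    """Equilibrium (P, Q) for demand Q = a - b*P and supply Q = c + d*P - shift.

    A positive `shift` moves the supply curve to the left by `shift` units of Q
    at every price, as an increase in crude-oil prices would (S1 -> S2).
    """
    p = (a - c + shift) / (b + d)
    q = a - b * p
    return p, q

# Hypothetical curves, all passing through (P0, Q0) = (10, 50).
c, d = 20.0, 3.0          # supply: Q = 20 + 3P
flat = (90.0, 4.0)        # flatter (more elastic) demand: Q = 90 - 4P
steep = (55.0, 0.5)       # steeper (more inelastic) demand: Q = 55 - 0.5P

for label, (a, b) in [("elastic", flat), ("inelastic", steep)]:
    p0, q0 = equilibrium(a, b, c, d)
    p1, q1 = equilibrium(a, b, c, d, shift=14.0)
    print(f"{label:9s}: P {p0:.1f} -> {p1:.1f}, Q {q0:.1f} -> {q1:.1f}")
```

With the same supply shift, the price rises by 2 along the flatter demand curve and by 4 along the steeper one.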
Estimation of the slope of the demand curve might begin by collecting observations on (P,
Q), which might appear as in Figure 2.
Figure 2: observed (P, Q) points, with P on the vertical axis and Q on the horizontal axis
The reader would probably be tempted to draw a line through the points or perform a least squares estimation on P = β_1 - β_2Q in order to estimate the demand schedule. But how would we estimate the demand curve if a plot of P and Q appeared as in Figure 3 rather than as in Figure 2?
Figure 3: observed (P, Q) points, with P on the vertical axis and Q on the horizontal axis
The data in Figure 3 appear to define a supply curve rather than a demand curve. Alternatively, how could we estimate a demand curve if the data appeared as in Figure 4?
Figure 4: observed (P, Q) points, with P on the vertical axis and Q on the horizontal axis
To answer this question, we need to recall that equilibrium price and quantity are
determined by supply and demand factors and not supply or demand alone. The observations
depicted in Figure 2 could have been generated by either of the following scenarios:
Figure 5: two scenarios that could generate the observations in Figure 2 (two panels, each with P on the vertical axis and Q on the horizontal axis)
If the demand curve is stable and the supply curve shifts, then the demand curve is "traced out." If both curves shift, fitting a relationship to the observed (P, Q) would not correspond to the underlying demand curve(s). Similarly, Figure 3 could correspond to a relatively stable supply curve and a shifting demand curve or to both curves shifting. Figure 4 would appear to correspond to both curves shifting.
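The tracing-out argument can be checked by simulation. The Python sketch below is an illustration under hypothetical assumptions (not taken from the notes): demand Q = 100 - 2P + d and supply Q = 10 + 3P + s, with normally distributed shifters d and s whose standard deviations are chosen by the caller. The helper names `ols_slope` and `trace_slope` are made up for this example; the function fits Q on P across the simulated equilibria.

```python
import random

def ols_slope(x, y):
    """Least squares slope of y on x (regression with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def trace_slope(n, sd_demand, sd_supply, seed=0):
    """Fit Q on P using equilibria of the hypothetical curves
    demand: Q = 100 - 2P + d,  supply: Q = 10 + 3P + s."""
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        d = rng.gauss(0.0, sd_demand)   # demand shifter
        s = rng.gauss(0.0, sd_supply)   # supply shifter
        p = (90.0 + d - s) / 5.0        # solves 100 - 2P + d = 10 + 3P + s
        prices.append(p)
        quantities.append(10.0 + 3.0 * p + s)
    return ols_slope(prices, quantities)

print(trace_slope(20000, 0.0, 5.0))   # only supply shifts: about -2 (demand traced out)
print(trace_slope(20000, 5.0, 0.0))   # only demand shifts: about 3 (supply traced out)
print(trace_slope(20000, 5.0, 5.0))   # both shift: near 0.5, neither curve
```

For these hypothetical curves the population slope of the fitted line is -2 + 5σ_d²/(σ_d² + σ_s²), so it equals the demand slope only when the demand curve does not shift.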
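Consider the following model:
(1.1) Demand: $Q_t = \gamma_{11} - \beta_{12}P_t + \gamma_{12}Y_t + \varepsilon_{1t}$
(1.2) Supply: $Q_t = \gamma_{21} + \beta_{22}P_t - \gamma_{23}FC_t + \varepsilon_{2t}$
or equivalently,
$$\begin{bmatrix}-1 & -\beta_{12}\\ -1 & \beta_{22}\end{bmatrix}\begin{bmatrix}Q_t\\ P_t\end{bmatrix} + \begin{bmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{bmatrix}\begin{bmatrix}1\\ Y_t\\ FC_t\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}.$$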
Equations (1.1) and (1.2) will be referred to as the structural model with Q and P as endogenous
(dependent) variables and income (Y) and factor costs (crude oil, FC) as exogenous
(independent) variables. In order to draw a demand curve or supply curve using (Q, P) as
coordinates, Y and FC must be fixed at some arbitrary level.
Figure 6: supply curve S (FC = 125) and demand curve D (Y = 100), with P on the vertical axis and Q on the horizontal axis
A change in factor costs (income fixed) will shift the supply curve and “trace” the depicted
demand curve and a change in income (factor costs fixed) will shift the demand curve and “trace”
the depicted supply curve, et cet. paribus. It is interesting to observe that by including factor
costs (FC) in the supply equation and not the demand equation we are able to "identify" the
demand equation. Similarly, by including income (Y) in the demand equation and not in the
supply equation we are able to "identify" the supply equation. Hence, one way of "identifying" a
structural equation is by excluding variables from the equation we want to estimate that are
included in other structural equations. This is the general approach to the identification problem
developed by E. J. Working [1927]. A more formal development will be considered later.
We note from Figure 6 that for each level of factor costs and income there is a
corresponding equilibrium price and quantity determined by the intersection of the supply and
demand curves. If we solve the structural model for the explicit relationship between (P, Q) and
FC and Y we obtain
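(1.3a-c)
$$\begin{bmatrix}Q_t\\ P_t\end{bmatrix} = -\begin{bmatrix}-1 & -\beta_{12}\\ -1 & \beta_{22}\end{bmatrix}^{-1}\left(\begin{bmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{bmatrix}\begin{bmatrix}1\\ Y_t\\ FC_t\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix}\right)$$
$$= \frac{1}{\beta_{12}+\beta_{22}}\begin{bmatrix}\beta_{22} & \beta_{12}\\ 1 & -1\end{bmatrix}\left(\begin{bmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{bmatrix}\begin{bmatrix}1\\ Y_t\\ FC_t\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix}\right)$$
$$= \frac{1}{\beta_{12}+\beta_{22}}\begin{bmatrix}\beta_{22}\gamma_{11}+\beta_{12}\gamma_{21} & \beta_{22}\gamma_{12} & -\beta_{12}\gamma_{23}\\ \gamma_{11}-\gamma_{21} & \gamma_{12} & \gamma_{23}\end{bmatrix}\begin{bmatrix}1\\ Y_t\\ FC_t\end{bmatrix} + \begin{bmatrix}\dfrac{\beta_{22}\varepsilon_{1t}+\beta_{12}\varepsilon_{2t}}{\beta_{12}+\beta_{22}}\\[6pt] \dfrac{\varepsilon_{1t}-\varepsilon_{2t}}{\beta_{12}+\beta_{22}}\end{bmatrix}$$
$$= \begin{bmatrix}\pi_{11} & \pi_{12} & \pi_{13}\\ \pi_{21} & \pi_{22} & \pi_{23}\end{bmatrix}\begin{bmatrix}1\\ Y_t\\ FC_t\end{bmatrix} + \begin{bmatrix}\eta_{1t}\\ \eta_{2t}\end{bmatrix}$$
Note:
$$\frac{\partial Q}{\partial Y} = \pi_{12} = \frac{\beta_{22}\gamma_{12}}{\beta_{12}+\beta_{22}} > 0, \qquad \frac{\partial Q}{\partial FC} = \pi_{13} = \frac{-\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}} < 0,$$
$$\frac{\partial P}{\partial Y} = \pi_{22} = \frac{\gamma_{12}}{\beta_{12}+\beta_{22}} > 0, \qquad \frac{\partial P}{\partial FC} = \pi_{23} = \frac{\gamma_{23}}{\beta_{12}+\beta_{22}} > 0.$$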
Equations (1.3a-c) are referred to as the reduced form equations for Q and P corresponding to the
structural model defined by (1.1) and (1.2). Note that each reduced form equation expresses the
equilibrium value (P or Q) as a function of the exogenous variables FC and Y.
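A small Python sketch of the algebra behind (1.3a-c). The function name `reduced_form` and all parameter values are hypothetical, chosen only so that β12, β22, γ12 and γ23 are positive; the errors ε1t and ε2t are set to zero. The sketch maps structural coefficients to reduced-form coefficients and checks that the implied (Q, P) lies on both structural schedules (1.1) and (1.2).

```python
def reduced_form(g11, g12, g21, g23, b12, b22):
    """Reduced-form coefficients of (1.3) from the structural ones in (1.1)-(1.2).

    Returns (pi11, pi12, pi13) for the Q equation and (pi21, pi22, pi23) for the
    P equation, each ordered as (intercept, coefficient on Y, coefficient on FC).
    """
    den = b12 + b22
    q_row = ((b22 * g11 + b12 * g21) / den, b22 * g12 / den, -b12 * g23 / den)
    p_row = ((g11 - g21) / den, g12 / den, g23 / den)
    return q_row, p_row

# Hypothetical structural values (not estimates).
g11, b12, g12 = 100.0, 2.0, 0.5     # demand: Q = g11 - b12*P + g12*Y
g21, b22, g23 = 10.0, 3.0, 1.0      # supply: Q = g21 + b22*P - g23*FC
q_row, p_row = reduced_form(g11, g12, g21, g23, b12, b22)

y, fc = 100.0, 125.0
q = q_row[0] + q_row[1] * y + q_row[2] * fc
p = p_row[0] + p_row[1] * y + p_row[2] * fc
# The reduced-form equilibrium satisfies both structural equations.
print(q, g11 - b12 * p + g12 * y, g21 + b22 * p - g23 * fc)
print("signs of pi12, pi13, pi22, pi23:", q_row[1] > 0, q_row[2] < 0, p_row[1] > 0, p_row[2] > 0)
```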
To determine the impact of an increase in the price of crude oil upon the price of gasoline,
we employ the reduced form representation, i.e.,
$$\frac{\partial P}{\partial FC} = \pi_{23} = \frac{\gamma_{23}}{\beta_{12}+\beta_{22}} > 0$$
which takes into account the slopes of the supply and demand curves as well as how far the
supply curve would shift in response to an increase in the price of crude oil. The
equilibrium quantity would also change according to
$$\frac{\partial Q}{\partial FC} = \pi_{13} = \frac{-\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}} < 0.$$
The reader might wonder why
$$\frac{\partial Q^{s}}{\partial FC} = -\gamma_{23} < 0$$
doesn't characterize the change in equilibrium quantity.
The following figure illustrates why the reduced form provides the necessary information.
Figure: an increase in factor costs shifts the supply curve horizontally by -γ23ΔFC, while the equilibrium quantity changes by -β12γ23ΔFC/(β12 + β22) (P on the vertical axis, Q on the horizontal axis)
Taking the partial derivative of the supply equation with respect to FC assumes that P is fixed and hence merely represents the horizontal shift of the supply curve and not the change in equilibrium quantity. The reduced form equation for Q expresses the equilibrium quantity as a function of FC and Y and takes account of the increase in equilibrium price associated with an increase in factor costs.
To summarize, the reduced form coefficients represent the change in equilibrium values corresponding to changes in the predetermined or exogenous variables, i.e., the reduced form coefficients are the multipliers. The structural coefficients represent slopes or shifts of structural schedules in response to changes in predetermined or exogenous variables.
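A numerical sketch of the distinction, using hypothetical values (β12 = 2, β22 = 3, γ11 = 100, γ12 = 0.5, γ21 = 10, γ23 = 1, errors set to zero; none are estimates). Raising FC by 10 shifts the supply curve by -γ23ΔFC = -10 units at a fixed price, but the equilibrium quantity falls by π13ΔFC = -4 because the equilibrium price rises by π23ΔFC = 2.

```python
# Hypothetical structural parameters (errors set to zero).
G11, B12, G12 = 100.0, 2.0, 0.5   # demand: Q = G11 - B12*P + G12*Y
G21, B22, G23 = 10.0, 3.0, 1.0    # supply: Q = G21 + B22*P - G23*FC

def equilibrium(y, fc):
    """Solve (1.1)-(1.2) for the equilibrium (P, Q) at given Y and FC."""
    p = (G11 - G21 + G12 * y + G23 * fc) / (B12 + B22)
    q = G21 + B22 * p - G23 * fc      # read Q off the supply curve
    return p, q

d_fc = 10.0
p0, q0 = equilibrium(y=100.0, fc=125.0)
p1, q1 = equilibrium(y=100.0, fc=125.0 + d_fc)

horizontal_shift = -G23 * d_fc                       # dQs/dFC * dFC with P held fixed
multiplier_change = -B12 * G23 * d_fc / (B12 + B22)  # pi13 * dFC from (1.3)
print(f"supply shifts by {horizontal_shift:+.1f} at a fixed price")
print(f"equilibrium Q changes by {q1 - q0:+.1f} (pi13 * dFC = {multiplier_change:+.1f})")
print(f"equilibrium P changes by {p1 - p0:+.1f}")
```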
OPTIONAL EXERCISES:
1. The asymptotic bias of the OLS estimator of the slope of the demand curve is given by
$$\frac{(\beta_{12}+\beta_{22})\,\sigma^2_{\varepsilon_1}}{\sigma^2_{\varepsilon_1}+\sigma^2_{\varepsilon_2}+\gamma_{23}^2\,\sigma^2_{FC}\,\bigl(1-\mathrm{COR}^2(Y,FC)\bigr)}$$
where COR(Y, FC) = correlation between Y and FC and σ²_FC = Var(FC).
(a) Mathematically analyze the impact of increases in σ²_ε2, γ²_23, and COR(Y, FC) upon the asymptotic bias of β̂12.
(b) Graphically analyze the impact of increases in σ²_ε2, γ²_23, and COR(Y, FC) upon the "identifiability of β12."
2. INCONSISTENCY OF STRUCTURAL ORDINARY LEAST SQUARES
ESTIMATORS, ALTERNATIVE ESTIMATORS, AND STATISTICAL
INFERENCE
Haavelmo [1947] considered the following simple macro model:
(2.1) $C_t = \alpha + \beta Y_t + \varepsilon_t$
(2.2) $Y_t = C_t + Z_t$
where Y_t, C_t, and Z_t (Z ≡ Y - C) respectively denote income, consumption and nonconsumption expenditure.
The reduced form representation corresponding to (2.1) and (2.2) is given by
(2.3) $C_t = \pi_{11} + \pi_{12}Z_t + \eta_t$
(2.4) $Y_t = \pi_{21} + \pi_{22}Z_t + \eta_t$
where
(2.5a-e) $\eta_t = \varepsilon_t/(1-\beta)$, $\pi_{11} = \alpha/(1-\beta)$, $\pi_{12} = \beta/(1-\beta)$, $\pi_{21} = \alpha/(1-\beta)$, $\pi_{22} = 1/(1-\beta)$.
Note that π12 and π22 correspond to the multipliers discussed in simple macroeconomics
models. Haavelmo's analysis of the simple model defined by (2.1) and (2.2) pointed out many
problems which are also associated with larger econometric models. For this reason we will
consider this model in detail.
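For reference, the substitution that takes (2.1) and (2.2) to (2.3) through (2.5) can be written out step by step (it requires β ≠ 1):

```latex
\begin{aligned}
Y_t &= (\alpha + \beta Y_t + \varepsilon_t) + Z_t
  && \text{substitute (2.1) into (2.2)}\\
(1-\beta)\,Y_t &= \alpha + Z_t + \varepsilon_t\\
Y_t &= \frac{\alpha}{1-\beta} + \frac{1}{1-\beta}\,Z_t + \frac{\varepsilon_t}{1-\beta}
  && \text{(2.4): } \pi_{21},\ \pi_{22},\ \eta_t\\
C_t &= Y_t - Z_t = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta}\,Z_t + \frac{\varepsilon_t}{1-\beta}
  && \text{(2.3), using } \tfrac{1}{1-\beta} - 1 = \tfrac{\beta}{1-\beta}
\end{aligned}
```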
Estimation. Past experience might suggest that the OLS estimator of β would have
desirable statistical properties if εt in (2.1) is not characterized by autocorrelation or
heteroskedasticity. The OLS estimator of β in (2.1) is defined by
(2.6) $\hat\beta = \frac{\sum(Y-\bar Y)(C-\bar C)}{\sum(Y-\bar Y)^2} = \frac{\mathrm{Cov}(Y,C)}{\mathrm{Var}(Y)}$
but from (2.3) and (2.4), we see that
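(2.7) $C - \bar C = \pi_{12}(Z - \bar Z) + \frac{\varepsilon - \bar\varepsilon}{1-\beta} = \frac{\beta}{1-\beta}(Z - \bar Z) + \frac{\varepsilon - \bar\varepsilon}{1-\beta}$
and
(2.8) $Y - \bar Y = \pi_{22}(Z - \bar Z) + \frac{\varepsilon - \bar\varepsilon}{1-\beta} = \frac{1}{1-\beta}(Z - \bar Z) + \frac{\varepsilon - \bar\varepsilon}{1-\beta}$;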
hence, after substituting (2.7) and (2.8) into (2.6), we can write
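(2.9)
$$\hat\beta = \frac{\sum\left[\frac{Z-\bar Z}{1-\beta}+\frac{\varepsilon-\bar\varepsilon}{1-\beta}\right]\left[\frac{\beta(Z-\bar Z)}{1-\beta}+\frac{\varepsilon-\bar\varepsilon}{1-\beta}\right]}{\sum\left[\frac{Z-\bar Z}{1-\beta}+\frac{\varepsilon-\bar\varepsilon}{1-\beta}\right]^2}$$
$$= \frac{\frac{\beta\sum(Z-\bar Z)^2}{(1-\beta)^2} + \frac{(1+\beta)\sum(Z-\bar Z)(\varepsilon-\bar\varepsilon)}{(1-\beta)^2} + \frac{\sum(\varepsilon-\bar\varepsilon)^2}{(1-\beta)^2}}{\frac{\sum(Z-\bar Z)^2}{(1-\beta)^2} + \frac{2\sum(\varepsilon-\bar\varepsilon)(Z-\bar Z)}{(1-\beta)^2} + \frac{\sum(\varepsilon-\bar\varepsilon)^2}{(1-\beta)^2}}$$
$$= \frac{\beta\sum(Z-\bar Z)^2/N + (1+\beta)\sum(Z-\bar Z)(\varepsilon-\bar\varepsilon)/N + \sum(\varepsilon-\bar\varepsilon)^2/N}{\sum(Z-\bar Z)^2/N + 2\sum(\varepsilon-\bar\varepsilon)(Z-\bar Z)/N + \sum(\varepsilon-\bar\varepsilon)^2/N}.$$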
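Assuming that:
$\sum_{t=1}^{N}(Z-\bar Z)^2/N \to \sigma_Z^2$ as N → ∞,
$\sum_{t=1}^{N}(Z-\bar Z)(\varepsilon-\bar\varepsilon)/N \to 0$ as N → ∞, and
$\sum_{t=1}^{N}(\varepsilon-\bar\varepsilon)^2/N \to \sigma^2$ as N → ∞,
gives us:
(2.10) as N → ∞, $\hat\beta \to \frac{\beta\sigma_Z^2 + \sigma^2}{\sigma_Z^2 + \sigma^2} = \beta + \frac{(1-\beta)\sigma^2}{\sigma_Z^2 + \sigma^2}$.
Hence, we see from (2.10) that β̂ is an inconsistent estimator of β with asymptotic bias equal to the second term in (2.10),
$\frac{(1-\beta)\sigma^2}{\sigma_Z^2 + \sigma^2}$.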
This may seem like a surprising result in light of the apparent simplicity of the consumption
function. It may not be obvious which of the assumptions
(A.1) εt distributed normally
(A.2) E(εt) = 0 for all t
(A.3) Var(εt) = σ2 for all t
(A.4) E(εtεs) = 0 for t ≠ s
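(A.5) Y_t and ε_t are independent
are violated. But upon closer inspection (hint: see (2.4)) we note that
$E(Y_t\varepsilon_t) = E\left[\left(\pi_{21} + \pi_{22}Z_t + \frac{\varepsilon_t}{1-\beta}\right)\varepsilon_t\right] = E(\varepsilon_t^2)/(1-\beta) = \sigma^2/(1-\beta) \ne 0;$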
hence, (A.5) is violated and OLS estimators of the structural parameters α and β are biased and
inconsistent. In fact, this is typically the case when OLS is used to estimate structural
relationships which include endogenous variables on the right hand side of the structural
equation. Right hand side endogenous variables are commonly referred to as endogenous
regressors.
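A Monte Carlo sketch makes the inconsistency concrete. It simulates the Haavelmo model under hypothetical values chosen only for illustration (α = 100, β = 0.7, σ = 10, Z normal with mean 50 and standard deviation 15, so σ_Z² = 225) and compares the OLS slope (2.6) with the limit implied by (2.10). The instrumental-variables ratio (2.13), discussed below, is printed for contrast. Function names are made up for this example.

```python
import random

def simulate(n, alpha=100.0, beta=0.7, sigma=10.0, mean_z=50.0, sd_z=15.0, seed=1):
    """Draw (C, Y, Z) from C = alpha + beta*Y + eps, Y = C + Z (hypothetical values)."""
    rng = random.Random(seed)
    c, y, z = [], [], []
    for _ in range(n):
        z_t = rng.gauss(mean_z, sd_z)
        eps_t = rng.gauss(0.0, sigma)
        y_t = (alpha + z_t + eps_t) / (1.0 - beta)   # reduced form (2.4)
        y.append(y_t)
        c.append(y_t - z_t)                           # identity (2.2)
        z.append(z_t)
    return c, y, z

def cov(a, b):
    """Sample covariance with divisor N."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

c, y, z = simulate(100_000)
beta_ols = cov(y, c) / cov(y, y)     # OLS slope, as in (2.6)
beta_iv = cov(z, c) / cov(z, y)      # instrumental-variables slope, as in (2.13)
plim_ols = 0.7 + (1 - 0.7) * 10.0**2 / (15.0**2 + 10.0**2)   # limit from (2.10)

print(f"OLS {beta_ols:.3f}  predicted limit {plim_ols:.3f}  IV {beta_iv:.3f}  true 0.700")
```

Even with 100,000 observations the OLS slope stays near 0.79 rather than 0.70, which is what "inconsistent" means here.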
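As another example, the asymptotic bias of the OLS estimator of the slope coefficient on P in (1.1) (the coefficient -β12), assuming ε1t and ε2t are uncorrelated, is given by
(2.11) $\frac{(\beta_{12}+\beta_{22})\,\sigma^2_{\varepsilon_1}}{\sigma^2_{\varepsilon_1}+\sigma^2_{\varepsilon_2}+\gamma_{23}^2\,\sigma^2_{FC}\,\bigl(1-\mathrm{Corr}^2(Y,FC)\bigr)}$.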
How can we obtain consistent estimators of the unknown structural
parameters?
Two stage least squares or an appropriate application of instrumental variables estimation
provides a solution. It is instructive to consider an alternative estimator first. Recall that the
ordinary least squares estimators of the reduced form equations (referred to as least squares no
restrictions, LSNR) will yield unbiased and consistent estimators of the π_ij's, which will be denoted by π̂_ij. This observation provides the basis for obtaining consistent estimators of α and β in the Haavelmo model. From (2.5 c,e) we note that
β = π12/π22;
hence, a consistent estimator of β can be obtained from
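(2.12) $\beta^* = \hat\pi_{12}/\hat\pi_{22}$
where $\hat\pi_{12} = \frac{\sum(C-\bar C)(Z-\bar Z)}{\sum(Z-\bar Z)^2}$ and $\hat\pi_{22} = \frac{\sum(Y-\bar Y)(Z-\bar Z)}{\sum(Z-\bar Z)^2}$,
or
(2.13) $\beta^* = \frac{\sum(C-\bar C)(Z-\bar Z)}{\sum(Y-\bar Y)(Z-\bar Z)}$.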
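In order to verify the consistency of β* in (2.13) we replace (C - C̄) and (Y - Ȳ) in (2.12) by (2.7) and (2.8) to obtain
(2.14)
$$\beta^* = \frac{\frac{1}{1-\beta}\sum(Z-\bar Z)\left[\beta(Z-\bar Z) + (\varepsilon-\bar\varepsilon)\right]}{\frac{1}{1-\beta}\sum(Z-\bar Z)\left[(Z-\bar Z) + (\varepsilon-\bar\varepsilon)\right]} = \frac{\beta\sum(Z-\bar Z)^2/N + \sum(\varepsilon-\bar\varepsilon)(Z-\bar Z)/N}{\sum(Z-\bar Z)^2/N + \sum(\varepsilon-\bar\varepsilon)(Z-\bar Z)/N}$$
Now as N → ∞,
β* → β;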
hence, β* is a consistent estimator and is obtained by obtaining consistent estimators of the
reduced form (LSNR) and then deducing corresponding estimates of structural coefficients. This
general method is referred to as indirect least squares (ILS), but it is not applicable for all
structural models.
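The ILS idea can be written out in full for the Haavelmo model. Dividing the reduced-form coefficients in (2.5) recovers both structural parameters, and because Y = C + Z holds in every observation, the sample LSNR slopes satisfy π̂22 = π̂12 + 1, so the alternative expression β = 1 - 1/π22 gives the same estimate:

```latex
\begin{aligned}
\frac{\pi_{12}}{\pi_{22}} &= \frac{\beta/(1-\beta)}{1/(1-\beta)} = \beta,
\qquad
\frac{\pi_{11}}{\pi_{22}} = \frac{\alpha/(1-\beta)}{1/(1-\beta)} = \alpha,\\
\beta^* &= \frac{\hat\pi_{12}}{\hat\pi_{22}}, \qquad \alpha^* = \frac{\hat\pi_{11}}{\hat\pi_{22}},\\
\hat\pi_{22} &= \frac{\sum(C-\bar C)(Z-\bar Z) + \sum(Z-\bar Z)^2}{\sum(Z-\bar Z)^2} = \hat\pi_{12} + 1
\quad\Longrightarrow\quad
1 - \frac{1}{\hat\pi_{22}} = \frac{\hat\pi_{12}}{\hat\pi_{22}} = \beta^*.
\end{aligned}
```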
The consistent estimator β* can also be obtained by replacing the dependent variable on the
right hand side of (2.1) by its predicted value (from the reduced form)
$\hat Y = \hat\pi_{21} + \hat\pi_{22}Z$
or $\hat Y - \bar Y = \hat\pi_{22}(Z - \bar Z)$
and then applying least squares to the resultant expression. More explicitly,
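(2.15a) $\beta^* = \frac{\sum(\hat Y-\bar Y)(C-\bar C)}{\sum(\hat Y-\bar Y)^2}$
(2.15b) $= \frac{\hat\pi_{22}\sum(Z-\bar Z)(C-\bar C)}{\hat\pi_{22}^2\sum(Z-\bar Z)^2}$
(2.15c) $= \frac{1}{\hat\pi_{22}}\cdot\frac{\sum(Z-\bar Z)(C-\bar C)}{\sum(Z-\bar Z)^2}$
(2.15d) $= \frac{\sum(Z-\bar Z)^2}{\sum(Y-\bar Y)(Z-\bar Z)}\cdot\frac{\sum(Z-\bar Z)(C-\bar C)}{\sum(Z-\bar Z)^2}$
(2.15e) $= \frac{\sum(Z-\bar Z)(C-\bar C)}{\sum(Y-\bar Y)(Z-\bar Z)}$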
which corresponds to (2.13). Compare (2.15a) with (2.6) and note that the only difference is that Ŷ (the predicted value) replaces Y in (2.6). The structural estimator obtained by applying least squares to the structural equation, after replacing the right hand side dependent variables by their reduced form predictions, is referred to as two stage least squares (2SLS). 2SLS yields consistent estimators, and is applicable even when indirect least squares is not.
Another way of looking at the alternative estimator is obtained by comparing (2.6) and (2.15e). Here we see that, in both the numerator and the denominator of (2.6), one occurrence of the right hand side dependent variable Y is replaced by Z (an instrumental variable), which is correlated with Y but not with the error term ε; hence, these estimators are sometimes referred to as instrumental variables estimators.
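The equivalence between the two-stage construction (2.15a) and the instrumental-variables form (2.15e) is algebraic, so it can be checked on any data. The Python sketch below uses a small hypothetical data set (made up for illustration, not Haavelmo's) and hand-written helpers `ols`, `two_stage` and `iv_ratio`; it reports point estimates only and is not a substitute for a 2SLS routine, whose standard errors differ from those of a naive second-stage regression.

```python
def mean(v):
    return sum(v) / len(v)

def ols(x, y):
    """Intercept and slope from a least squares regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def two_stage(c, y, z):
    # Stage 1: regress the endogenous regressor Y on the instrument Z.
    g0, g1 = ols(z, y)
    y_hat = [g0 + g1 * zi for zi in z]
    # Stage 2: regress C on the fitted values Y-hat, as in (2.15a).
    return ols(y_hat, c)[1]

def iv_ratio(c, y, z):
    # Instrumental-variables form (2.15e).
    mz, mc, my = mean(z), mean(c), mean(y)
    num = sum((a - mz) * (b - mc) for a, b in zip(z, c))
    den = sum((a - mz) * (b - my) for a, b in zip(z, y))
    return num / den

# Small hypothetical data set, for illustration only.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.5, 3.0, 5.5, 6.0]
c = [1.5, 2.0, 2.5, 3.0, 4.5]
print(two_stage(c, y, z), iv_ratio(c, y, z))   # both 0.7 up to rounding
```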
A numerical example: the Haavelmo data set (Haavelmo.dat).
Using the data provided by Haavelmo, the regular OLS estimates of the consumption function are given by
$\hat C_{OLS} = 84.01 + .732Y$
s(β̂)   (14.55)   (.030)
R² = .971
s² = 58.21.
The corresponding 2SLS estimates of the consumption function are given by
$\hat C_{2SLS} = 113.1 + .672Y$
(17.8)   (.037)
s² = 71.29.
The LSNR estimates of the reduced form equations are given by
$\hat C = 344.70 + 2.048Z$
(16.48)   (.341)
R² = .668
$\hat Y = 344.70 + 3.048Z$
(16.48)   (.341)
R² = .817
The reader should verify that the indirect least squares estimators are equal to the 2SLS.
However, except for pedagogical examples, the reader will apply 2SLS or instrumental variables
estimation directly and not use the two step procedure. Also, the two step procedure yields
incorrect standard errors.
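One way to carry out the suggested verification is the short Python sketch below. It uses the Haavelmo data listed at the end of this section and plain sums of squares and cross-products (no statistical library), and reproduces the OLS, LSNR, ILS and 2SLS point estimates and the reduced-form R² values; standard errors are not computed. The helper names are made up for this example.

```python
# Haavelmo's data as listed in these notes (Y = income, C = consumption).
Y = [433, 483, 479, 486, 494, 498, 511, 534, 478, 440,
     372, 381, 419, 449, 511, 520, 477, 517, 548, 629]
C = [394, 423, 437, 434, 447, 447, 466, 474, 439, 399,
     350, 364, 392, 416, 463, 469, 444, 471, 494, 529]
Z = [y - c for y, c in zip(Y, C)]   # nonconsumption expenditure, Z = Y - C

def mean(v):
    return sum(v) / len(v)

def s(a, b):
    """Sum of cross-products of deviations from means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def r2(x, v):
    """R-squared from a simple regression of v on x."""
    return s(x, v) ** 2 / (s(x, x) * s(v, v))

# Structural OLS, (2.6): inconsistent because Y is endogenous.
b_ols = s(Y, C) / s(Y, Y)
a_ols = mean(C) - b_ols * mean(Y)

# LSNR: OLS on the reduced-form equations (2.3) and (2.4).
pi12 = s(Z, C) / s(Z, Z)
pi22 = s(Z, Y) / s(Z, Z)
pi11 = mean(C) - pi12 * mean(Z)

# Indirect least squares, (2.12), and the IV / 2SLS form, (2.13) = (2.15e).
b_ils, a_ils = pi12 / pi22, pi11 / pi22
b_2sls = s(Z, C) / s(Z, Y)
a_2sls = mean(C) - b_2sls * mean(Y)

print(f"OLS : C = {a_ols:.2f} + {b_ols:.3f} Y")
print(f"LSNR: C = {pi11:.2f} + {pi12:.3f} Z (R2 {r2(Z, C):.3f}); "
      f"slope of Y on Z = {pi22:.3f} (R2 {r2(Z, Y):.3f})")
print(f"ILS : alpha* = {a_ils:.1f}, beta* = {b_ils:.3f}")
print(f"2SLS: C = {a_2sls:.1f} + {b_2sls:.3f} Y")
```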
CONFIDENCE INTERVALS. In determining confidence intervals for structural
parameters, the reader might be inclined to use the results associated with the OLS or 2SLS
estimates of the structural equation under consideration. As an example of this we compute
"95% confidence intervals for β (the MPC)."
(a) Based upon OLS: (t = 2.101)
$\hat\beta_{OLS} \pm t\,s_{\hat\beta} = (.732 \pm 2.101(.0299)) = (.669, .795)$
(b) Based upon 2SLS
$\hat\beta_{2SLS} \pm t\,s_{\hat\beta} = (.672 \pm 2.101(.0368)) = (.594, .748)$
These confidence intervals are very different and one might ask which if either is appropriate. As
it turns out, neither is completely satisfactory since
$\frac{\hat\beta - \beta}{s_{\hat\beta}}$
is not exactly distributed as a t-statistic where β̂ is obtained from the technique of OLS or 2SLS.
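One way in which we can determine which (if either) of the previous confidence intervals is closest is to note that
$$\frac{\hat\pi_{ij} - \pi_{ij}}{s_{\hat\pi_{ij}}} \sim t(n-2);$$
hence,
$$1-\alpha = \Pr\left[-t_{\alpha/2} \le \frac{\hat\pi_{22}-\pi_{22}}{s_{\hat\pi_{22}}} \le t_{\alpha/2}\right]$$
$$= \Pr\left[\hat\pi_{22} - t_{\alpha/2}s_{\hat\pi_{22}} \le \pi_{22} \le \hat\pi_{22} + t_{\alpha/2}s_{\hat\pi_{22}}\right]$$
$$= \Pr\left[\hat\pi_{22} - t_{\alpha/2}s_{\hat\pi_{22}} \le \frac{1}{1-\beta} \le \hat\pi_{22} + t_{\alpha/2}s_{\hat\pi_{22}}\right]$$
$$= \Pr\left[1 - \frac{1}{\hat\pi_{22} - t_{\alpha/2}s_{\hat\pi_{22}}} \le \beta \le 1 - \frac{1}{\hat\pi_{22} + t_{\alpha/2}s_{\hat\pi_{22}}}\right].$$
Making the appropriate substitutions we obtain
(.57, .73)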
which is much closer to the results obtained using two stage least squares than to those from OLS. One might be inclined to conjecture that the poor performance of the OLS confidence interval is due to the asymptotic bias of the OLS estimator,
$\frac{(1-\beta)\sigma^2}{\sigma_Z^2 + \sigma^2}$.
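The three intervals can be recomputed from the reported estimates with a few lines of Python. This is a sketch using the rounded inputs quoted above, so the 2SLS endpoints come out as (.595, .749), differing from the text in the third decimal, presumably through rounding. The mapping from π22 to β uses β = 1 - 1/π22 and assumes the lower endpoint for π22 is positive, so that the transformation is increasing. Function names are made up for this example.

```python
def t_interval(estimate, se, t):
    """Symmetric interval: estimate plus or minus t * se."""
    return estimate - t * se, estimate + t * se

def beta_interval_from_pi22(pi22_hat, se_pi22, t):
    """Map a t-interval for pi22 = 1/(1 - beta) into one for beta = 1 - 1/pi22."""
    lo, hi = t_interval(pi22_hat, se_pi22, t)
    if lo <= 0:
        raise ValueError("lower endpoint for pi22 must be positive")
    return 1.0 - 1.0 / lo, 1.0 - 1.0 / hi

t = 2.101  # t critical value used in the notes (N - 2 = 18 degrees of freedom)
ols_ci = t_interval(0.732, 0.0299, t)
tsls_ci = t_interval(0.672, 0.0368, t)
rf_ci = beta_interval_from_pi22(3.048, 0.341, t)
for name, (lo, hi) in [("OLS", ols_ci), ("2SLS", tsls_ci), ("reduced form", rf_ci)]:
    print(f"{name:12s} ({lo:.3f}, {hi:.3f})")
```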
It might be instructive to estimate the asymptotic bias. Using the OLS estimates of σ² (s² = 58.2) and β (β̂ = .732), together with σ_z² (285.55), the estimated asymptotic bias of β̂_OLS is .0454; using the 2SLS estimates of σ² (s² = 71.29) and β (β̂ = .672), together with σ_z² (285.55), the estimated asymptotic bias of β̂_OLS is .0655. Note that the difference between the OLS and 2SLS estimates is .732 - .672 = .06.
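The plug-in calculation just described is a direct evaluation of the second term of (2.10). The Python sketch below (hypothetical helper `asymptotic_bias`) uses only the rounded estimates quoted in the text.

```python
def asymptotic_bias(beta, sigma2, sigma2_z):
    """Second term of (2.10): (1 - beta) * sigma^2 / (sigma_Z^2 + sigma^2)."""
    return (1.0 - beta) * sigma2 / (sigma2_z + sigma2)

bias_at_ols = asymptotic_bias(0.732, 58.2, 285.55)     # OLS plug-in values
bias_at_2sls = asymptotic_bias(0.672, 71.29, 285.55)   # 2SLS plug-in values
print(round(bias_at_ols, 4), round(bias_at_2sls, 4))   # 0.0454 0.0655
print(round(0.732 - 0.672, 2))                          # OLS minus 2SLS estimate: 0.06
```

Both plug-in values bracket the observed gap of .06 between the OLS and 2SLS estimates.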
PREDICTIONS. In order to make predictions, one should use the reduced form
representation.
3. A BRIEF OVERVIEW
The mathematical formulation of an economic model is generally referred to as the structural representation. The structural equations in the structural representation will often include endogenous regressors (endogenous variables on the right hand side) as well as exogenous variables.
The reduced form representation corresponding to the structural representation is characterized by separate equations expressing each dependent variable as a function of the exogenous variables. The reduced form provides explicit expressions for the equilibrium of the model, conditional on an arbitrary, but given, set of values for the exogenous variables. The reduced form coefficients can be interpreted as "multipliers" and yield comparative static results. The reduced form representation is usually the form used for obtaining forecasts from econometric models.
After the econometrician is satisfied that a given econometric model is consistent with relevant economic theory, it is important that each structural equation be identified. Identification should be checked even before attempting to estimate the model. A necessary condition (order condition) for a structural equation to be identified is that the number of exogenous (predetermined) variables excluded (K2) from a structural equation is at least as large as the number of endogenous regressors (one less than the number of endogenous variables in the equation being checked (G∆)),
K2 ≥ G∆ - 1.
If K2 is thought of as referring to instrumental variables, then the necessary condition for identification is that there must be at least as many instrumental variables as endogenous regressors. This condition must be satisfied for each structural equation. The values for K2 and G∆ may vary from one equation to another. Identities do not contain unknown parameters and need not be checked for identification.
OLS estimates of parameters in structural models are typically biased and inconsistent with unreliable t-statistics. This is due to the correlation between the error and the endogenous regressor on the right hand side of the equation. Two stage least squares estimators (2SLS) provide biased, but consistent estimators. They can also be viewed as instrumental variables estimators.
The Stata command for 2SLS is
ivregress 2sls y1 X1 (y2 y3 = X1 X2)
where y1 is the endogenous variable on the lhs and y2 and y3 are the endogenous variables on the rhs,
X1 = exogenous variables in the structural equation being estimated,
X2 = Z = exogenous variables in the model, but excluded from the equation being estimated. The variables in X2 are often called instruments. An alternative form for the two stage estimators is given by
ivregress 2sls y1 X1 (y2 y3 = X2)
Example 1: See the problem set for some sample data
Demand: $Q = \gamma_{11} - \beta_{12}P + \gamma_{12}Y + \varepsilon_{1t}$
Supply: $Q = \gamma_{21} + \beta_{22}P - \gamma_{23}FC + \varepsilon_{2t}$
ENDOGENOUS VARIABLES: Q, P
EXOGENOUS VARIABLES: Y, FC
(a) Identification
(1) Demand
K2 = 1: FC is in the supply equation, but not in the demand equation
G∆ - 1 = 2 - 1 = 1: one endogenous regressor (P) in the demand equation
(2) Supply
K2 = 1: Y is in the demand equation, but not in the supply equation
G∆ - 1 = 2 - 1 = 1: one endogenous regressor (P) in the supply equation
Therefore K2 ≥ G∆ - 1 is satisfied for the supply and demand equations.
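The order condition is mechanical enough to code. The Python sketch below (hypothetical helper `order_condition`) counts K2 and G∆ - 1 from lists of variable names. It checks only this necessary condition, so a True result does not by itself establish identification.

```python
def order_condition(equation_vars, endogenous, exogenous):
    """Order (necessary) condition K2 >= G_delta - 1 for one structural equation.

    equation_vars: variables appearing in the equation, including the lhs variable.
    endogenous / exogenous: all endogenous / exogenous variables in the model.
    Returns (K2, G_delta - 1, condition satisfied?).
    """
    k2 = len([x for x in exogenous if x not in equation_vars])     # excluded exogenous
    g_delta = len([v for v in endogenous if v in equation_vars])   # endogenous in equation
    return k2, g_delta - 1, k2 >= g_delta - 1

endog, exog = ["Q", "P"], ["Y", "FC"]
print(order_condition({"Q", "P", "Y"}, endog, exog))    # demand: (1, 1, True)
print(order_condition({"Q", "P", "FC"}, endog, exog))   # supply: (1, 1, True)
```

Dropping the exclusion (for example, an equation containing Q, P, Y and FC) gives K2 = 0 < 1, and the condition fails.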
(b) Estimation of the structural parameters (Stata commands)
(1) Demand
ivregress 2sls Q Y (P = FC) or ivregress 2sls Q Y (P=Y FC)
(2) Supply
ivregress 2sls Q FC (P = Y) or ivregress 2sls Q FC (P=Y FC)
(c) Estimation of the reduced form (Stata commands)
(1) Q Equation
reg Q Y FC
(2) P Equation
reg P Y FC
Example 2. Consider the Haavelmo model and data:
$C_t = \alpha + \beta Y_t + \varepsilon_t$
$Y_t = C_t + Z_t$
(a) Identification
The exogenous variable Z is not included in the consumption function, but it is in the identity.
(b) Estimation of the structural parameters (Stata commands)
ivregress 2sls c (y = z)
(c) Estimation of the reduced form parameters (Stata commands)
reg c z
reg y z
The data used by Haavelmo are given below:
Y C Z
433 394 39
483 423 60
479 437 42
486 434 52
494 447 47
498 447 51
511 466 45
534 474 60
478 439 39
440 399 41
372 350 22
381 364 17
419 392 27
449 416 33
511 463 48
520 469 51
477 444 33
517 471 46
548 494 54
629 529 100
References
Haavelmo, T. "Methods of Measuring the Marginal Propensity to Consume," Journal of the American Statistical Association, 42(1947):105-122.
Working, E. J. "What Do Statistical Demand Curves Show?," Quarterly Journal of Economics, 41(1927):212-235.
4. PROBLEM SET 6: Simultaneous Equations
Consider the following Supply and Demand Model:
Demand: $Q_t = \gamma_{11} + \beta_{12}P_t + \gamma_{12}Y_t + e_{t1}$
Supply: $Q_t = \gamma_{21} + \beta_{22}P_t + \gamma_{23}FC_t + e_{t2}$
where Q_t, P_t, Y_t and FC_t denote quantity, price, income and factor costs.
Observations on these variables are given by:
Pt 185 215 275 279 310 330 400 360 450 515
Qt 320 360 460 460 480 540 600 570 680 780
Yt 100 120 160 164 180 200 240 220 280 320
FCt 10 12 14 15 20 16 24 20 28 30
1. Express the reduced form representation in terms of the structural coefficients.
2. Determine which of the structural coefficients can be expressed in terms of the reduced
form coefficients and make this relationship explicit where possible.
3. Determine whether the supply and demand equations are identified. Check the order
(necessary) condition in your analysis.
4. Estimate the reduced form equations for P and Q using the technique of Least Squares
(LSNR). (Hint: In Stata, type reg q Y FC and reg p Y FC)
a) Test for the presence of autocorrelation.
b) Test for heteroskedasticity using the results from the "whitetst" or "hettest" commands in Stata.
5. Estimate the supply and demand equations using OLS.
6. Estimate the supply and demand equations using 2SLS (“ivregress” in Stata).
7. Comment on the properties of the estimators associated with questions (5) and (6).
8. Indicate how you could test the following hypotheses and discuss any related problems.
a) β12 = -2
b) γ12 = 0
c) Β12
= 2.5
d) Β12
= 0
9. What implication does Β22 = 0, the coefficient of FC in the reduced form equation for P, have with respect to identification of any of the structural equations?