Notes on Econometrics


James B. McDonald

Brigham Young University 5/2010

I. Introduction to Econometrics

Objective: Make this one of the most interesting and useful courses you take in your

undergraduate program.

Outline: A. Models and Basic Concepts, B. Data, C. Econometric Projects, D.

Problem set

Econometrics deals with the problem of estimating relationships between variables. These

techniques are widely used in the public and private sectors as well as in academic settings. They

help provide an understanding about relationships between variables which can also be useful in

policy analysis and in quantifying expectations about future events.

Some applications of econometric procedures include:

• Economics and Business
  o Estimation of demand relationships
      - impact of advertising on demand
      - pricing decisions
      - determinants of market share
      - estimation of income elasticities
  o Estimation of cost relationships
  o International trade and the balance of payments
  o Macro models
  o Rational expectations
  o Predicting corporate bankruptcy or individual default on loans
  o Identifying takeover targets

• Education
  o Production functions
      - tradeoffs between different education techniques
  o Estimation of supply and demand for teachers
  o Predicting acceptance into graduate and professional programs
  o Estimating the impact of different types of schools on graduates' salaries

• Political Science
  o Analysis of voting behavior

• Public Sector
  o Forecasting tax receipts
  o Public sector production functions

• Legal Profession
  o Models of jury selection
  o Discrimination


In each application there is the question of (1) MODEL FORMULATION (functional form,

variable classification as well as the theoretical foundation), (2) ESTIMATION of unknown

parameters, (3) TESTING hypotheses, and (4) PREDICTION.

A. Models and Basic Concepts

1. The formulation of the model is generally based upon economic considerations.

Example 1. Consumer Demand Theory

Maximize U(X1, X2)

Subject to P1X1 + P2X2 = Y

where Y denotes income and the Pi and Xi, respectively, denote the price and quantity of the

ith good.

The solution of this problem yields demand equations for X1 and X2

Xi = Di(P1, P2, Y) i = 1, 2

where the functional form is unknown unless the utility function U( ) is specified. If

advertising (A) affects preferences (U(X1, X2, A)), then demand will also depend upon

advertising expenditure, Xi = Di (P1, P2, Y, A). Statistical data for Xi, Pi and Y and

econometric procedures are then used to estimate the demand equations and any unknown

parameters.
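As a worked illustration (an assumption added here, not part of the original example), suppose the utility function is Cobb-Douglas with a hypothetical preference parameter α between 0 and 1. The demand equations then have a known functional form:

```latex
\max_{X_1, X_2}\; U(X_1, X_2) = X_1^{\alpha} X_2^{1-\alpha}
\quad \text{subject to} \quad P_1 X_1 + P_2 X_2 = Y
\;\Longrightarrow\;
X_1 = \frac{\alpha Y}{P_1}, \qquad X_2 = \frac{(1-\alpha)\, Y}{P_2}.
```

Under this assumed specification the only unknown parameter is α, the share of income spent on the first good, which could be estimated from data on X1, P1 and Y.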

Example 2. A Simple Macro Model

Ct = β1 + β2 (Yt - Tt)

Yt = Ct + It + Gt + Xt

where Ct, Yt, It, Gt, Tt, and Xt respectively denote consumption, total production, investment,

government expenditure, taxes and net exports. β1 and β2 are unknown parameters.

It is important to remember that models are not complete descriptions of a situation, but

rather attempt to summarize the main relationships between the variables.


a. Classification of Variables

(1) Endogenous variables (dependent)--those variables determined by the model, e.g.,

X1 and X2 in example 1 and Yt and Ct in example 2.

(2) Exogenous variables (independent)--those variables not determined by the model,

but which are assumed to be given. P1, P2 and Y would be exogenous in example

1. It, Gt, Tt and Xt would be the exogenous variables in model 2.

(3) Predetermined variables--

(a) lagged endogenous variables--endogenous variables from a previous time

period;

(b) exogenous variables as defined above.

b. Representation of Models

(1) Structural representation--a mathematical representation of a hypothesized

model (based on economic theory) which determines the value of endogenous

variables collectively explained by the model. The structural equations may

include more than one endogenous or dependent variable per equation.

Examples:

(a) A simple macro model

Ct = β1 + β2 (Yt - Tt) + εt

Yt = Ct + It + Gt + Xt

Dependent variables: C,Y

Independent variables: T, I, G, X

Unknown parameters: β1, β2

(b) Demand: Qt = β1 + β2Pt + γ1Yt + ε1t

Supply: Qt = β3 + β4Pt + γ2wt + ε2t

Dependent variables: Q, P


Independent variables: Y, w

Unknown parameters: β1, β2, β3, β4, γ1, γ2

The ε's in these equations represent the "errors" not explained in the model. The

errors can represent the impact of other explanatory factors or measurement errors.

In each case we will want to use data to estimate the unknown parameters.

(2) Reduced form representation--expresses the current level of each of the

endogenous variables as a function of predetermined variables (exogenous and/or

lagged dependent).

Examples: The reduced form representation corresponding to the two previous

structural models can be shown to be as follows:

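(a)
$$Y_t = \frac{\beta_1}{1-\beta_2} + \frac{1}{1-\beta_2}\,(I_t + G_t - \beta_2 T_t + X_t) + \frac{\varepsilon_t}{1-\beta_2}$$

$$C_t = \frac{\beta_1}{1-\beta_2} + \frac{\beta_2}{1-\beta_2}\,(I_t + G_t - T_t + X_t) + \frac{\varepsilon_t}{1-\beta_2}$$

(b)
$$P_t = \frac{\beta_3-\beta_1}{\beta_2-\beta_4} + \frac{\gamma_2}{\beta_2-\beta_4}\,w_t - \frac{\gamma_1}{\beta_2-\beta_4}\,Y_t + \frac{\varepsilon_{2t}-\varepsilon_{1t}}{\beta_2-\beta_4}$$

$$Q_t = \frac{\beta_2\beta_3-\beta_1\beta_4}{\beta_2-\beta_4} + \frac{\beta_2\gamma_2}{\beta_2-\beta_4}\,w_t - \frac{\beta_4\gamma_1}{\beta_2-\beta_4}\,Y_t + \frac{\beta_2\varepsilon_{2t}-\beta_4\varepsilon_{1t}}{\beta_2-\beta_4}$$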

Economics 388 will introduce the analysis of structural economic models, but will primarily

focus on models written in the reduced form representation, i.e., with the dependent variable

on the left and predetermined variables on the right hand side. However, there are some

very important problems with endogenous variables (endogenous regressors) on the right

hand side of the equation.
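The link between the structural and reduced form representations of the simple macro model can be checked numerically. The sketch below is only an illustration: the parameter and variable values are hypothetical, and the function names are made up for this example. It solves the two structural equations by repeated substitution and compares the result with the reduced form expressions.

```python
import math


def solve_structural(b1, b2, inv, gov, tax, net_exports, eps=0.0,
                     tol=1e-9, max_iter=10_000):
    """Solve C = b1 + b2*(Y - T) + eps and Y = C + I + G + X by iteration.

    Repeated substitution converges when 0 < b2 < 1.
    """
    y = 0.0
    for _ in range(max_iter):
        c = b1 + b2 * (y - tax) + eps          # consumption function
        y_new = c + inv + gov + net_exports    # income identity
        if abs(y_new - y) < tol:
            return b1 + b2 * (y_new - tax) + eps, y_new
        y = y_new
    raise RuntimeError("iteration did not converge")


def reduced_form(b1, b2, inv, gov, tax, net_exports, eps=0.0):
    """Evaluate the reduced form expressions for C and Y directly."""
    k = 1.0 / (1.0 - b2)
    y = b1 * k + k * (inv + gov - b2 * tax + net_exports) + eps * k
    c = b1 * k + b2 * k * (inv + gov - tax + net_exports) + eps * k
    return c, y


# Hypothetical values chosen only for illustration.
values = dict(b1=100.0, b2=0.8, inv=200.0, gov=300.0, tax=250.0, net_exports=50.0)
c_struct, y_struct = solve_structural(**values)
c_red, y_red = reduced_form(**values)
print("structural solution:", round(c_struct, 4), round(y_struct, 4))
print("reduced form:       ", round(c_red, 4), round(y_red, 4))
```

Both approaches give the same equilibrium values, which is what it means for the reduced form to be the solution of the structural model for the endogenous variables.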


2. Estimation of Unknown Parameters

The coefficients of the variables in the reduced form and structural representations are

referred to as parameters and are generally unknown. The notation $\hat{\beta}$ will be used to denote
the estimator of the unknown population parameter β. In order to obtain any quantitative (as

opposed to qualitative) estimates of the impact of changes in exogenous variables upon the

dependent variables, the unknown parameters must be estimated. As an example of this we

note that based upon the macro model just considered

$$\frac{\partial Y_t}{\partial G_t} = \frac{1}{1-\beta_2}.$$

Recalling that $\beta_2 = \partial C_t / \partial Y_t$ (the marginal propensity to consume) is generally assumed to be

between zero and one, we can deduce that in this model an increase in government

expenditure will result in an increase in the equilibrium level of income. However, in order

to estimate the magnitude of the increase in Yt associated with the increase in Gt, β2 must be

estimated. Sometimes it may be easier to estimate the reduced form coefficient $1/(1-\beta_2)$ directly.
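To see why the magnitude of β2 matters, the following sketch evaluates the multiplier 1/(1 - β2) for a few hypothetical values of the marginal propensity to consume. The function name and the numbers are assumptions for illustration, not estimates.

```python
import math


def government_multiplier(beta2):
    """Return dY/dG = 1 / (1 - beta2) for the simple macro model."""
    if not 0.0 < beta2 < 1.0:
        raise ValueError("beta2 is assumed to lie strictly between 0 and 1")
    return 1.0 / (1.0 - beta2)


# Hypothetical marginal propensities to consume.
for mpc in (0.5, 0.75, 0.9):
    change_in_y = government_multiplier(mpc) * 10.0  # effect of a 10-unit rise in G
    print(f"beta2 = {mpc}: a 10-unit increase in G raises Y by about {change_in_y:.1f}")
```

The direction of the effect is the same in every case, but the size of the effect varies widely, which is why a quantitative estimate of β2 is needed.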

3. Tests of Hypotheses

Many times we are faced with the problem of determining whether a particular variable

is an important explanatory factor: does wealth or advertising have a significant impact on

consumption; what is the direction of influence of a change in a variable; or how can we test

hypotheses about the magnitude of an elasticity under consideration. All of these problems

involve hypothesis testing and require a knowledge of the density of the estimator under

consideration or of a related test statistic.


For example, assume that the density of $\hat{\beta}_2$, $f(\hat{\beta}_2)$, under the null hypothesis H0: β2 = 0
appears as follows:

[Figure: density $f(\hat{\beta}_2)$ under H0, centered at β2 = 0]

Assume that $\hat{\beta}_2$ denotes the estimated value of β2. If $\hat{\beta}_2$ is far out in the tail, which is
unlikely under the null hypothesis, we will agree to reject the null hypothesis that β2 = 0.
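The idea of an estimate lying "far out in the tail" can be made concrete with a small sketch. It assumes, purely for illustration, that the estimator is normally distributed around the null value with a known standard error; the estimate and standard error below are hypothetical numbers, and the appropriate density in practice depends on the model.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, std_error, null_value=0.0):
    """Probability, under the null, of an estimate at least this far from null_value.

    Assumes a normal density for the estimator (an illustrative assumption).
    """
    z = (estimate - null_value) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical estimate and standard error.
p = two_sided_p_value(estimate=0.5, std_error=0.25)
print(f"two-sided tail probability: {p:.4f}")
```

A small tail probability says the observed estimate would be unlikely if β2 were really zero, which is the informal rejection rule described above.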

4. Prediction

A frequent application of econometrics is to obtain predictions for the dependent

variables corresponding to a certain value for the independent variable(s) [X]. In order to

obtain a prediction for the dependent variable (Y) in some future period, we need to obtain a

prediction for the independent variables (X) (say X*) in that period and also assume that the

relationship between X and Y observed in the sample period continues to be valid in the

future. Substituting the predicted value of X (X*) into the estimated relationship yields
the estimated value of Y ($Y^* = \hat{\beta}_1 + \hat{\beta}_2 X^*$). We know that Y* will probably not be exactly

correct and so we will also discuss methods of obtaining confidence intervals for the actual

value of Y.



The first exercise set attempts to clarify the notion of reduced form and structural

representations of economic models. The importance of the structural parameters is also

illustrated in these exercises. We now turn to some important issues related to the data used

in estimating economic models.

B. Data

Applied econometrics involves the four steps just discussed: (1) model formulation and

interpretation of variables, (2) estimation of unknown parameters, (3) hypothesis testing, and

(4) prediction. The process summarized in these four steps is an integral part of empirical

research in the physical and social sciences. However, the results of this research may be

sensitive to the formulation of the model AND the data used. Frequently the desired data are

not available or are not in the desired form. Some data types and issues involve:

quantity and price indices: Paasche, Laspeyres

real or nominal values

total or per capita levels

stocks vs. flows

deseasonalized vs. seasonalized
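Two of these choices, real versus nominal values and total versus per capita levels, are simple transformations of the raw data. The sketch below shows one common way to make them; the function names and all numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def to_real(nominal, price_index, base=100.0):
    """Deflate a nominal value by a price index whose base-period value is `base`."""
    return nominal * base / price_index


def to_per_capita(total, population):
    """Convert a total level into a per capita level."""
    return total / population


# Hypothetical figures: nominal income 500, price index 125 (base = 100), population 2.
real_income = to_real(500.0, 125.0)
print("real income:", real_income)
print("real income per capita:", to_per_capita(real_income, 2.0))
```

Which form is appropriate depends on the economic model being estimated, so the choice should be stated explicitly when describing the data.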

An important question is whether the data we are using measure what we really want [story:

museum]. A useful reference to the importance of data and data limitations is O.

Morgenstern, On the Accuracy of Economic Observations.

[Figure: estimated relationship between X and Y, $Y^* = \hat{\beta}_1 + \hat{\beta}_2 X^*$, with confidence intervals]


1. Data Characteristics:

a. Quantitative--Qualitative

Quantitative variables measure "quantities" such as

price, sales volume, weight or income.

Qualitative variables are used to model "either/or" situations and might be used to

model membership in one of several groups such as:

⋅homeowner or non-homeowner

⋅employed/unemployed

⋅male/female

⋅accurate or inaccurate income tax returns

Dependent and independent variables can be quantitative or qualitative variables.

Example: Consider a possible relationship between salary, years of employment

and gender. This model might be formulated as:

Salary = β1 + β2 years employed + β3 Gender

where we will discuss ways in which “Gender” can be included in the econometric

model in another section dealing with binary or qualitative variables.

b. Time Series, Cross Sectional, Pooled Data

Time Series Data--measures a particular variable over successive time periods (annual,

quarterly, monthly, weekly; e.g., income, consumer price index (CPI)).

Cross Sectional Data--measures a particular variable at a given point in time for

different entities. An example of cross sectional data would be the wholesale price of

unleaded gas at 2:30 p.m. on January 2, 2009 across different gas stations.


Pooled or Merged Cross Sectional/Time Series Data

Per Capita Income, by State and Year

State      1980   1985   1990   1995   2000   2005
Alabama
Alaska     (this row alone would be time-series)
…
Utah
...

Note: any single year's column alone would be cross-sectional.
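The layout of the table can be mirrored in a small data structure. In the sketch below the numbers are placeholders, not actual per capita income figures, and the helper names are made up for this illustration; the point is only that one column of the pooled table is a cross section and one row is a time series.

```python
# Placeholder values keyed by (state, year); these are NOT real income data.
panel = {
    ("Alabama", 1980): 1.0, ("Alabama", 1985): 2.0,
    ("Alaska", 1980): 3.0, ("Alaska", 1985): 4.0,
    ("Utah", 1980): 5.0, ("Utah", 1985): 6.0,
}


def cross_section(data, year):
    """All states observed in a single year (one column of the table)."""
    return {state: value for (state, yr), value in data.items() if yr == year}


def time_series(data, state):
    """A single state observed over all years (one row of the table)."""
    return {yr: value for (st, yr), value in data.items() if st == state}


print("cross section, 1980:", cross_section(panel, 1980))
print("time series, Alaska:", time_series(panel, "Alaska"))
```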

Panel Data--pooled cross sectional data in which the same cross section is sampled over

time. A well-known panel data set is the National Longitudinal Study. This study

surveys family expenditures of approximately 20,000 people.

c. Non-experimental--Experimental Data

Non-experimental data--typical in the social sciences.

Observations drawn from a system not subject to experimental control.

Experimental (common in natural sciences, but experimental data are becoming

more commonly used in economics)

examples: Physics/chemistry

Negative income tax (different tax rates, direct subsidies)

Health insurance

Influence of housing allowance

Split cable--different commercials

2. Data problems

a. Degrees of freedom

Not enough observations to estimate model (the number of observations must be greater

than the number of parameters)


b. Multicollinearity--multicollinearity refers to the tendency of economic variables to

move together making it difficult to accurately estimate the impact of changes in

individual variables. This is often encountered in non-experimental data available in

the social sciences.

c. Measurement error and accuracy.

o Changing definitions of variables--government statistics: money, automobiles

(include station wagons?)

o Measurement Error--error boxes

o More accuracy reported than justified--[Story: Weigh hogs in Texas]

o Combining data with different accuracies—[Story: Age of river]

o Accuracy isn't necessarily symmetric--hence the errors need not "cancel" out

income tax reports—individual and corporate profits

women's age in surveys-- not many report ages between forty and forty five

3. Some data sources

Excellent websites include

http://www.ciser.cornell.edu/ASPs/datasource.asp and

http://www.econdata.net/.

Both of these websites provide access to a wide variety of data sources. Included in the

description of econdata.net is a list of the ten best sites based on user feedback. Some are

copied below for your convenience:

• Bureau of the Census

The Census Bureau site will lead you to the full range of popular and obscure Census

data series. The site has a comprehensive A-to-Z listing of data subjects, as well as

American FactFinder and CenStats, query-based means for accessing data for

your area from a variety of Census series.

• Bureau of Labor Statistics

Bureau of Labor Statistics (BLS) has a wealth of information available through its

Web site. BLS jobs, wages, unemployment, occupation, and prices data series are

available through a much improved query-based system. Also see Economy at a

Glance for an integrated set of BLS data for states and metro areas.

• Bureau of Economic Analysis

The Bureau of Economic Analysis (BEA) makes its Gross State Product, Regional

Economic Information System (REIS), and foreign direct investment data available


on its Web site. You can also use this site to access BEA's national income account

data and its publication of record, the Survey of Current Business.

• http://www.econdata.net/

This website includes links to many different types of data, including some of the

following sites.

• http://www.Census.Gov

This site includes all data for the Census of Population and Housing and U.S. and

World Population data.

• http://www.census.gov. United Nations Statistical Division

• http://www.stls.frb.org [St. Louis Federal Reserve Economic Data Base]

Price indices, interest rates, balance of payments, employment, and monetary data.

• [Resources for Economists on the Internet]

U.S. macro and regional data, other U.S. data, international data, financial data, and

academic journal archive data.

• http://rfe.org (Resources for Economists)

• http://www.bea.doc.gov

The Bureau of Economic Analysis provides time-series data on a

variety of U.S. macroeconomic variables.

• http://www.psidonline.org

The Panel Study of Income Dynamics (PSID) is a nationally representative

longitudinal study of families and individuals begun in 1968. The initial focus

was to examine employment, earnings, and income over the life cycle for 5000

families. Interviews for many of these families and their descendants have

continued.

• http://www.icpsr.umich.edu

• http://www.icpsr.umich.edu/icpsrweb/ICPSR/

The Interuniversity Consortium for Political and Social Research (ICPSR)

provides access to an extensive collection of downloadable data. Try it, you may

like it.

• http://www.ipums.umn.edu

Integrated Public Use Microdata Series. Registration is free and registered users

can select “Create Extract” to choose variables to include in their data set.

• International—is an integrated series of census microdata samples from 1960 to

the present. At this time, the series includes eighty samples drawn from twenty-six

countries, with more scheduled for release in the future.

• USA- is an integrated series of representative samples drawn from the U.S.

censuses of the period from 1850 to 2000. IPUMS-USA also includes American

Community Survey (ACS) data from 2000 to 2005.

• CPS- provides integrated data and documentation from the March Current

Population Survey (CPS) from 1962 to 2006. The harmonized CPS data is also

compatible with the data from IPUMS-USA

Some other internet resources

• National Bureau of Economic Research

o http://www.nber.org/data/


• Another excellent data site which has data to explore the impact of religious

practices on the family is

http://www.people.cornell.edu/pages/jpp34/religion_datasets.htm

• For those interested in sports data, try espn.com, pgatour.com, nba.com, basketball-

reference.com, hoopdata.com

• For those considering purchasing a diamond, you might try www.diamonds.net


DataFerrett is a popular data mining tool that accesses data stored in TheDataWeb through
the internet. DataFerrett can be installed as an application on your desktop or run as a Java applet
in an internet browser. DataFerrett is compatible with Windows operating systems.

http://dataferrett.census.gov/

• National Center for Health Statistics

• National Retirement Survey

Google is also an excellent resource to assist in locating data and studies related to your area

of interest.

C. Econometric Projects

The purpose of the project is to provide an opportunity to formulate a model of interest,

collect relevant data, estimate the model and interpret the results. This experience will

facilitate an integration of the statistical and econometric methodologies discussed in class

with other economics courses which may focus more on institutional descriptions of events

and organizations or an analysis of theoretical models. These models are merely

hypothesized explanations of observed economic data and should be estimated and tested.

Econometrics provides a method of testing the validity of the hypotheses underlying

economic models.

1. Model Selection and Data

The selection of a model and data to be used are the first steps in an econometric

project. Other economics courses or related journal articles may provide a source of

interesting models. The determination of an econometric project should be based on both an

interesting model and available data. A common problem encountered with econometric

projects is the unavailability of relevant data. Some helpful data sources are contained in the

section I.B.3 of the notes. A growing number of journals provide data used in published

articles. Replicating and updating the research in a published paper can be a productive

exercise. Alternatively, you might consider selecting a project related to your future career

aspirations, a unique data source to which you have special connections, or a passion you

have long held. A pre-med student used epidemiology data he was already working on with


a professor from the Microbiology Department. A pre-law student studied the determinants

of law school rankings. A BYU basketball player studied the impact of various statistics on
total BYU points scored. A student working for a direct-sales company used Census data to
predict which counties would be most successful for his company. Another student had a job

in the energy industry and built a model predicting natural gas prices. One approach is to

think about topics that would be good talking points in future job interviews. Previous

topics have truly been very diverse in terms of both topic and scope. Some more examples:

• Determination of factors related to admission to medical school (one student wrote

the admissions committee and requested anonymous data, one student’s father was

the president of a college)

• The relationship between the value of diamonds and cut, color, and clarity (one

student found an online database of diamond prices and characteristics)

• Factors best determining the probability of divorce (one student used IPUMS.org,

one student obtained the data from a BYU MFHD professor he had)

• Interplay between state hunting licenses and state deer population (student requested

data from Minnesota State Hunting Department)

• Financial applications such as estimating betas of stocks (students have used

Marriott School resources, such as Bloomberg and Compustat)

• Production functions

• Phillips Curve (students have used publicly available unemployment and inflation

data)

• Prediction of consumer default on loans

• Estimating the likelihood of medical doctors to commit suicide (student used

DataFerrett to access National Center for Health Statistics microdata)

• Impact of foreign aid on national stability and economic development (one student

had done research with a Political Science professor that provided him with the

development data, one student’s sister was working for an international aid NGO)

• Determinants of profit in used car sales (student used his roommate’s dad’s

dealership’s proprietary data)

• Relationship between consumer debt, credit ratings, and demographics (student used

American FactFinder for demographic data and used credit ratings from the small

business he worked for)


• Impact of weather, daylight savings time, advertising and local events on retail sales

(one student requested sales data from his boss at a local store, another asked his

brother for sales and advertising data from his startup restaurant)

Once a topic has been selected you should review the previous literature on the topic. A

computer literature search will be helpful. Google Scholar is a useful starting point. Once

you find some good papers that deal with your topic, it is often useful to follow their

citations to identify other relevant literature. In specifying your model, you should clearly

identify the endogenous (dependent) variables to be explained as well as the exogenous

(independent) variables in your model. If you are replicating a previously published

empirical study, it would also be interesting to update the analysis. For Economics 388 you
may want to restrict the model to explain one or two endogenous variables. For Economics
588, four endogenous variables is a reasonable upper limit with at least six or eight
exogenous variables. If you are working with a simultaneous equations model, both the
structural and reduced form parameters should be estimated.

2. Model Estimation

For single equation models or reduced form representations, ordinary least squares can

be used if neither autocorrelation nor heteroskedasticity is present. Multicollinearity makes

it difficult to obtain accurate estimates of the effects of individual variables. Improved

estimation procedures are available if either autocorrelation or heteroskedasticity is present.

Simultaneous structural equation models are better treated with estimation techniques

specifically developed for these models. The most widely used of these techniques is

probably two stage least squares or instrumental variables estimation. Alternative methods

are also available for structural models and will be discussed in Economics 588.

Ordinary least squares, two stage least squares, instrumental variables, and many other

estimators are available in such computer packages as SAS, Stata, SHAZAM, SPSS,

EVIEWS, RATS, TSP, Matlab, Gretl, and R, to mention only a few. Gretl and R are free.


3. Organization of the write-up

The format for your paper should be modeled after that required by scholarly refereed

journals and would include:

(a) Title page

(b) Abstract. This should be less than one page in length and summarize the topic,

methodology and findings.

(c) Introduction. This section should state the nature and objectives of the project along

with a review of the relevant literature.

(d) Description of the model. The model should be defined and each equation carefully

explained. The variables should be clearly defined. The expected impact of each
exogenous variable on the dependent variable should be stated and the reasons explained, i.e., discuss
the comparative statics of the model.

(e) Interpretation of the variables and estimated model. The interpretation of the variables

and data references should be included in the paper. Also include a copy of the data or

references to the data. Basic statistical descriptions for the variables, such as the mean,

variance, minimum, and maximum should be summarized in a table. The results of

estimating the model should be reported and discussed in this section and would

include: parameter estimates, standard errors, t-statistics, F-statistics, R², tests for

normality, autocorrelation, heteroskedasticity and possibly the degree of

multicollinearity.

(f) Economic analysis of the estimated model and implications. This section would include

a comparison of the estimated results with the comparative static implications of the

economic model. Policy implications, if any, and the predictive capability of the model

could also be included in this section.

(g) Summary and conclusions. Review the major findings as well as possible future work.

(h) Bibliography. Include complete citations for all references in the paper including data

sources.

(i) Include copies of your data in an appendix or give a complete citation to the data

sources. This facilitates a replication of your work which is an important component of

scientific research.


D. Problem set

Intro Problem Set

Introduction and Stata

Theory

1. Consider the labor model

Demand: w = 100 - 5N

Supply: w = 50 + 5N

where w denotes the wage rate and N denotes the number of individuals.

a. Graph these schedules and solve for the equilibrium wage and employment level.

b. Graphically depict the effect of imposing a minimum wage of w = 80. What is the

associated level of unemployment?

(JM)

2. Now consider the demand and supply schedules:

Demand: w = β1 - β2N

Supply: w = γ1 + γ2N

a. Demonstrate that the equilibrium wage rate ($\bar{w}$) is given by

$$\bar{w} = \frac{\gamma_2\beta_1 + \beta_2\gamma_1}{\gamma_2 + \beta_2}$$

b. Demonstrate that the level of unemployment associated with the imposition of a minimum
wage rate of $\bar{w} + 10$ is given by

$$10\left(\frac{1}{\gamma_2} + \frac{1}{\beta_2}\right).$$

(Hint: What is the level of unemployment at $\bar{w}$?)

c. What is the importance of knowing the values of the structural parameters for policy

implications?

(JM)

3. Assume the demand for gasoline is given by Qd = β1 - β2Pg and the supply of gasoline is
given by Qs = 100 + 10Pg - 2Pc where Q, Pg, and Pc denote the quantity of gasoline, the price of
gasoline and the price of crude oil.

a. Obtain an expression for the equilibrium price of gasoline ($\bar{P}_g$) in terms of β1, β2, and
Pc.


b. Evaluate the effect that an increase in Pc of 10 units will have upon the equilibrium

price of gasoline. Do the values of β1 and β2 have any effect on the magnitude of the

effect?

(JM)

4. Application in Stata

There are two ways to execute commands in Stata: writing a simple program file of commands

(called do-files) or entering in each command one at a time into Stata’s command line prompt.

We will use the latter method here, but you are encouraged to learn how to use do-files. They

are especially useful when you want to be able to replicate results several times, such as for

your projects.

First we enter in the data. Open up Stata, type in “edit” and hit enter.

Stata’s Data Editor should appear. Starting with the top left cell, enter in the data below, in

two columns:

This represents students’ GPAs along with the corresponding level of

parental income in thousands of dollars. The first student, for example, has a

3.9 GPA and comes from a family having an annual income of $ 75,000.

Close the data editor by clicking on the X in the top right corner. Stata has

saved your data and automatically named the two columns “var1” and “var2”

respectively. You can see them in the Variables window in the top left. Let’s

make sure that the data is as we want it.

Type “list” and hit enter. You should see a little table listing the data you have just entered.

Since “var1” and “var2” are vague variable names, let’s rename them.

Type in “rename var1 gpa” and hit enter. Then type in “rename var2 income.” Now when

you type in “list” you will see new variable names.

To see summary statistics for the two variables, use the summarize command: “summarize gpa

income.” (You can also just type “summarize” and Stata will summarize all of the variables

in memory.)

To see a scatter plot of the two variables with gpa on the y-axis and income on the x-axis, use

the plot command: “plot gpa income” (In Stata the dependent variable always goes first in a

list).

To run a simple linear regression showing the estimated effect of parental income on GPA,

use the regress command: “regress gpa income.”

To generate a new variable equal to the square of income, use the generate command:

“generate incomesq = income^2”. Use the list command again to look at a table of all three

variables.

Print the Stata output to turn in with this assignment (either using File… Print, or by copying

the output to a text editor like Notepad).

Data to enter (first column: GPA; second column: parental income in thousands of dollars):

3.9   75
4.0   63
3.0   45
3.5   45
2.0   27
3.0   36
3.5   54
2.5   18
2.5   24
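As a cross-check on the Stata output, the same summary statistics can be computed outside Stata. The sketch below uses Python's standard library rather than Stata (a substitution made here for illustration) and reports the quantities that the summarize command displays, along with the squared income variable.

```python
from statistics import mean, stdev

# GPA and parental income (thousands of dollars) from the exercise.
gpa = [3.9, 4.0, 3.0, 3.5, 2.0, 3.0, 3.5, 2.5, 2.5]
income = [75, 63, 45, 45, 27, 36, 54, 18, 24]


def summarize(name, values):
    """Print observations, mean, sample standard deviation, minimum and maximum."""
    print(f"{name:>8}: obs={len(values)} mean={mean(values):.4f} "
          f"sd={stdev(values):.4f} min={min(values)} max={max(values)}")


summarize("gpa", gpa)
summarize("income", income)

# Counterpart of "generate incomesq = income^2".
incomesq = [x ** 2 for x in income]
print("incomesq:", incomesq)
```

If the means or ranges differ from what Stata reports, the most likely cause is a data entry error in the Data Editor.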


*For most Stata commands, you don’t have to type out the entire command word. For

example, for generate instead of typing out “generate” you can use “g” “ge” or “gen”.

*You may have Stata keep a log of your results for you using the log command. At the

beginning of your Stata session, type “log using mynewlog” where “mynewlog” is the name of

your log file. Stata will open a new log in the “working directory.” To find out where the
working directory is, type “pwd” and hit enter (in Stata for Windows, typing “cd” with no
argument also displays it). When you are done using the log and before exiting the program, close the log by

typing in “log close.”

5. Select a data website such as http://www.oswego.edu/~kane/econometrics/data.htm, select

two variables, calculate the means and variances, and plot the observations on the two

variables.

II. TWO VARIABLE LINEAR REGRESSION MODEL

Several applications about the importance of having information about the relationship between economic variables were illustrated in the introduction. This section provides some essential building blocks used in estimating and analyzing "appropriate" functional relationships between two variables. We first consider estimation problems associated with linear relationships. The properties and distribution of the least squares estimators are considered. Diagnostic and test statistics which are important in evaluating the adequacy of the specified model are then discussed. A methodology for forecasting and the determination of confidence intervals associated with the linear model is presented. Finally, some alternative functional forms (nonlinear) which can be estimated using techniques of regular least squares are presented.

A. INTRODUCTION

Consider the model

Yt = β1 + β2Xt + εt

with n observations (X1,Y1), . . ., (Xn,Yn) which are graphically depicted as

εt: true random disturbance or error term (vertical distance from the observation to the line)

• Random behavior

• Measurement error (Y)

• Omitted variables

β1 + β2Xt: population regression line

• β1 and β2 are unknown


Population Regression Function:

The observations don't have to lie on the population regression line, but it is usually

assumed that

E(Yt | Xt) = β1 + β2Xt, i.e.,

the expected value or the "average" value of Y corresponding to any given value of X lies on the population regression line.

An important objective of econometrics is to estimate the unknown parameters (β1, β2),

and thereby estimate the unknown population regression line. This estimated regression line is referred to as the sample regression line. Again, the sample regression line is an estimator of the population regression line.

Sample Regression Function:

et (the residual) is the vertical distance from Yt to the sample regression line, so

$e_t = Y_t - \hat\beta_1 - \hat\beta_2 X_t = Y_t - \hat Y_t$, whereas $\varepsilon_t = Y_t - \beta_1 - \beta_2 X_t$.

It is important to recognize that the residual (et) is an estimate of the equation error or

random disturbance (εt) and may have different properties.

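Sample:

$Y_t = \underbrace{\hat\beta_1 + \hat\beta_2 X_t}_{\text{estimated regression line}} + \underbrace{e_t}_{\text{estimated random disturbance or "residual"}} = \hat Y_t + e_t$

where $Y_t$ is the observed Y and $\hat Y_t$ is the estimated Y for a given X.

Population:

$Y_t = \underbrace{\beta_1 + \beta_2 X_t}_{\text{population regression line}} + \underbrace{\varepsilon_t}_{\text{error or random disturbance}}$

where $Y_t$ is the observed Y.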


B. THE ESTIMATION PROBLEM

(1) Given a sample of (Xt,Yt): (X1,Y1), . . ., (Xn,Yn),

[Figure: scatter plot of the sample observations, Yt against Xt]

(2) estimate β1 and β2 by $(\hat\beta_1, \hat\beta_2)$.

Note that each different guess of β1 and β2, i.e., $\hat\beta_1$ and $\hat\beta_2$, gives a different sample regression line. How should $\hat\beta_1$ and $\hat\beta_2$ be selected? There are many possible approaches to this problem. We now review five possible alternatives and then carefully develop a method known as least squares.

Criteria: (five of many)

(1) minimize "vertical" distances

$\min_{\hat\beta_1, \hat\beta_2} \sum e_t$ : no unique solution

$\min_{\hat\beta_1, \hat\beta_2} \sum e_t^2$ : least squares or ordinary least squares (OLS)

(2) $\min_{\hat\beta_1, \hat\beta_2} \sum |e_t|^p$ : robust estimators

p=2 gives least squares; p=1 gives least absolute deviations (LAD)

(3) $\min_{\hat\beta_1, \hat\beta_2} \sum$ (horizontal distances)$^2$

(4) $\min_{\hat\beta_1, \hat\beta_2} \sum_t$ (perpendicular distances from regression line)$^2$

(5) Method of moments (MM) estimators

Sample average of estimated residuals = E(εt) = 0:

$\sum_{t=1}^{n} e_t = 0$

Sample covariance between residual and X = E(εtXt) = 0:

$\sum e_t X_t = 0$

The solution of these equations yields the OLS estimators.

Many techniques are available and each may have different properties. We will want to use the best estimators. One of the most popular procedures is least squares.

Derivation of Least Squares Estimators (OLS)*

The sum of squares of the vertical distances between Yt and the sample regression line is called, by many authors, the sum of squared errors and is denoted SSE. The SSE can be written as

$SSE = \sum e_t^2 = \sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2$

Different $\hat\beta$'s (sample regression lines) are associated with different SSE. This can be visualized as in the next figure. Least squares amounts to selecting the estimators with the smallest SSE.

____________ *Since the SSE involves squaring the residuals, least squares estimators may be very sensitive to "outlying" observations. This will be discussed in more detail later.


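[Figure: SSE plotted as a surface over $\hat\beta_1$ and $\hat\beta_2$]

Minimizing SSE with respect to $\hat\beta_1$ and $\hat\beta_2$ yields

$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$ (the sample regression line goes through $(\bar X, \bar Y)$)

$\hat\beta_2 = \frac{\sum X_t Y_t - n\bar X\bar Y}{\sum X_t^2 - n\bar X^2} = \frac{\sum (X_t - \bar X)(Y_t - \bar Y)}{\sum (X_t - \bar X)^2} = \frac{Cov(X,Y)}{Var(X)}$

Proof: In order to minimize the SSE with respect to $\hat\beta_1$ and $\hat\beta_2$, we differentiate SSE with respect to $\hat\beta_1$ and $\hat\beta_2$, yielding:

$\frac{\partial SSE}{\partial \hat\beta_1} = \sum_t 2(Y_t - \hat\beta_1 - \hat\beta_2 X_t)(-1) = -2\sum_t e_t$

$\frac{\partial SSE}{\partial \hat\beta_2} = \sum_t 2(Y_t - \hat\beta_1 - \hat\beta_2 X_t)(-X_t) = -2\sum_t (Y_t X_t - \hat\beta_1 X_t - \hat\beta_2 X_t^2) = -2\sum_t e_t X_t.$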

We see that setting these derivatives equal to zero,

$\frac{\partial SSE}{\partial \hat\beta_1} = 0$ and $\frac{\partial SSE}{\partial \hat\beta_2} = 0$,

implies

$\sum_{t=1}^{n} e_t = 0$

$\sum_{t=1}^{n} e_t X_t = 0.$

These two equations are often referred to as the normal equations. Note that the normal equations imply that the sample mean of the residuals is equal to zero and that the sample covariance between the residuals and X is zero, which were also the conditions used in method of moments estimation.

Solving the first normal equation for $\hat\beta_1$ yields

$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$

which implies that the regression line goes through the point $(\bar X, \bar Y)$. The slope of the sample regression line is obtained by substituting $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$ into the second normal equation ($\partial SSE/\partial\hat\beta_2 = 0$, or $\sum e_t X_t = 0$) and solving for $\hat\beta_2$. This yields

$\hat\beta_2 = \frac{\sum X_t Y_t - n\bar X\bar Y}{\sum X_t^2 - n\bar X^2} = \frac{Cov(X,Y)}{Var(X)}$
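The closed-form estimators and the two normal equations can be checked numerically. The following is a minimal sketch in Python (a language not used elsewhere in these notes), applied to the Anscombe A data listed in the Stata section; the function name `ols` is illustrative only.

```python
# Anscombe data set A, as listed in the Stata section of these notes.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def ols(x, y):
    """Two-variable OLS: return (b1, b2) = (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross deviations / sum of squared deviations of X.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b1 = y_bar - b2 * x_bar
    return b1, b2


b1, b2 = ols(x, y)
residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

# Normal equations: both sums should be zero up to rounding error.
sum_e = sum(residuals)
sum_ex = sum(e * xi for e, xi in zip(residuals, x))

print(f"b1 = {b1:.6f}, b2 = {b2:.7f}")
print(f"sum e = {sum_e:.2e}, sum e*X = {sum_ex:.2e}")
```

The estimates should agree with the coefficients reported by "reg y x" for the same data, and the two residual sums should vanish apart from floating-point error.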


C. PROPERTIES OF LEAST SQUARES ESTIMATORS

The properties of the $\hat\beta_1$ and $\hat\beta_2$ derived in the previous section will be very sensitive to which of the following five assumptions are satisfied:

(A.1) εt are normally distributed

(A.2) E(εt | Xt) = 0

(A.3) Homoskedasticity:

Var(εt | Xt) = $\sigma_t^2 = \sigma^2$ for every t

[Figure: two panels contrasting homoskedasticity and heteroskedasticity]

(A.4) No Autocorrelation:

Cov(εt, εs) = 0, t ≠ s


(A.5) The X's are nonstochastic (fixed in repeated sampling) and Var(X) is finite, or in other words:

$0 < \lim_{n\to\infty} \sum_{t=1}^{n} (X_t - \bar X)^2 < \infty$.

(This assumption can be relaxed, but the X's need to be uncorrelated with the errors in order for OLS estimators to be unbiased and consistent.)

A linear model satisfying (A.2)-(A.5) is referred to as the classical linear regression model. If (A.1)-(A.5) are satisfied, then we have the classical normal linear regression model. We will now summarize the properties of the least squares estimators in each of these two cases.

1. The Classical Linear Regression Model (A.2 – A.5)

If Yt = β1 + β2Xt + εt

where (A.2)-(A.5) are satisfied, then the $\hat\beta_i$'s are

⋅unbiased: $E(\hat\beta_i) = \beta_i$

⋅consistent: Var($\hat\beta_i$) → 0 as n → ∞

⋅the minimum variance of all linear unbiased estimators.

⋅These estimators are referred to as BLUE--best linear unbiased estimators.

⋅ (A.2)-(A.5) are known as the Gauss-Markov Assumptions.

2. The Classical Normal Linear Regression Model (A.1 – A.5)

If Yt = β1 + β2Xt + εt

where (A.1)-(A.5) are satisfied, then the least squares estimators are:

⋅unbiased

⋅consistent

⋅minimum variance of all unbiased estimators (not just linear estimators)

⋅normally distributed. This result facilitates t and F tests, which will be discussed in another section.

⋅least squares estimators will also be maximum likelihood estimators.

Since these desirable properties are conditional on the assumptions, it is important to test for their validity. These tests will be outlined in another section of the notes.

We now attempt to give some intuitive motivation for the concept of maximum likelihood estimation; we then prove that the least squares estimators are maximum likelihood estimators if (A.1)-(A.5) are valid.


a. Pedagogical examples of maximum likelihood estimation:

(1) Estimation of µ (population mean)

The observed values of a normally distributed random variable Yt are denoted by (Yt's) on the horizontal axis. Assume that we know that these data were generated by one of two populations (#1, #2). Is it possible that the data were generated from #1?, from #2? Which is the "most likely" population to have generated the sample?

(2) Regression models

In this example, which of the two population regression lines is most likely* to have generated the random sample?


*It might be useful to think about these “pdf’s” as “coming out” of the page in a third dimension with the “points” being thought of as being normally distributed around the population regression line.

b. Maximum likelihood estimation--Derivation

How can we quantify the ideas illustrated by these two examples and obtain the "most likely" sample regression line? We now formally derive the maximum likelihood estimators of β1 and β2 under the assumptions (A.1)-(A.5).

For the model

Yt = β1 + β2Xt + εt

(1) E(Yt) = β1 + β2Xt

(2) Var(Yt | Xt) = Var(β1 + β2Xt + εt | Xt) = σ²;

hence, we can write Yt ~ N[β1 + β2Xt; σ²], which means that the density of Yt, given Xt, is given by

$f(Y_t | X_t) = \frac{e^{-(Y_t - \beta_1 - \beta_2 X_t)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$.

These results can be visually depicted as in the following figure:


The Likelihood Function for a random sample is defined by the product of the density functions. Since each density function gives the likelihood or relative frequency of an individual observation being realized, when we multiply these values, we obtain the likelihood of observing the entire sample, given the current parameters:

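$L(Y; \beta_1, \beta_2, \sigma^2) = f(Y_1) \cdots f(Y_n) = \frac{e^{-\sum_t (Y_t - \beta_1 - \beta_2 X_t)^2 / 2\sigma^2}}{(2\pi)^{n/2} (\sigma^2)^{n/2}}$

and the Log Likelihood Function is given by:

$\ell(Y; \beta_1, \beta_2, \sigma^2) = \ln L(Y; \beta_1, \beta_2, \sigma^2)$

$= \sum_t \ln f(Y_t)$

$= -\sum_t (Y_t - \beta_1 - \beta_2 X_t)^2 / 2\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2$

$= -SSE / 2\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2$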

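Maximum Likelihood Estimators (MLE) are obtained by maximizing $\ell(Y; \beta_1, \beta_2, \sigma^2)$ over β1, β2, and σ². This maximization requires that we solve the following equations:

(1) $\frac{\partial \ell}{\partial \beta_1} = -\frac{1}{2\sigma^2} \frac{\partial SSE}{\partial \beta_1} = 0$

(2) $\frac{\partial \ell}{\partial \beta_2} = -\frac{1}{2\sigma^2} \frac{\partial SSE}{\partial \beta_2} = 0$

(3) $\frac{\partial \ell}{\partial \sigma^2} = \frac{SSE}{2} (\hat\sigma^2)^{-2} - \frac{n}{2} \frac{1}{\hat\sigma^2} = 0$

[Figure: the log-likelihood (LogL) plotted as a surface over β1 and β2]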


Results:

• The MLE of β1 and β2 are also the OLS estimators $\hat\beta_1$ and $\hat\beta_2$ when (A.1)-(A.5) hold.

• $\hat\sigma^2 = \frac{\sum e_t^2}{n} = \frac{\sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{n}$ = average of squared vertical deviations is the MLE of σ²

• $\hat\sigma^2$ is biased.

s² = Σet²/(n - 2) is an unbiased estimator of σ². The reason $\hat\sigma^2$ is biased is that not all of the et's are independent. Recall that there are two constraints on the et's:

Σet = 0

ΣetXt = 0;

hence, only (n - 2) of the residuals (estimated errors) are independent. In other words, if we had (n - 2) of the et's, we could solve for the remaining two using the two constraints above.
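As a numerical illustration of these results, the sketch below (Python, standard library only) computes the biased MLE of σ², the unbiased s², and the maximized log-likelihood from the residual sum of squares reported in the Anscombe A regression output later in these notes. The variable names are illustrative.

```python
import math

n = 11            # number of observations in the Anscombe A regression
sse = 13.7626904  # residual sum of squares from the "reg y x" output

sigma2_mle = sse / n   # MLE of sigma^2: divides by n, biased downward
s2 = sse / (n - 2)     # unbiased estimator: divides by n - 2

# Log-likelihood evaluated at the MLE, using the formula
#   l = -SSE/(2 sigma^2) - (n/2) ln(2 pi) - (n/2) ln(sigma^2).
loglik = (-sse / (2 * sigma2_mle)
          - (n / 2) * math.log(2 * math.pi)
          - (n / 2) * math.log(sigma2_mle))

print(f"sigma2_mle = {sigma2_mle:.6f}")
print(f"s2         = {s2:.6f}")
print(f"loglik     = {loglik:.5f}")
```

Here s² should match the residual mean square in the regression table, and the log-likelihood should match the ll(model) value reported by "estat ic" for the same regression.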

3. Important observation:

If the assumptions (A.1) - (A.5) are not satisfied, we may be able to "do better" than least squares. It is important to test the validity of (A.1) - (A.5).

D. DISTRIBUTION OF $\hat\beta_1$ AND $\hat\beta_2$

1. Distribution

In this section we give, without proof, the distribution of the least squares estimators if (A.2)-(A.5) hold. We also consider factors impacting estimator precision and finally provide some simulation results to provide intuition to the distributional results. The main results are then summarized. The proofs will be given in the next chapter using matrix algebra.

$\hat\beta_1$ and $\hat\beta_2$ are linear functions of the Yt's, which are random variables; hence, $\hat\beta_1$ and $\hat\beta_2$ are random variables.

Expected Value: (unbiased estimators)

$E(\hat\beta_1) = \beta_1$

$E(\hat\beta_2) = \beta_2$

Variance (Population)

$\sigma^2_{\hat\beta_2} = \sigma^2 / \sum (X_t - \bar X)^2 = \sigma^2 / (n\,Var(X))$

$\sigma^2_{\hat\beta_1} = \sigma^2 \left( 1/n + \bar X^2 / \sum (X_t - \bar X)^2 \right) = \sigma^2/n + \bar X^2 \sigma^2_{\hat\beta_2}$

$\hat\beta_1$ and $\hat\beta_2$ are consistent because they are unbiased and their variances approach zero as the sample size increases. Furthermore, if (A.1) holds (εt ~ N(0, σ²)), then Yt ~ N[β1+β2Xt; σ²], which implies the $\hat\beta_i$'s will be normally distributed since they will be linear combinations of normally distributed variables.

These results can be summarized by stating that if (A.1)-(A.5) are valid, then

$\hat\beta_i \sim N(\beta_i;\ \sigma^2_{\hat\beta_i})$

where the equations for the variances are given above.


2. What factors contribute to increased precision (reduced variance) of parameter

estimators?

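Consider the density of $\hat\beta_1$ and recall that

$\sigma^2_{\hat\beta_1} = \sigma^2 \left( \frac{1}{n} + \bar X^2 / \sum (X_t - \bar X)^2 \right) = \sigma^2 \left( \frac{1}{n} + \frac{\bar X^2}{n\,Var(X)} \right).$

From this expression, the variance falls (precision rises) as n or Var(X) increases, or as σ decreases.

[Figure: densities of $\hat\beta_1$ labeled "Precise" and "Less Precise", comparing the effects of Var(X), n, and σ]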


3. Interpretation of $\hat\beta_i \sim N[\beta_i;\ \sigma^2_{\hat\beta_i}]$ using Monte Carlo Simulations

In this section we report the results of some Monte Carlo simulations which provide additional intuition about the distribution of $\hat\beta_i$. We first construct the model used to generate the data and then generate the data. Parameter estimates are then obtained, another sample is generated, and the process is continued until we can consider the histograms of the estimators. Most Monte Carlo studies are similar in structure.

Consider the simple model, which is referred to as the data generating process (DGP),

Yt = 4 + 1.5Xt + εt

where εt ~ N(0, σ² = 4). We will let the X's be given by Xt = 1, 2, . . ., 20. The selection of β1, β2, σ², and the X's is arbitrary.

We then generate 20 random disturbances (ε) using a random number generator for N(0, σ² = 4).

The X's and ε's are then substituted into

Yt = 4 + 1.5Xt + εt

to determine corresponding Y's. We now have 20 observations on Xt and Yt.

Pretend that we don't know what β1, β2, σ² are. The only thing we observe are the (Xt, Yt). This might be visualized as

X → β1, β2, σ², ε → Y

We now estimate the unknown parameters (β1, β2, σ²) using the previously discussed formulas. This could yield, for example:

($\hat\beta_1$, $\hat\beta_2$, s²) = (3.618, 1.615, 2.499).

If 14 more samples were generated, we would have a total of 15 estimates of β1, β2, σ².
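The simulation just described can be sketched in a few lines of Python (standard library only). This is not the program used to produce the table in these notes: the seed, the number of trials, and the function names `ols` and `monte_carlo` are assumptions chosen for illustration, so the individual draws will differ from the table.

```python
import random
import statistics


def ols(x, y):
    """Return (b1, b2, s2): intercept, slope, and s^2 = SSE/(n - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, sse / (n - 2)


def monte_carlo(n_trials, seed=12345):
    """Repeatedly draw samples from Y = 4 + 1.5 X + eps, eps ~ N(0, 4)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = list(range(1, 21))     # X_t = 1, 2, ..., 20
    results = []
    for _ in range(n_trials):
        # gauss takes the standard deviation: sigma = 2, so sigma^2 = 4.
        y = [4 + 1.5 * xt + rng.gauss(0.0, 2.0) for xt in x]
        results.append(ols(x, y))
    return results


# Population variances implied by the formulas in section D.1.
x = list(range(1, 21))
x_bar = sum(x) / len(x)
sxx = sum((xt - x_bar) ** 2 for xt in x)
var_b2 = 4 / sxx
var_b1 = 4 * (1 / 20 + x_bar ** 2 / sxx)

trials = monte_carlo(2000)
b1_draws = [r[0] for r in trials]
b2_draws = [r[1] for r in trials]
s2_draws = [r[2] for r in trials]

print("mean b1:", statistics.mean(b1_draws), " var b1:", statistics.variance(b1_draws))
print("mean b2:", statistics.mean(b2_draws), " var b2:", statistics.variance(b2_draws))
print("mean s2:", statistics.mean(s2_draws))
print("population var b1, var b2:", var_b1, var_b2)
```

With many trials, the averages of the estimates should be close to 4, 1.5, and 4, and the sample variances of the slope and intercept estimates should be close to the population variances computed from the formulas.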


The results of these random simulations are given by:

Trial    $\hat\beta_1$   $s^2_{\hat\beta_1}$   $\hat\beta_2$   $s^2_{\hat\beta_2}$   $s^2$     $R^2$    D.W.*
________________________________________________________________________
1        3.618    .539     1.615    .00372    2.499    .974    2.14
2        3.794    .992     1.494    .00689    4.599    .947    2.32
3        5.770    .826     1.346    .00578    3.838    .946    2.10
4        3.491    .646     1.516    .00449    2.997    .966    2.41
5        4.443    .566     1.438    .00397    2.623    .967    2.20
6        4.697    .968     1.491    .00672    4.486    .948    2.83
7        5.428    .504     1.363    .00348    2.333    .967    2.40
8        4.685    .923     1.394    .00672    4.278    .944    1.73
9        6.122    .653     1.337    .00449    3.025    .956    2.21
10       2.589    .885     1.624    .00624    4.100    .960    1.63
11       4.046    1.447    1.514    .01000    6.707    .927    3.35
12       4.384    1.362    1.488    .00941    6.314    .928    1.32
13       3.452    .797     1.594    .00563    3.693    .962    2.06
14       4.301    .598     1.495    .00423    2.770    .968    1.51
15       3.196    .910     1.566    .00640    4.221    .955    2.17
Average  4.27     .8411    1.485    .0059     3.8989   .954    2.16

*D.W. denotes the Durbin-Watson statistic, which can be used to test the validity of (A.4).

Given that $\sum_{t=1}^{n} (X_t - \bar X)^2 = 665$.

Questions:

(1) Evaluate the population variance of $\hat\beta_1$ and $\hat\beta_2$; i.e., $\sigma^2_{\hat\beta_1}$, $\sigma^2_{\hat\beta_2}$.

(2) Compare the average of $s^2_{\hat\beta_1}$ and $s^2_{\hat\beta_2}$ with their population counterparts obtained in (1).

(3) Evaluate the sample variance of the fifteen estimates of $\hat\beta_1$ and $\hat\beta_2$ and compare them with their population counterparts.

(4) Use a chi-square test to determine whether the average of the s²'s is consistent with σ² = 4. Hint: $\frac{(n-2)\sum s^2}{\sigma^2} \sim \chi^2(15(18) = 270)$.


A histogram of the estimated $\hat\beta_1$'s might yield a result similar to the following:

[Figure: histogram of the estimated $\hat\beta_1$'s]

Note the relationship between the histogram and the normal density $N(\beta_1, \sigma^2_{\hat\beta_1})$.

In practice we only have one sample of X's and Y's; hence, we only have one observation of $\hat\beta_1$, $\hat\beta_2$, $\sigma_{\hat\beta_i}$ or $s_{\hat\beta_i}$, and these distributional results must be interpreted accordingly.

4. Review:

Model: Yt = β1 + β2Xt + εt

A.1 εt is distributed normally

A.2 E(εt | Xt) = 0

A.3 Var(εt) = σ² ∀t

A.4 Cov(εt, εs) = 0, t ≠ s

A.5 The X's are nonstochastic and $0 < \lim_{n\to\infty} \sum_{t=1}^{n} (X_t - \bar X)^2 < \infty$.

Unknown parameters: β1, β2, σ²

Problem: Given a sample of size n: (X1,Y1), . . ., (Xn,Yn), obtain estimators of the unknown parameters.

Estimators of the unknown parameters are given by:

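Parameter    Estimator

β1:    $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$

β2:    $\hat\beta_2 = \frac{\sum (X_t - \bar X)(Y_t - \bar Y)}{\sum (X_t - \bar X)^2} = \frac{\sum X_t Y_t - n\bar X\bar Y}{\sum X_t^2 - n\bar X^2} = \frac{Cov(X,Y)}{Var(X)}$

σ²:    $s^2 = \sum e_t^2 / (n-2) = \frac{\sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{n-2}$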

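Distributions:

$\hat\beta_1 \sim N\left[\beta_1,\ \sigma^2_{\hat\beta_1} = \sigma^2/n + \bar X^2 \sigma^2 / \sum (X_t - \bar X)^2\right]$

$\hat\beta_2 \sim N\left[\beta_2,\ \sigma^2_{\hat\beta_2} = \sigma^2 / \sum (X_t - \bar X)^2\right]$

The covariance between $\hat\beta_1$ and $\hat\beta_2$ is given by

$\sigma_{\hat\beta_1\hat\beta_2} = -\bar X \sigma^2_{\hat\beta_2} = -\bar X\,Var(\hat\beta_2) = -\bar X \sigma^2 / \sum (X_t - \bar X)^2$

and will be proven later.

The $\sigma^2_{\hat\beta_i}$ are estimated by

$s^2_{\hat\beta_1} = s^2/n + \bar X^2 s^2 / \sum (X_t - \bar X)^2$

$s^2_{\hat\beta_2} = s^2 / \sum (X_t - \bar X)^2.$

It should be mentioned that

$\frac{(n-2)\,s^2_{\hat\beta_1}}{\sigma^2_{\hat\beta_1}} = \frac{(n-2)\,s^2_{\hat\beta_2}}{\sigma^2_{\hat\beta_2}} = \frac{(n-2)\,s^2}{\sigma^2} = \frac{\sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{\sigma^2} \sim \chi^2(n-2)$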


E. DESCRIPTIVE STATISTICS AND HYPOTHESIS TESTS

In this section we assume that (A.1)-(A.5) are valid and consider test statistics which can be used to test whether the model has any explanatory power. Z and t statistics and R² (the coefficient of determination) are important tools in this analysis. An important hypothesis is whether the exogenous variable X helps explain Y. Normally, we would hope to reject the hypothesis H0: β2=0 (Yt=β1+εt). We also consider how to test more general hypotheses of the form H0: $\beta_i = \beta_i^0$.

1. H0: $\beta_i = \beta_i^0$, where $\sigma^2_{\hat\beta_i}$ is known

$Z = \frac{\hat\beta_i - \beta_i^0}{\sigma_{\hat\beta_i}} \sim N(0,1)$

The test statistic measures the number of standard deviations that $\hat\beta_i$ differs from the hypothesized value. Large values (in absolute value) provide the basis for rejecting the null hypothesis. The critical value is 1.96 for a two tailed test at the 5% level.

2. H0: $\beta_i = \beta_i^0$, where $\sigma^2_{\hat\beta_i}$ is unknown

$t = \frac{\hat\beta_i - \beta_i^0}{s_{\hat\beta_i}} = \frac{\hat\beta_i - \beta_i^0}{\sqrt{s^2_{\hat\beta_i}}} \sim t(n-2)$

Note that the structure of the t-statistic and the Z-statistic are the same, except that the standard error in the Z-statistic is replaced by its estimator $s_{\hat\beta_i}$, which is based on the unbiased variance estimator s². $s_{\hat\beta_i}$ would, in some sense, get closer to $\sigma_{\hat\beta_i}$ as the sample size increases. We see this as we compare critical values for the t- and Z-statistics.


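Relationship between t-statistics and the standard normal

Critical values for two-tailed tests at the given confidence levels (the last column is the 0.99 quantile, which corresponds to a two-tailed 98% level):

Distribution      90%      95%      98%
N(0,1)            1.645    1.960    2.326
t(1)              6.314    12.706   31.821
t(2)              2.920    4.303    6.965
t(3)              2.353    3.182    4.541
t(4)              2.132    2.776    3.747
t(10)             1.812    2.228    2.764
t(25)             1.708    2.060    2.485
t(∞) = N(0,1)     1.645    1.960    2.326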

Note that the critical values for a t-statistic are larger than for a standard normal, because

the t density has thicker tails.


Confidence Intervals and t-statistics:

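We note, from the following, the close relationship between the t-statistic just discussed and confidence intervals.

$\Pr\left(-t_{\alpha/2} < \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} < t_{\alpha/2}\right) = \Pr\left(\hat\beta_i - t_{\alpha/2}\,s_{\hat\beta_i} < \beta_i < \hat\beta_i + t_{\alpha/2}\,s_{\hat\beta_i}\right) = 1 - \alpha$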

Thus, the use of confidence intervals or "test statistics" are just two different ways of looking at the same problem.
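The equivalence between the two-tailed t-test and the confidence interval can be illustrated with the slope estimate from the Anscombe A regression output reproduced in section G. In the Python sketch below, the critical value 2.262 for t(9) at the 5% level is taken from a standard t table and is an assumption, since the table in these notes does not list 9 degrees of freedom.

```python
b2 = 0.5000909     # slope estimate from the "reg y x" output
se_b2 = 0.1179055  # its estimated standard error
t_crit = 2.262     # assumed: two-tailed 5% critical value for t(n - 2) = t(9)
beta_null = 0.0    # hypothesized value under H0: beta2 = 0

# t-statistic: number of estimated standard errors between estimate and null.
t_stat = (b2 - beta_null) / se_b2

# 95% confidence interval for beta2.
lower = b2 - t_crit * se_b2
upper = b2 + t_crit * se_b2

# The test rejects H0 exactly when the hypothesized value lies outside the CI.
reject_h0 = abs(t_stat) > t_crit
null_inside_ci = lower <= beta_null <= upper

print(f"t = {t_stat:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
print(f"reject H0: {reject_h0}, null value inside CI: {null_inside_ci}")
```

Because the critical value is rounded to three decimals, the interval endpoints agree with the Stata output only to about four decimal places.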


3. Coefficient of Determination (R2)

The coefficient of determination measures the fraction of the total sum of squares "explained" by the model. The following figure will provide motivation and definition of important terms.

[Figure: decomposition of $Y_t - \bar Y$ into the residual $e_t = Y_t - \hat Y_t$ and the explained part $\hat Y_t - \bar Y$, around the sample regression line $\hat Y_t = \hat\beta_1 + \hat\beta_2 X_t$]

Define the total sum of squares (SST) to be

$SST = \sum (Y_t - \bar Y)^2 = \sum (Y_t - \hat Y_t + \hat Y_t - \bar Y)^2$

$= \sum (Y_t - \hat Y_t)^2 + \sum (\hat Y_t - \bar Y)^2$ + cross products (= 0 if least squares is used)

$= \sum e_t^2 + \sum (\hat Y_t - \bar Y)^2$

= SSE + SSR,

where SSE and SSR, respectively, denote the sum of squared errors and sum of squares explained by the regression model.

• total sum of squares = sum of squared errors + sum of squares "explained" by regression model.

• SST = SSE + SSR

The coefficient of determination (R²) is defined by

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum e_t^2}{\sum (Y_t - \bar Y)^2}$

= fraction of total sum of squares "explained" by the model.

Note that increasing the number of independent variables in the model will not change SST, but will decrease the SSE as long as the estimated coefficient of the new variable(s) is not equal to zero; hence, increase R². This is true even if the additional variables are not statistically significant. This has provided the motivation for considering the adjusted R² ($\bar R^2$) instead of R². The adjusted R² is defined by

$\bar R^2 = 1 - \frac{\sum e_t^2 / (n-K)}{\sum (Y_t - \bar Y)^2 / (n-1)}$

where K = the number of β's (coefficients) in the model. $\bar R^2$ will only increase with the addition of a new variable if the associated t-statistic is greater than 1 in absolute value. This result follows from the equation

$\bar R^2_{New} - \bar R^2_{Old} = \frac{(n-1)}{(n-K)(n-K-1)} \cdot \frac{SSE_{New}}{SST} \cdot \left[ \left( \frac{\hat\beta_{New\_var} - 0}{s_{\hat\beta_{New\_var}}} \right)^2 - 1 \right]$

where the last term in the product is (t² - 1), K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K+1 coefficients.

4. Analysis of Variance (ANOV)

We have just decomposed the total sum of squares (SST) into two components:

• sum of squares error (SSE)

• sum of squares explained by regression (SSR).

This decomposition is commonly summarized in the form of an analysis of variance (ANOV) table.

Source of Variation    SS     d.f.     MSE
Model                  SSR    K - 1    SSR/(K - 1)
Error                  SSE    n - K    SSE/(n - K)
Total                  SST    n - 1

K = number of coefficients in model


where SS denotes the sum of squares and degrees of freedom, d.f., is the number of independent terms in SS. The mean squared error (MSE) is the corresponding sum of squares (SS) divided by the degrees of freedom.

Dividing the MSE for the model by the MSE for the error (s²) gives an F-statistic:

$F = \frac{SSR/(K-1)}{SSE/(n-K)} = \frac{n-K}{K-1} \cdot \frac{R^2}{1-R^2} \sim F(K-1,\ n-K)$

The F-statistic can be used to test the hypothesis that all non-intercept (slope) coefficients are equal to zero.

In the case of a single exogenous variable,

Yt = β1 + β2Xt + εt

the F statistic $\frac{n-2}{1} \cdot \frac{R^2}{1-R^2} \sim F(1,\ n-2)$ tests the hypothesis H0: β2 = 0 (all non-intercept coefficients = 0).
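These summary statistics can be recomputed from the sums of squares alone. The Python sketch below uses the values from the lwage/educ regression output shown in the next subsection; it also checks that, with a single explanatory variable, the F-statistic equals the square of the slope's t-statistic.

```python
# Values taken from the "reg lwage educ" output in these notes.
ssr = 26.3264193   # model (explained) sum of squares
sse = 197.001022   # residual sum of squares
n = 428            # number of observations
k = 2              # number of coefficients (intercept and slope)

sst = ssr + sse                                 # SST = SSE + SSR
r2 = ssr / sst                                  # coefficient of determination
adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))  # adjusted R^2

# Two equivalent forms of the F-statistic.
f_from_ms = (ssr / (k - 1)) / (sse / (n - k))
f_from_r2 = (n - k) / (k - 1) * r2 / (1 - r2)

# In the two-variable model, F equals the squared t-statistic of the slope.
t_slope = 0.1086487 / 0.0143998

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
print(f"F = {f_from_ms:.2f} (from mean squares), {f_from_r2:.2f} (from R2)")
print(f"t^2 = {t_slope ** 2:.2f}")
```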


5. Sample Stata regression output (general format and a numerical example)

. sum lwage educ

    Variable |  Obs   Mean                 Std. Dev.   Min              Max
-------------+--------------------------------------------------------------------------
       lwage |  N     sample mean lwage    s           smallest value   largest value
        educ |  N     sample mean educ     s           smallest value   largest value

. reg lwage educ

ANOVA (Analysis of Variance Table)

      Source |  SS     df         MS                 Number of obs       = N
-------------+----------------------------------     F(#coef-1, N-#coef) =
       Model |  SSR    #coef-1    SSR/(#coef-1)      Prob > F            = 0.0000
    Residual |  SSE    N-#coef    SSE/(N-#coef)      R-squared           = SSR/SST = 1 - SSE/SST
-------------+----------------------------------     Adj R-squared       = 1 - [SSE/(N-#coef)] / [SST/(N-1)]
       Total |  SST    N-1        SST/(N-1)

Note: $s^2 = MSE = \frac{SSE}{N - \#coef}$, $s = \sqrt{s^2}$

Regression results

------------------------------------------------------------------------------
       lwage |  Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  $\hat\beta_2$   $s_{\hat\beta_2}$   $\hat\beta_2 / s_{\hat\beta_2}$   Probability of a larger t stat.   $\hat\beta_i \pm t_{\alpha/2}\,s_{\hat\beta_i}$
       _cons |  $\hat\beta_1$   $s_{\hat\beta_1}$   $\hat\beta_1 / s_{\hat\beta_1}$   Same as above

. sum lwage educ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       lwage |       428    1.190173    .7231978  -2.054164   3.218876
        educ |       753    12.28685    2.280246          5         17

. reg lwage educ

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  1,   426) =   56.93
       Model |  26.3264193     1  26.3264193           Prob > F      =  0.0000
    Residual |  197.001022   426  .462443713           R-squared     =  0.1179
-------------+------------------------------           Adj R-squared =  0.1158
       Total |  223.327441   427  .523015084           Root MSE      =  .68003

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086487   .0143998     7.55   0.000     .0803451    .1369523
       _cons |  -.1851968   .1852259    -1.00   0.318    -.5492673    .1788736
------------------------------------------------------------------------------


F. FORECASTS

If we have determined that our model has significant explanatory power, we may want to use it to obtain predictions. We turn to constructing predictions or forecasts and confidence intervals for the (1) regression line (or mean Y corresponding to a given X) and (2) individual value of Y corresponding to an arbitrary value of X.

Sample: (Xt, Yt), t = 1, 2, . . ., n

Estimators: $\hat\beta_1$, $\hat\beta_2$

Sample Regression Line: $\hat Y_t = \hat\beta_1 + \hat\beta_2 X_t$

Uncertainty about $\hat\beta_1$, $\hat\beta_2$ implies uncertainty about $\hat Y_t$.

$E(\hat Y_t) = \beta_1 + \beta_2 X_t$

$Var(\hat Y_t) = \sigma^2_{\hat\beta_1} + X_t^2 \sigma^2_{\hat\beta_2} + 2X_t \sigma_{\hat\beta_1\hat\beta_2}$

$= \frac{\sigma^2}{n} + \bar X^2 \sigma^2_{\hat\beta_2} + X_t^2 \sigma^2_{\hat\beta_2} + 2X_t(-\bar X \sigma^2_{\hat\beta_2})$

$= \frac{\sigma^2}{n} + (X_t - \bar X)^2 \sigma^2_{\hat\beta_2}$

$= \sigma^2_{\hat Y_t}$

Therefore,

$\hat Y_t \sim N(\beta_1 + \beta_2 X_t;\ \sigma^2_{\hat Y_t}).$

$\sigma^2_{\hat Y}$ can be estimated by $s^2_{\hat Y} = \frac{s^2}{n} + (X_t - \bar X)^2 s^2_{\hat\beta_2}$.

From these results we can construct

Confidence Intervals for β1 + β2Xt: (regression line or E(Y | X))

$\hat Y_t \pm t_c\,s_{\hat Y}$, i.e., $\hat\beta_1 + \hat\beta_2 X_t \pm t_c\,s_{\hat Y}$

where $t_c = t_{\alpha/2}(n-2)$.

The forecasting problem is more often concerned with finding confidence intervals for the actual value of Yt (not E(Yt | Xt)) rather than the "mean" or expected value of Yt corresponding to an arbitrary value of Xt. To do this we consider an analysis of the forecast error (FE):

$FE = Y_t - \hat Y_t$

E(FE) = 0

$\sigma^2_{FE} = Var(FE | X)$

$= Var(Y_t) + Var(\hat Y_t)$

$= \sigma^2 + \sigma^2_{\hat Y}$

where σ² is due to the error term and $\sigma^2_{\hat Y}$ is due to uncertainty about the population regression line,

with $\sigma^2_{FE}$ being estimated by $s^2_{FE} = s^2_{\hat Y} + s^2$.

Note that $s_{\hat Y}$ and $s_{FE}$ are functions of $(X - \bar X)^2$, i.e., the further X is from the mean value, the larger $s_{\hat Y}$ and $s_{FE}$. This can also be seen in the following figure.


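Confidence Intervals (CI) for actual Yt: (not β1 + β2Xt)

$\hat Y_t \pm t_c\,s_{FE}$

where $s_{FE} = \sqrt{s^2_{FE}}$ and

$s^2_{FE} = s^2 + s^2_{\hat Y} = s^2 + \frac{s^2}{n} + (X_t - \bar X)^2 s^2_{\hat\beta_2}$

[Figure: sample regression line with the C.I. for β1 + β2Xt (inner intervals) and the C.I. for Yt (outer intervals)]

The two curved lines closest to the sample regression line correspond to CI's for the population regression line and the two curved lines furthest from the sample regression line are the CI's for the actual value of Y corresponding to different values of X.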
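The two kinds of intervals can be computed directly from the formulas above. The Python sketch below uses the Anscombe A data listed in section G; the function name `forecast` and the critical value 2.262 for t(9) (taken from a standard t table) are assumptions made for illustration.

```python
import math

# Anscombe data set A, as listed in section G of these notes.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b1 = y_bar - b2 * x_bar
s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
s2_b2 = s2 / sxx  # estimated variance of the slope estimator


def forecast(x0, t_crit=2.262):
    """Point forecast at x0 with CIs for the regression line and for actual Y.

    t_crit is the assumed two-tailed 5% critical value for t(n - 2) = t(9).
    """
    y_hat = b1 + b2 * x0
    s_yhat = math.sqrt(s2 / n + (x0 - x_bar) ** 2 * s2_b2)  # std. error of Y-hat
    s_fe = math.sqrt(s_yhat ** 2 + s2)                      # forecast error std. error
    ci_mean = (y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat)
    ci_actual = (y_hat - t_crit * s_fe, y_hat + t_crit * s_fe)
    return y_hat, s_yhat, s_fe, ci_mean, ci_actual


for x0 in (9, 14):
    y_hat, s_yhat, s_fe, ci_mean, ci_actual = forecast(x0)
    print(f"X = {x0}: Y-hat = {y_hat:.3f}, s_Yhat = {s_yhat:.3f}, s_FE = {s_fe:.3f}")
    print(f"  CI for regression line: ({ci_mean[0]:.3f}, {ci_mean[1]:.3f})")
    print(f"  CI for actual Y:        ({ci_actual[0]:.3f}, {ci_actual[1]:.3f})")
```

Both standard errors are smallest at X equal to the sample mean (9 here) and grow as X moves away from it, which is why the confidence bands in the figure are curved.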

G. ESTIMATION USING Stata

These calculations can be very tedious for even moderate sample sizes. Fortunately, calculators and many computer programs make this part of econometrics relatively painless, even exciting. Thus, we will be able to focus on understanding the statistical procedures, the validity of the assumptions, and interpreting the statistical output. We will outline the commands used in least squares estimation using the program Stata. Extensive manuals and abbreviated guides describing additional procedures and options are available for Stata and other programs such as SAS, EVIEWS, Gretl, R, SHAZAM and TSP. Gretl is quite user friendly and it is free.

Stata

The data files can be created with Microsoft Excel (saving the file as a csv file). Stata will automatically read in any column headings the data have. With a file named FUN388.CSV, we can easily perform least squares estimation of the relationship

Yt = β1 + β2Xt + εt

using the commands:

. insheet using "C:\FUN388.CSV", clear
        Reads the data into Stata. This can also be done by opening the data editor and manually pasting the data.

. sum Y X
        Gives statistical characteristics of Y and X.

. plot Y X
        Plots Y on the vertical axis, X on the horizontal axis.

. reg Y X
        Uses OLS to estimate the given model.

To view additional residual diagnostics, use the following commands. After the ". reg Y X" command, type

. predict error, resid
        (the variable "error" now contains the estimated residuals)

1. To test for normality of the errors, type

. sktest error
        Tests for normality using a skewness/kurtosis test.
OR
. swilk error
        Tests for normality using a Shapiro-Wilk test.
OR
. sfrancia error
        Tests for normality using a Shapiro-Francia test.
OR
. qnorm error
        Displays a plot of error against quantiles of the normal distribution.
OR
. findit jb
        The "findit" command is useful in Stata to find commands that are not yet installed. "findit jb" will find the command for a Jarque-Bera test for normality. After installing the command, type "jb error" to run a Jarque-Bera test.

2. To test for heteroskedasticity, the following post-regression commands are useful:

. whitetst
        Tests for heteroskedasticity using White's test.

. estat hettest varnames
        Tests for heteroskedasticity using a Breusch-Pagan and Cook and Weisberg test.

. estat hettest, rhs iid   (or fstat in place of iid)
        Uses all rhs variables and a chi-square or F-test.

. estat imtest, preserve white
        Tests for heteroskedasticity (using White's test) and for skewness and kurtosis.

More postestimation commands are explained in the Stata help file titled "regress postestimation."

3. To test for autocorrelation (serial independence or randomness) of the error terms you must first declare your data to be time series with the command

. tsset timevar
        timevar is the name of the time variable in your dataset.

You can then test for autocorrelation in your time series data with the commands

. estat dwatson
        Tests for first-order autocorrelation.

. estat bgodfrey
        Breusch-Godfrey test for higher-order serial correlation.

. estat archlm
        Tests for ARCH effects in the residuals.

. runtest varname
        varname is the name of the variable being tested for random order.

4. Some other options:

a. To calculate the sum of absolute errors (SAE), type

. egen SAE = sum(abs(error))
        "SAE" will appear as a constant column in the data editor.

b. To view information criteria, including the log-likelihood value and the Akaike and Schwarz Bayesian information criteria, type

. estat ic

c. To display the variance covariance matrix, type

. estat vce

d. To display the correlation matrix, type

. estat vce, corr

e. Help files: use the Help menu or type HELP KEYWORD


Sample Stata output corresponding to the Anscombe_A data set in problem 1.2 (#4)

. infile x y using "C:\anscombe_a.txt", clear
(11 observations read)

. list y x

     +------------+
     |     y    x |
     |------------|
  1. |  8.04   10 |
  2. |  6.95    8 |
  3. |  7.58   13 |
  4. |  8.81    9 |
  5. |  8.33   11 |
     |------------|
  6. |  9.96   14 |
  7. |  7.24    6 |
  8. |  4.26    4 |
  9. | 10.84   12 |
 10. |  4.82    7 |
     |------------|
 11. |  5.68    5 |
     +------------+

. plot y x

[Text-mode scatter plot: y on the vertical axis (4.26 to 10.84) against x on the horizontal axis (4 to 14)]


. sum y x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |        11    7.500909    2.031568       4.26      10.84
           x |        11           9    3.316625          4         14

. reg y x

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.99
       Model |  27.5100011     1  27.5100011           Prob > F      =  0.0022
    Residual |  13.7626904     9  1.52918783           R-squared     =  0.6665
-------------+------------------------------           Adj R-squared =  0.6295
       Total |  41.2726916    10  4.12726916           Root MSE      =  1.2366

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5000909   .1179055     4.24   0.002     .2333701    .7668117
       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445
------------------------------------------------------------------------------

. whitetst

White's general test statistic :  .6998421  Chi-sq( 2)  P-value =  .7047

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of y

         chi2(1)      =     0.41
         Prob > chi2  =   0.5232

. estat ic

------------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+----------------------------------------------------------------
           . |     11   -22.88101   -16.84069      2     37.68137    38.47717
------------------------------------------------------------------------------

*ll(model) is the optimized log-likelihood value for the specified model, whereas ll(null) is obtained by estimating the model without any explanatory variables. Under the null hypothesis that the slope coefficients are zero, twice the difference of the log-likelihood values is asymptotically distributed as a chi-square with df equal to the number of explanatory variables.


H. FUNCTIONAL FORMS

In many applications the relationships between variables are not linear. A simple test for the presence of nonlinear relationships is the Regression Specification Error Test (RESET–Ramsey, 1969). This test can be performed as follows:

Ho: $y_t = X_t\beta + \varepsilon_t$ (estimate a linear model)

Ha: $y_t = X_t\beta + \delta_1 \hat y_t^2 + \delta_2 \hat y_t^3 + \varepsilon_t$ (the $\hat y$'s denote OLS predicted values)

An F test of the hypothesis that both delta coefficients are simultaneously equal to zero is approximately distributed as an F(2, N-K). Alternatively, nonlinear functions of x can be added to the linear terms and tested for their collective explanatory power. Box-Cox transformations provide another approach.

The linear regression model just considered is more general than might first appear.

Many nonlinear models can be transformed so that "linear techniques" can be used.

We can consider two types of nonlinear models:

o transformable types--estimable by least squares

o nontransformable--use nonlinear optimization algorithms

1. Transformable Models

a. Log-Log or Double Log Model

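$Y_t = A X_t^{\beta} \varepsilon_t$

The slope and elasticity are given by:

$\frac{dY}{dX} = \beta A X^{\beta - 1}$

$\eta_{Y \cdot X} = \frac{dY}{dX} \cdot \frac{X}{Y} = \beta$

[Figure: shapes of $Y = AX^{\beta}$ for different values of β, including β < 0, β = 0, 0 < β < 1, and β > 1]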


This model can be estimated using least squares by taking the logarithm of the model to yield

ln Yt = ln A + β ln Xt + ln εt

= β1 + β2 lnXt + ln εt

where β1 = ln A and β2 = β. Regressing ln(Yt) on ln(Xt) gives estimates for β1 and β2; hence $\hat A = e^{\hat\beta_1}$ and $\hat\beta = \hat\beta_2$.
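The log transformation can be illustrated with a small Python sketch. The data are generated without any disturbance (an assumption made so that the regression recovers the parameters exactly), and the values A = 2 and β = 0.5 are arbitrary choices for illustration.

```python
import math

A_true, beta_true = 2.0, 0.5                  # illustrative parameter values
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [A_true * xi ** beta_true for xi in x]    # Y = A X^beta, no disturbance

# Transform to the linear model ln Y = beta1 + beta2 ln X.
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]

# Two-variable OLS on the transformed data.
n = len(ln_x)
lx_bar = sum(ln_x) / n
ly_bar = sum(ln_y) / n
b2 = (sum((u - lx_bar) * (v - ly_bar) for u, v in zip(ln_x, ln_y))
      / sum((u - lx_bar) ** 2 for u in ln_x))
b1 = ly_bar - b2 * lx_bar

A_hat = math.exp(b1)   # A-hat = exp(beta1-hat)
beta_hat = b2          # beta-hat = beta2-hat, the constant elasticity

print(f"A_hat = {A_hat:.6f}, beta_hat = {beta_hat:.6f}")
```

With real data the disturbance makes the fit inexact, and the slope of the log-log regression is then read as an estimate of the elasticity of Y with respect to X.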

b. Semi Log Models

(1) $Y_t = A B^{X_t}\varepsilon_t$

The slope and elasticity are given by

$$\frac{dY}{dX} = Y\ln B, \qquad \eta_{Y\cdot X} = X\ln B$$

Estimation: Least squares can again be applied to the logarithmic transformation of the original model.

ln Yt = ln A + (ln B)Xt + ln εt = β1 + β2 Xt + ln εt.

Hence $\hat{A} = e^{\hat{\beta}_1}$ and $\hat{B} = e^{\hat{\beta}_2}$.

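(2) $X_t = A B^{Y_t}\varepsilon_t$

The slope and elasticity are given by

$$\frac{dY}{dX} = \frac{1}{X\ln B}, \qquad \eta_{Y\cdot X} = \frac{1}{Y\ln B}$$

[Figure: graphs of this semi-log model for 0 < B < 1, B = 1, and B > 1.]

Estimation: Applying least squares to

$$Y_t = -\frac{\ln A}{\ln B} + \frac{1}{\ln B}\ln X_t - \frac{1}{\ln B}\ln\varepsilon_t = \beta_1 + \beta_2\ln X_t + \eta_t$$

yields $\hat{B} = e^{1/\hat{\beta}_2}$ and $\hat{A} = e^{-\hat{\beta}_1/\hat{\beta}_2}$.

c. Reciprocal Transformations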

Yt = A + B/Xt + εt

The slope and elasticity are:

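$$\frac{dY}{dX} = -\frac{B}{X^2}, \qquad \eta_{Y\cdot X} = -\frac{B}{XY}$$

[Figure: graphs of the reciprocal model for β2 = B > 0 and β2 = B < 0.]

Estimation: Let Z = 1/X, then estimate

Yt = A + BZt + εt = β1 + β2Zt + εt, so that $\hat{A} = \hat{\beta}_1$ and $\hat{B} = \hat{\beta}_2$.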

d. Logarithmic Reciprocal Transformations

$$Y_t = e^{A - B/X_t + \varepsilon_t}$$

The slope and elasticity are:

$$\frac{dY}{dX} = \frac{BY}{X^2}, \qquad \eta_{Y\cdot X} = \frac{B}{X}$$

Estimation: This model can be estimated using least squares on

ln Yt = A - B/Xt + εt

= β1 + β2(1/X) + εt where A = α = β 1 and

B = - β 2.

Application: market share. As X increases, Y approaches the asymptotic level $e^{\alpha}$, which equals 1 when α = 0.


e. Polynomials

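$$y = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3$$

[Figure: three panels showing the cases β3 = β4 = 0, β4 = 0, and β4 ≠ 0.]

Cost Function:

$$TC(q) = \beta_1 + \beta_2 q + \beta_3 q^2 + \beta_4 q^3$$

$$MC(q) = \beta_2 + 2\beta_3 q + 3\beta_4 q^2$$

• the desired shape requires β4 > 0

• a minimum for positive q requires

$$MC'(q) = 2\beta_3 + 6\beta_4 q = 0 \;\Rightarrow\; q = -\frac{2\beta_3}{6\beta_4} > 0 \;\Rightarrow\; \beta_3 < 0$$

• minimum MC > 0 requires

$$4\beta_3^2 - 4\beta_2(3\beta_4) < 0 \;\Rightarrow\; \beta_3^2 < 3\beta_2\beta_4 \;\Rightarrow\; \beta_2 > 0$$

Restrictions (Summary):

$$\beta_1 \ge 0,\quad \beta_2 > 0,\quad \beta_3 < 0,\quad \beta_4 > 0,\quad \beta_3^2 < 3\beta_2\beta_4$$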
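The restrictions on the cubic cost function can be checked for any set of estimated coefficients. The sketch below is illustrative only: the function name `cubic_cost_checks` and the coefficient values are hypothetical, and it assumes β4 is nonzero so that the minimizing output level exists.

```python
def cubic_cost_checks(b1, b2, b3, b4):
    """Check the summary restrictions for TC(q) = b1 + b2*q + b3*q**2 + b4*q**3.

    Returns (restrictions_hold, q_star, mc_min), where q_star solves MC'(q) = 0
    and mc_min is marginal cost evaluated at q_star. Assumes b4 != 0.
    """
    q_star = -2.0 * b3 / (6.0 * b4)                       # from MC'(q) = 2*b3 + 6*b4*q = 0
    mc_min = b2 + 2.0 * b3 * q_star + 3.0 * b4 * q_star ** 2
    restrictions_hold = (
        b1 >= 0 and b2 > 0 and b3 < 0 and b4 > 0 and b3 ** 2 < 3.0 * b2 * b4
    )
    return restrictions_hold, q_star, mc_min

# Hypothetical coefficients, chosen only for illustration.
ok, q_star, mc_min = cubic_cost_checks(10.0, 6.0, -3.0, 1.0)
```

With these values the minimum of marginal cost occurs at a positive output level and is itself positive, which is what the inequality on the squared β3 term guarantees.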


f. Production Functions

(1) Cobb Douglas (CD)

$$Y_t = e^{\beta_1 + \beta_2 t} L_t^{\beta_3} K_t^{\beta_4}\varepsilon_t$$

ln Yt = β1 + β2t + β3 ln Lt + β4 ln Kt + ln εt

Production Characteristics:

β3 + β4 = 1 constant returns to scale

β3 = percent of total revenue paid to labor

$\sigma = \dfrac{\%\Delta(K/L)}{\%\Delta(W_L/W_K)} = 1$ elasticity of substitution

(2) Translog Transformation

$$\ln Y_t = \beta_1 + \beta_2\ln L_t + \beta_3\ln K_t + \beta_4(\ln L_t)^2 + \beta_5(\ln K_t)^2 + \beta_6(\ln L_t)(\ln K_t)$$

Note that this model includes the Cobb Douglas as a special case (β4 = β5 = β6 = 0).

(3) Constant Elasticity of Substitution (CES)

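$$Y_t = e^{\beta_1 + \beta_2 t}\left[\delta L_t^{\rho} + (1-\delta)K_t^{\rho}\right]^{M/\rho}\varepsilon_t, \qquad \sigma = \frac{1}{1-\rho}$$

M = returns to scale. Cost functions can be estimated from estimated production functions.

Estimation: (?)

$$\ln Y_t = \beta_1 + \beta_2 t + \frac{M}{\rho}\ln\left[\delta L_t^{\rho} + (1-\delta)K_t^{\rho}\right] + \ln\varepsilon_t$$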

This function is a "nontransformable" type.

2. "Nontransformable" Models

Problem: Estimate the parameters in

$$Y_t = F(\beta_1, \beta_2, \ldots, \beta_s; X_{1t}, \ldots, X_{Kt}) + \varepsilon_t.$$


Two possible approaches include using nonlinear optimization programs or approximations.

(a) Nonlinear Optimization Approach

(1) Define the objective function: minimize SSE or maximize the likelihood.

(2) Specify an initial guess for the parameters.

(3) "Press go": start at the initial value and iterate to a solution.
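The three steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used by any particular package: the model Y = a·exp(bX) + ε, the noiseless data, and the function names are all hypothetical. For a given b the best a has a closed form, so SSE is minimized over b alone with a golden-section search, which needs an initial bracket instead of a single starting value.

```python
import math

def concentrated_sse(b, x, y):
    """Return (SSE, a) for y = a*exp(b*x) + error, with a set to its best value given b."""
    g = [math.exp(b * xi) for xi in x]
    a = sum(gi * yi for gi, yi in zip(g, y)) / sum(gi * gi for gi in g)
    sse = sum((yi - a * gi) ** 2 for gi, yi in zip(g, y))
    return sse, a

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on the interval [lo, hi] by iteration."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d          # the minimum lies in [lo, d]
        else:
            lo = c          # the minimum lies in [c, hi]
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
    return (lo + hi) / 2.0

# Hypothetical noiseless data from a = 2, b = 0.5.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Step 1: objective = SSE; step 2: initial bracket for b; step 3: iterate.
b_hat = golden_section(lambda b: concentrated_sse(b, x, y)[0], 0.0, 1.0)
a_hat = concentrated_sse(b_hat, x, y)[1]
```

Gradient-based routines in statistical software follow the same pattern of objective, starting value, and iteration; the derivative-free search here simply keeps the example short.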

(b) Examples:

(1) Logistic Model

[ ]εγβ

αδγ

tX+t

e + + = Y

t

Estimation:

δγ

β

α∑ X - - -

Yln = SSE t

t

2

= Σ(ln εt)2

(2) Constant elasticity of substitution (CES) production function

(3) Box Cox. Define

$$Y^{(\lambda)} = \frac{Y^{\lambda} - 1}{\lambda}$$

Consider $Y^{(\lambda)} = \beta_1 + \beta_2 X^{(\lambda)} + \varepsilon_t$.

λ = 0: ln Y = β1 + β2 ln X + εt

λ = 1: Y - 1 = β1 + β2(X - 1) + εt

or Y = 1 + β1 - β2 + β2X + εt

Stata will estimate "Box-Cox" models with the command format

boxcox depvar [indepvars] [, options]

Options (list from help file “boxcox” in Stata).

model(lhsonly) applies the Box-Cox transform to depvar only.

model(lhsonly) is the default.

model(rhsonly) applies the transform to the indepvars only.


model(lambda) applies the transform to both depvar and indepvars, and they are transformed by the same parameter.

model(theta) applies the transform to both depvar and indepvars, but this time, each side is transformed by a separate parameter.

notrans(varlist) specifies that the variables in varlist be included as nontransformed independent variables.
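The Box-Cox transform itself is easy to compute directly, which helps in seeing why λ = 0 corresponds to the logarithmic model and λ = 1 to the linear model. The function below is a minimal Python sketch (the name `box_cox` is an assumption of this example and is unrelated to Stata's internal implementation).

```python
import math

def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1)/lam for y > 0, with the limit ln(y) at lam = 0."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    # expm1 keeps the calculation accurate when lam is close to zero
    return math.expm1(lam * math.log(y)) / lam

linear_case = box_cox(5.0, 1.0)        # lam = 1 gives Y - 1
near_log_case = box_cox(2.0, 1e-8)     # lam near 0 approaches ln(Y)
log_case = box_cox(2.0, 0.0)
```

As λ shrinks toward zero the transformed value converges to ln Y, so the linear and log specifications are both special cases of the same family.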


I. PROBLEM SETS

Problem Set 2.1

Simple Linear Regression

Theory

1. Let kids denote the number of children ever born to a woman, and let educ denote the years of

education for the woman. A simple model relating fertility to years of education is

kids = β0 + β1educ + u where u is the unobserved error.

a. All of the factors besides a woman’s education that affect fertility are lumped into the error term, u. What kinds of factors are contained in u? Which of these are likely to be correlated with level of education, which are not?

b. Will a simple regression analysis uncover the ceteris paribus effect of education on fertility? Explain.

(Wooldridge 2.1)

2. Demonstrate that

$$\hat{\beta}_2 = \frac{\sum (X_t - \bar{X})(Y_t - \bar{Y})}{\sum (X_t - \bar{X})^2} = \frac{\text{Covariance}(X, Y)}{\text{Variance}(X)}$$

is equivalent to

a. $\dfrac{\sum X_t Y_t - n\bar{X}\bar{Y}}{\sum X_t^2 - n\bar{X}^2}$

b. $\dfrac{\sum (X_t - \bar{X})Y_t}{\sum (X_t - \bar{X})^2}$

(Hints: Expand the numerator and denominator and remember that $\sum X_t = n\bar{X}$.)

c. If you only have two observations (n = 2), $(X_1, Y_1)$ and $(X_2, Y_2)$, demonstrate that the equation for $\hat{\beta}_2$ can be simplified to

$$\frac{\text{rise}}{\text{run}} = \frac{Y_2 - Y_1}{X_2 - X_1}.$$

(JM II-B, JM Math)

3. Demonstrate that the sample regression line obtained from least squares with an estimated intercept passes through $(\bar{X}, \bar{Y})$. (Hint: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$, substitute $X = \bar{X}$, and simplify.)

(JM II-B)


4. Consider the model $Y_t = \beta X_t + \varepsilon_t$, where

A.1 εt distributed normally

A.2 E(εt) = 0 ∀t

A.3 Var(εt) = σ2 ∀t

A.4 Cov(εt, εs) = 0 ∀t, s (t ≠ s)

A.5 Xt nonstochastic.

a) Find the least squares estimator of β. Hint: $SSE = \sum \varepsilon_t^2 = \sum (Y_t - \beta X_t)^2$.

b) Find the MLE of β and σ2.

Hint: $\ell(Y; \beta, \sigma^2) = \sum \ln f(Y_t; \beta, \sigma^2) = -\dfrac{\sum (Y_t - \beta X_t)^2}{2\sigma^2} - \dfrac{n}{2}\ln(2\pi) - \dfrac{n}{2}\ln(\sigma^2)$

c) Will the sample regression line ($\hat{Y}_t = \hat{\beta}X_t$) obtained in (a) or (b) pass through $(\bar{X}, \bar{Y})$? Explain. (JM II-B)

Applied

5. The data set in CEOSAL2.RAW contains information on chief executive officers for U.S. corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten is prior number of years as company CEO.

i) Find the average salary and average tenure in the sample.

ii) How many CEOs are in their first year as CEO (that is, ceoten = 0)?

iii) Estimate the simple regression model

log(salary) = β0 + β1ceoten + ε

and report your results in the usual form*. What is the predicted percentage increase in salary given one more year as CEO?

(Wooldridge C.2.2)

*The usual form is to write out the equation with the estimated betas and their standard errors underneath in parentheses. For example, if I were estimating Yt = α + βXt + εt and estimated α to be .543 with a standard error of .001 and β to be 1.43 with a standard error of 1.01, then I would report my results in the "usual form" as follows:

Yt = .543 + 1.43*Xt     R2 = .955
     (.001)  (1.01)     N = 123.

** We will review the required Stata commands in class/TA sessions.


Problem Set 2.2

Simple Linear Regression

Theory

Consider the model $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$.

1. BACKGROUND: The purpose of this problem is to show that, using OLS, the total sum of squares can be partitioned into two parts as follows:

$$\sum_{t=1}^{n}(Y_t - \bar{Y})^2 = \sum_{t=1}^{n}(Y_t - \hat{Y}_t + \hat{Y}_t - \bar{Y})^2$$

$$= \sum_{t=1}^{n}(Y_t - \hat{Y}_t)^2 + 2\sum_{t=1}^{n}(Y_t - \hat{Y}_t)(\hat{Y}_t - \bar{Y}) + \sum_{t=1}^{n}(\hat{Y}_t - \bar{Y})^2$$

where the terms $\sum_{t=1}^{n}(Y_t - \bar{Y})^2$, $\sum_{t=1}^{n}(Y_t - \hat{Y}_t)^2$, and $\sum_{t=1}^{n}(\hat{Y}_t - \bar{Y})^2$ are referred to as the total sum of squares (SST), sum of squares error (SSE), and sum of squares "explained by the regression" (SSR), respectively. This notation differs from that used by Wooldridge, but conforms with notation used in a number of other econometrics texts.

QUESTION: Explain why the cross product term

$$\sum_{t=1}^{n}(Y_t - \hat{Y}_t)(\hat{Y}_t - \bar{Y}) = \sum_{t=1}^{n} e_t(\hat{Y}_t - \bar{Y}) = \sum_{t=1}^{n} e_t(\hat{\beta}_1 + \hat{\beta}_2 X_t - \bar{Y}) = 0$$

when least squares estimators are used. (Remember the first order conditions or normal equations.)

(JM II-B)

Applied

2. For the population of firms in the chemical industry, let rd denote annual expenditures on

research and development, and let sales denote annual sales (both are in millions of dollars).

a. Write down a model (not an estimated equation) that implies a constant elasticity between rd and sales. Which parameter is the elasticity? (Hint: what functional form should be used?)

b. Now estimate the model using the data in RDCHEM.RAW. Write out the estimated equation in the usual form*. What is the estimated elasticity of rd with respect to sales? Explain in words what this elasticity means.

(Wooldridge C 2.5)

* report the estimated parameters, standard errors, and R2


3. Consider the following four sets of data¹

Data Set         A               B               C               D
Obs. No.      X       Y       X       Y       X       Y       X       Y
    1       10.0    8.04    10.0    9.14    10.0    7.46     8.0    6.58
    2        8.0    6.95     8.0    8.14     8.0    6.77     8.0    5.76
    3       13.0    7.58    13.0    8.74    13.0   12.74     8.0    7.71
    4        9.0    8.81     9.0    8.77     9.0    7.11     8.0    8.84
    5       11.0    8.33    11.0    9.26    11.0    7.81     8.0    8.47
    6       14.0    9.96    14.0    8.10    14.0    8.84     8.0    7.04
    7        6.0    7.24     6.0    6.13     6.0    6.08     8.0    5.25
    8        4.0    4.26     4.0    3.10     4.0    5.39    19.0   12.50
    9       12.0   10.84    12.0    9.13    12.0    8.15     8.0    5.56
   10        7.0    4.82     7.0    7.26     7.0    6.42     8.0    7.91
   11        5.0    5.68     5.0    4.74     5.0    5.73     8.0    6.89

a. For each of the data sets estimate the relationship $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$, using least squares.

b. Compare and explain the four sets of results. (Hint: plot the data.)

c. In each of the four cases obtain a prediction of the value of Yt corresponding to a value of X = 20. Which of the forecasts would you feel most comfortable with? Explain.

d. Based upon these examples comment on the following widely held notions.

i) "Numerical calculations are exact, but graphs are rough."

ii) "For any particular kind of statistical data there is just one set of calculations constituting a correct statistical analysis."

iii) "Performing intricate calculations is rigorous, whereas actually looking at the data is cheating." (JM II)

1 Reference: Anscombe, F. J., "Graphs in Statistical Analysis," The American Statistician, Vol. 27 (1973), p. 17-21.


4. The following Stata printout corresponds to the first Anscombe data set.

a. From the printout, determine the values of the following:

$\bar{X}$ =

$s^2$ =

$s^2_{\hat{\beta}_2}$ =

b. Calculate the predicted value of Y and the variance of the forecast error corresponding to X = 20.

(1) $\hat{Y}$ =

(2) $s^2 + s^2_{\hat{Y}}$ =

(3) $s^2_{\hat{Y}}$ =

Hint: Recall that $s^2_{\hat{Y}} = \dfrac{s^2}{n} + (20 - \bar{X})^2 s^2_{\hat{\beta}_2}$ and $s^2_{FE} = s^2 + s^2_{\hat{Y}}$.

c. Calculate 95% confidence intervals for the actual value of Y corresponding to X = 20.

d. Calculate 95% confidence intervals for the population regression line corresponding to X = 20. Yet another hint: the sample and population regression lines, respectively, are defined by $\hat{Y}_t = \hat{\beta}_1 + \hat{\beta}_2 X_t$ and $\beta_1 + \beta_2 X_t$, so use $s_{\hat{Y}}$ for part (d) and $s_{FE}$ for part (c).

Check your work: Recall that the confidence interval for the population regression line is narrower than the confidence interval for the actual value of Y corresponding to a given X.

5. Consider the attached data file (functional forms 2.dta). X denotes the independent variable, x=1,2,3, ..., 100. Corresponding to this independent variable, various dependent variables were generated. Plot and estimate an appropriate functional form between

a. the dependent variable denoted loglog and x; b. semilog1 and x; c. reciptrans and x; d. polya and x; e. polyb and x; and f. polyc and x.


STATA Output (for problem #4)

. infile x y using "anscombe_a.txt", clear

(11 observations read)

. summ y x

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

y | 11 7.500909 2.031568 4.26 10.84

x | 11 9 3.316625 4 14

. reg y x


-------------+------------------------------ F( 1, 9) = 17.99

Model | 27.5100011 1 27.5100011 Prob > F = 0.0022


-------------+------------------------------ Adj R-squared = 0.6295

Total | 41.2726916 10 4.12726916 Root MSE = 1.2366

------------------------------------------------------------------------------

y | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

x | .5000909 .1179055 4.24 0.002 .2333701 .7668117

_cons | 3.000091 1.124747 2.67 0.026 .4557369 5.544445

------------------------------------------------------------------------------

. set obs 12

obs was 11, now 12

. replace x=20 in 12

(1 real change made)

. predict yhat

(option xb assumed; fitted values)

. predict sfe, stdf

. list in 11/12 //only lists observations 11 and 12

+---------------------------------+

| x y yhat sfe |

|---------------------------------|

11. | 5 5.68 5.500546 1.375003 |

12. | 20 . 13.00191 1.830386 |
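The fitted value and forecast standard error listed for x = 20 can be cross-checked outside Stata. The following Python sketch is an illustration, not part of the original printout: it assumes the first Anscombe data set as tabulated in Problem 3 and applies the simple-regression formulas for the slope, the intercept, and the forecast error variance.

```python
import math

# Anscombe data set A, as tabulated in Problem 3.
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b2 = sxy / sxx                      # slope
b1 = ybar - b2 * xbar               # intercept
sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)                  # estimated error variance

x0 = 20.0
yhat0 = b1 + b2 * x0                                # predicted Y at x = 20
s2_yhat = s2 / n + (x0 - xbar) ** 2 * (s2 / sxx)    # variance of the fitted value
sfe = math.sqrt(s2 + s2_yhat)                       # standard error of the forecast
```

The values of `b1`, `b2`, `yhat0`, and `sfe` should agree with the `_cons`, `x`, `yhat`, and `sfe` entries of the Stata output up to rounding.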


James B. McDonald Brigham Young University 1/7/2010

III. Classical Normal Linear Regression Model Extended to the Case of k Explanatory Variables

A. Basic Concepts

Let y denote an n x 1 vector of random variables, i.e., y = (y1, y2, . . ., yn)'.

1. The expected value of y is defined by

$$E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{pmatrix}$$

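2. The variance of the vector y is defined by

$$\mathrm{Var}(y) = \begin{pmatrix} \mathrm{Var}(y_1) & \mathrm{Cov}(y_1, y_2) & \cdots & \mathrm{Cov}(y_1, y_n) \\ \mathrm{Cov}(y_2, y_1) & \mathrm{Var}(y_2) & \cdots & \mathrm{Cov}(y_2, y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(y_n, y_1) & \mathrm{Cov}(y_n, y_2) & \cdots & \mathrm{Var}(y_n) \end{pmatrix}$$

NOTE: Let µ = E(y), then

$$\mathrm{Var}(y) = E[(y - \mu)(y - \mu)'] = E\left[\begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_n - \mu_n \end{pmatrix}(y_1 - \mu_1, \ldots, y_n - \mu_n)\right]$$

$$= \begin{pmatrix} E(y_1 - \mu_1)^2 & E(y_1 - \mu_1)(y_2 - \mu_2) & \cdots & E(y_1 - \mu_1)(y_n - \mu_n) \\ E(y_2 - \mu_2)(y_1 - \mu_1) & E(y_2 - \mu_2)^2 & \cdots & E(y_2 - \mu_2)(y_n - \mu_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(y_n - \mu_n)(y_1 - \mu_1) & E(y_n - \mu_n)(y_2 - \mu_2) & \cdots & E(y_n - \mu_n)^2 \end{pmatrix}$$

$$= \begin{pmatrix} \mathrm{Var}(y_1) & \mathrm{Cov}(y_1, y_2) & \cdots & \mathrm{Cov}(y_1, y_n) \\ \mathrm{Cov}(y_2, y_1) & \mathrm{Var}(y_2) & \cdots & \mathrm{Cov}(y_2, y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(y_n, y_1) & \mathrm{Cov}(y_n, y_2) & \cdots & \mathrm{Var}(y_n) \end{pmatrix}.$$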

3. The n x 1 vector of random variables, y, is said to be distributed as a multivariate normal with mean vector µ and variance covariance matrix Σ (denoted y ~ N(µ, Σ)) if the probability density function of y is given by

$$f(y; \mu, \Sigma) = \frac{e^{-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)}}{(2\pi)^{n/2}|\Sigma|^{1/2}}.$$

Special case (n = 1): y = (y1), µ = (µ1), Σ = (σ2).

$$f(y_1; \mu_1, \sigma^2) = \frac{e^{-\frac{1}{2}(y_1-\mu_1)(\sigma^2)^{-1}(y_1-\mu_1)}}{(2\pi)^{1/2}(\sigma^2)^{1/2}} = \frac{e^{-(y_1-\mu_1)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}.$$

4. Some Useful Theorems

a. If y ~ N(µy,Σy), then z = Ay ~ N(µz = Aµy; Σz = AΣyA') where A is a

matrix of constants.

b. If y ~ N(0,I) and A is a symmetric idempotent matrix, then y'Ay ~ χ2(m)

where m = Rank(A) = trace (A).

c. If y ~ N(0,I) and L is a k x n matrix of rank k, then Ly and y'Ay are

independently distributed if LA = 0.

d. If y ~ N(0,I), then the idempotent quadratic forms y'Ay and y'By are

independently distributed χ2 variables if AB = 0.


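NOTE:

(1) Proof of (a)

E(z) = E(Ay) = AE(y) = Aµy

Var(z) = E[(z - E(z))(z - E(z))']

= E[(Ay - Aµy)(Ay - Aµy)']

= E[A(y - µy)(y - µy)'A']

= AE[(y - µy)(y - µy)']A'

= AΣyA' = Σz

(2) Example: Let y1, . . ., yn denote a random sample drawn from N(µ, σ2), so that

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \sim N\left[\begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix}\right].$$

The "Useful" Theorem 4.a implies that:

$$\bar{y} = \frac{1}{n}y_1 + \cdots + \frac{1}{n}y_n = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)y \sim N(\mu, \sigma^2/n).$$

Verify that

(a) $\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)\begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix} = \mu$

(b) $\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)\sigma^2 I\begin{pmatrix} \frac{1}{n} \\ \vdots \\ \frac{1}{n} \end{pmatrix} = \sigma^2/n.$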
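The sample-mean example of Theorem 4.a can be verified numerically for a small case. The Python sketch below is illustrative only; the values n = 5, µ = 2, and σ² = 9 are hypothetical. It forms A = (1/n, ..., 1/n) and computes Aµy and AΣyA' directly.

```python
n = 5
mu = 2.0
sigma2 = 9.0

A = [1.0 / n] * n                      # the 1 x n matrix (1/n, ..., 1/n)
mu_y = [mu] * n                        # mean vector of y
Sigma_y = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]  # sigma^2 * I

# Mean of z = Ay is A * mu_y.
mean_z = sum(a * m for a, m in zip(A, mu_y))

# Variance of z = Ay is A * Sigma_y * A'.
var_z = sum(A[i] * Sigma_y[i][j] * A[j] for i in range(n) for j in range(n))
```

The results reproduce the familiar facts that the sample mean has expectation µ and variance σ²/n.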


B. The Basic Model

Consider the model defined by

(1) yt = β1xt1 + β2xt2 + . . . + βkxtk + εt (t = 1, . . ., n).

If we want to include an intercept, define xt1 = 1 for all t and we obtain

(2) yt = β1 + β2xt2 + . . . + βkxtk + εt.

Note that βi can be interpreted as the marginal impact of a unit increase in xi on the

expected value of y.

The error terms (εt) in (1) will be assumed to satisfy:

(A.1) εt distributed normally

(A.2) E(εt) = 0 for all t

(A.3) Var(εt) = σ2 for all t

(A.4) Cov(εt, εs) = 0, t ≠ s.

Rewriting (1) for each t (t = 1, 2, . . ., n) we obtain

y1 = β1x11 + β2x12 + . . . + βkx1k + ε1

y2 = β1x21 + β2x22 + . . . + βkx2k + ε2

. . . .

. . . .

(3) . . . .

yn = β1xn1 + β2xn2 + . . . + βkxnk + εn.

The system of equations (3) is equivalent to the matrix representation

y = Xβ + ε

where the matrices y, X, β and ε are defined as follows:

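$$\underset{(n\times 1)}{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \underset{(n\times k)}{X} = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ x_{21} & \cdots & x_{2k} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \text{and} \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

columns of X: n observations on k individual variables.

rows of X: may represent observations at a given point in time.

NOTE: (1) Assumptions (A.1)-(A.4) can be written much more compactly as

(A.1)' ε ~ N(0; Σ = σ2I).

(2) The model to be discussed can then be summarized as

y = Xβ + ε.

(A.1)' ε ~ N(0; Σ = σ2I)

(A.5)' The xtj's are nonstochastic and

$$\lim_{n\to\infty}\frac{X'X}{n} = \Sigma_x$$

is nonsingular.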


C. Estimation

We will derive the least squares, MLE, BLUE and instrumental variables estimators in

this section.

1. Least Squares:

The basic model can be written as

$$y = X\beta + \varepsilon = X\hat{\beta} + e = \hat{Y} + e$$

where $\hat{Y} = X\hat{\beta}$ is an n x 1 vector of predicted values for the dependent variable and e denotes a vector of residuals or estimated errors.

The sum of squared errors is defined by

$$SSE(\hat{\beta}) = \sum_{t=1}^{n} e_t^2 = (e_1, e_2, \ldots, e_n)\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = e'e$$

$$= (y - X\hat{\beta})'(y - X\hat{\beta})$$

$$= y'y - \hat{\beta}'X'y - y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta}$$

$$= y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}.$$

The least squares estimator of β is defined as the $\hat{\beta}$ which minimizes $SSE(\hat{\beta})$. A necessary condition for $SSE(\hat{\beta})$ to be a minimum is that

$$\frac{dSSE(\hat{\beta})}{d\hat{\beta}} = 0$$

(see Appendix A for how to differentiate a real valued function with respect to a vector):

$$\frac{dSSE(\hat{\beta})}{d\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0$$

or

$$X'X\hat{\beta} = X'y \quad \text{(Normal Equations)}$$

$$\hat{\beta} = (X'X)^{-1}X'y$$

is the least squares estimator.

Note that $\hat{\beta}$ is a vector of least squares estimators of β1, β2,...,βk.
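The normal equations can be solved directly for a small data set. The following is a minimal Python sketch using only the standard library; the data, the helper names `normal_equations` and `solve`, and the use of Gauss-Jordan elimination in place of an explicit matrix inverse are all assumptions made for this illustration. It also confirms that the residuals satisfy X'e = 0.

```python
def normal_equations(X, y):
    """Return X'X and X'y for an n x k regressor matrix X and an n-vector y."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    return xtx, xty

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical data: an intercept column and two explanatory variables.
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 4.0, 3.0], [1.0, 5.0, 4.0]]
y = [3.1, 3.9, 2.2, 6.1, 6.8]

xtx, xty = normal_equations(X, y)
beta_hat = solve(xtx, xty)                       # solves X'X b = X'y

residuals = [yt - sum(b * x for b, x in zip(beta_hat, row)) for yt, row in zip(y, X)]
xte = [sum(X[t][i] * residuals[t] for t in range(len(X))) for i in range(len(X[0]))]  # X'e
```

Each element of `xte` is zero up to rounding error, which is the first order condition restated in terms of the residuals.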

2. Maximum Likelihood Estimation (MLE)

Likelihood Function: (Recall y ~ N(Xβ; Σ = σ2I))

$$L(y; \beta, \Sigma = \sigma^2 I) = \frac{e^{-\frac{1}{2}(y-X\beta)'\Sigma^{-1}(y-X\beta)}}{(2\pi)^{n/2}|\Sigma|^{1/2}}$$

$$= \frac{e^{-\frac{1}{2}(y-X\beta)'(y-X\beta)/\sigma^2}}{(2\pi)^{n/2}|\sigma^2 I|^{1/2}}$$

$$= \frac{e^{-(y-X\beta)'(y-X\beta)/(2\sigma^2)}}{(2\pi)^{n/2}(\sigma^2)^{n/2}}.$$

The natural log of the likelihood function,

$$\ell = \ln L = -\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2} - \frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2$$

is known as the log likelihood function. ℓ is a function of β and σ2.

The MLEs of β and σ2, denoted $\hat{\beta}_{ML}$ and $\hat{\sigma}^2_{ML}$, are defined by the two equations (necessary conditions for a maximum):

$$\frac{\partial\ell}{\partial\beta} = -\frac{1}{2\hat{\sigma}^2_{ML}}\left(-2X'y + 2(X'X)\hat{\beta}_{ML}\right) = 0$$

$$\frac{\partial\ell}{\partial\sigma^2} = \frac{(y - X\hat{\beta}_{ML})'(y - X\hat{\beta}_{ML})}{2(\hat{\sigma}^2_{ML})^2} - \frac{n}{2\hat{\sigma}^2_{ML}} = 0$$

i.e.,

$$\hat{\beta}_{ML} = (X'X)^{-1}X'y$$

$$\hat{\sigma}^2_{ML} = \frac{1}{n}(y - X\hat{\beta}_{ML})'(y - X\hat{\beta}_{ML}) = \frac{e'e}{n} = \frac{\sum e_t^2}{n}.$$

NOTE: (1) $\hat{\beta} = \hat{\beta}_{ML}$

(2) $\hat{\sigma}^2_{ML}$ is a biased estimator of σ2; whereas,

$$s^2 = \frac{1}{n-k}e'e = \frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{n-k} = \frac{SSE}{n-k}$$

is an unbiased estimator of σ2.

A proof of the unbiasedness of s2 is given in Appendix B. Only n-k of the estimated residuals are independent. The necessary conditions for least squares estimates impose k restrictions on the estimated residuals (e). The restrictions are summarized by the normal equations $X'X\hat{\beta} = X'y$, or equivalently

X'e = 0.

(3) Substituting σ2 = SSE/n into the log likelihood function yields what is known as the concentrated log likelihood function

$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\left(\frac{SSE}{n}\right)\right]$$

which expresses the loglikelihood value as a function of β only. This equation also clearly demonstrates the equivalence of maximizing ℓ and minimizing SSE.
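The difference between the two variance estimators, and the link between SSE and the concentrated log likelihood, can be seen with a short calculation. The Python sketch below is illustrative: the function names are assumptions, and the SSE value is taken as the Total sum of squares minus the Model sum of squares reported in the Stata printout for the first Anscombe data set (n = 11, k = 2).

```python
import math

def variance_estimates(sse, n, k):
    """Return (ML estimate SSE/n, unbiased estimate SSE/(n - k)) of the error variance."""
    return sse / n, sse / (n - k)

def concentrated_loglik(sse, n):
    """Concentrated log likelihood: -(n/2) * [1 + ln(2*pi) + ln(SSE/n)]."""
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(sse / n))

# Total SS minus Model SS from the Anscombe printout (assumed residual SS).
sse = 41.2726916 - 27.5100011
n, k = 11, 2

sigma2_ml, s2 = variance_estimates(sse, n, k)
loglik = concentrated_loglik(sse, n)
loglik_worse_fit = concentrated_loglik(2.0 * sse, n)   # a larger SSE lowers the log likelihood
```

Because the ML estimator divides by n, it is always smaller than s², and the square root of `s2` should match the Root MSE reported by Stata.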


3. BLUE ESTIMATORS OF β, $\tilde{\beta}$.

We will demonstrate that assumptions (A.2)-(A.5) imply that the best (least variance) linear unbiased estimator (BLUE) of β is the least squares estimator. We first consider the desired properties and then derive the associated estimator.

Linear: $\tilde{\beta} = Ay$ where A is a k x n matrix of constants

Unbiased: $E(\tilde{\beta}) = AE(y) = AX\beta$

We note that $E(\tilde{\beta}) = AX\beta = \beta$ requires AX = I.

Minimum Variance:

$$\mathrm{Var}(\tilde{\beta}_i) = A_i\,\mathrm{Var}(y)\,A_i' = \sigma^2 A_i A_i'$$

where Ai = the ith row of A and $\tilde{\beta}_i = A_i y$.

Thus, the construction of BLUE is equivalent to selecting the matrix A so that the rows of A solve

Min AiAi' i = 1, 2, . . ., k

s.t. AX = I

or equivalently, min $\mathrm{Var}(\tilde{\beta}_i)$ s.t. AX = I (unbiased).

The solution to this problem is given by A = (X'X)-1X'; hence, the BLUE of β is given by $\tilde{\beta} = Ay = (X'X)^{-1}X'y$.

The details of this derivation are contained in Appendix C.

NOTE: (1) $\hat{\beta} = \hat{\beta}_{ML} = \tilde{\beta} = (X'X)^{-1}X'y$, where $\hat{\beta}$ is the least squares estimator and $\hat{\beta}_{ML}$ is the maximum likelihood estimator.

(2) $AX = (X'X)^{-1}X'X = I$; thus $\tilde{\beta}$ is unbiased.


4. Instrumental Variables Estimators

y = Xβ + ε

Let Z denote an n x k matrix of “instruments” or "instrumental" variables.

Consider the solution of the modified normal equations:

$Z'y = Z'X\hat{\beta}_z$; hence, $\hat{\beta}_z = (Z'X)^{-1}Z'y$.

$\hat{\beta}_z$ is referred to as the instrumental variables estimator of β based on the

instrumental variables Z. Instrumental variables can be very useful if the

variables on the right hand side include “endogenous” variables or in the case of

measurement error. In this case OLS will yield biased and inconsistent

estimators; whereas, instrumental variables can yield consistent estimators.

NOTE: (1) The motivation for the selection of the instruments (Z) is

that the covariance (Z,ε) approaches 0 and Z and X are

correlated. Thus Z'(Y) = Z'(Xβ + ε) = Z' X β + Z'ε≈ Z' Xβ.

(2) If $\lim_{n\to\infty}\dfrac{Z'X}{n}$ is nonsingular and $\lim_{n\to\infty}\dfrac{Z'\varepsilon}{n} = 0$, then $\hat{\beta}_z$ is a consistent estimator of β.

(3) Many calculate an R2 after instrumental variables

estimation using the formula R2 = 1 – SSE/SST. Since this

can be negative, there is not a natural interpretation of R2

for instrumental variables estimators. Further, the R2 can’t

be used to construct F-statistics for IV estimators.

(4) If Z includes "weak" instruments (weakly correlated with the X's), then the variances of the IV estimator can be large and the corresponding asymptotic biases can be large if Z and the error are correlated. This can be seen by noting that the bias of the instrumental variables estimator is given by

$$E\left[(Z'X/n)^{-1}(Z'\varepsilon/n)\right].$$

(5) As a special case, if Z = X, then $\hat{\beta}_z = \hat{\beta}_x = \hat{\beta} = \hat{\beta}_{ML} = \tilde{\beta}$, i.e., the IV estimator coincides with the least squares, maximum likelihood, and BLUE estimators.


(6) If Z is an n x k* matrix where k < k* (Z contains more variables than X), then the IV estimator defined above must be modified. The most common approach in this case is to replace Z in the "IV" equation by the projections** of X on the columns of Z, i.e. $\hat{X} = Z(Z'Z)^{-1}Z'X$.

This substitution yields the IV estimator

$$\hat{\beta}_{IV} = (\hat{X}'X)^{-1}\hat{X}'Y = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'Y$$

which yields estimates for k ≤ k*.

The Stata command for the instrumental variables estimator is given by

ivregress 2sls depvar (varlist_1 = varlist_iv) [varlist_2]

where the estimator (2sls here) may be 2sls, gmm, or liml, for the model

depvar = (varlist_1)b1 + (varlist_2)b2 + error

where varlist_iv are the instrumental variables for varlist_1.

A specific example is given by:

ivregress 2sls y1 (y2 = z1 z2 z3) x1 x2 x3

Identical results could be obtained with the command,

ivregress 2sls y1 (y2 x1 x2 x3 = z1 z2 z3)

which is equivalent to regressing all of the right hand side variables on the set of instrumental variables. This can be thought of as being of the form

ivregress 2sls y (X = Z)

**The projections of X on Z can be obtained by obtaining estimates of Π in the "reduced form" equation X = ZΠ + V to yield $\hat{\Pi} = (Z'Z)^{-1}Z'X$; hence, the estimate of X is given by

$$\hat{X} = Z\hat{\Pi} = Z(Z'Z)^{-1}Z'X.$$
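A small constructed example shows why instruments help when a regressor is correlated with the error. The Python sketch below is purely illustrative: the data are hypothetical and built so that the instrument z is exactly orthogonal to the error while the regressor x is not, the model has a single regressor and no intercept, and the function name `iv_estimate` is an assumption of this example.

```python
def iv_estimate(z, x, y):
    """Just-identified IV estimator (Z'X)^(-1) Z'y for one regressor and no intercept."""
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))

beta_true = 2.0
z = [1.0, -1.0, 1.0, -1.0]                            # instrument, with z'eps = 0
eps = [1.0, 1.0, -1.0, -1.0]                          # error term
x = [2.0 * zi + ei for zi, ei in zip(z, eps)]         # regressor is correlated with eps
y = [beta_true * xi + ei for xi, ei in zip(x, eps)]   # y = beta*x + eps

b_iv = iv_estimate(z, x, y)      # uses z as the instrument
b_ols = iv_estimate(x, x, y)     # Z = X reproduces least squares
```

Least squares overstates the coefficient here because x and the error move together, while the IV estimator recovers the value used to construct the data; setting Z = X gives the least squares estimator, as in note (5).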


D. Distribution of $\hat{\beta}$, $\hat{\beta}_{ML}$, $\tilde{\beta}$

Recall that under the assumptions (A.1) – (A.5) y ~ N(Xβ, Σ = σ2I) and the least squares, maximum likelihood, and BLUE estimators coincide:

$$\hat{\beta} = \hat{\beta}_{ML} = \tilde{\beta} = (X'X)^{-1}X'y;$$

hence, by useful theorem (II'.A.4.a), we conclude that

$$\hat{\beta} = \hat{\beta}_{ML} = \tilde{\beta} \sim N(A\mu_y, A\Sigma_y A') = N[AX\beta, A\sigma^2 IA']$$

where A = (X'X)-1X'.

The desired derivations can be simplified by noting that

AXβ = (X'X)-1X'Xβ = β

σ2AA' = σ2(X'X)-1X'((X'X)-1X')'

= σ2(X'X)-1X'X((X'X)-1)'

= σ2((X'X)-1)'

= σ2((X'X)')-1

= σ2(X'X)-1.

Therefore

$$\hat{\beta} = \hat{\beta}_{ML} = \tilde{\beta} \sim N\left(\beta;\ \sigma^2(X'X)^{-1}\right).$$

NOTE: (1) σ2(X'X)-1 can be shown to be the Cramer-Rao matrix, the matrix of lower bounds for the variances of unbiased estimators.

(2) $\hat{\beta}$, $\hat{\beta}_{ML}$, and $\tilde{\beta}$ are

⋅ unbiased

⋅ consistent

⋅ minimum variance of all (linear and nonlinear) unbiased estimators

⋅ normally distributed


(3) An unbiased estimator of σ2(X'X)-1 is given by

s2(X'X)-1

where s2 = e'e/(n-k) and is the formula used to calculate the

"estimated variance covariance matrix" in many computer

programs.

(4) To report s2(X'X)-1 in STATA type

. reg y x

. estat vce

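(5) Distribution of the variance estimator

$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k)$$

NOTE: This can be proven using the theorem (II'.A.4(b)) and noting that

$$(n-k)s^2 = e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta})$$

$$= (X\beta + \varepsilon)'(I - X(X'X)^{-1}X')(X\beta + \varepsilon)$$

$$= \varepsilon'(I - X(X'X)^{-1}X')\varepsilon.$$

Therefore,

$$\frac{(n-k)s^2}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)'(I - X(X'X)^{-1}X')\left(\frac{\varepsilon}{\sigma}\right) = \left(\frac{\varepsilon}{\sigma}\right)'M\left(\frac{\varepsilon}{\sigma}\right)$$

where $\varepsilon/\sigma \sim N[0, I]$; hence

$$\frac{(n-k)s^2}{\sigma^2} \sim \chi^2(n-k)$$

because M is idempotent with rank and trace equal to n - k.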


E. Statistical Inference

1. Ho: β2 = β3 = . . . = βk = 0

This hypothesis tests for the statistical significance of overall explanatory power

of the explanatory variables by comparing the model with all variables included to

the model without any of the explanatory variables, i.e., yt = β1 + εt (all non-

intercept coefficients = 0). Recall that the total sum of squares (SST) can be

partitioned as follows:

$$\sum_{t=1}^{N}(y_t - \bar{y})^2 = \sum_{t=1}^{N}(y_t - \hat{y}_t)^2 + \sum_{t=1}^{N}(\hat{y}_t - \bar{y})^2$$

or

SST = SSE + SSR.

Dividing both sides of the equation by σ2 yields quadratic forms, each having a chi-square distribution:

$$\frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSR}{\sigma^2}$$

χ2(n - 1) = χ2(n - k) + χ2(k - 1).

This result provides the basis for using

$$F = \frac{SSR/(K-1)}{SSE/(n-K)} = \frac{\chi^2(K-1)/(K-1)}{\chi^2(n-K)/(n-K)} \sim F(K-1, n-K)$$

to test the hypothesis that β2 = β3 = . . . = βk = 0.

NOTE: (1)

$$\frac{SSR}{SSE} = \frac{SSR}{SST - SSR} = \frac{SSR/SST}{1 - SSR/SST} = \frac{R^2}{1 - R^2}$$

hence, the F-statistic for this hypothesis can also be rewritten as

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \left(\frac{n-k}{k-1}\right)\frac{R^2}{1-R^2} \sim F(k-1, n-k).$$

Recall that this decomposition of SST can be summarized in an ANOVA table as follows:

Source of Variation    SS     d.f.     MSE
Model                  SSR    K - 1    SSR/(K - 1)
Error                  SSE    n - K    SSE/(n - K) = s2
Total                  SST    n - 1

K = number of coefficients in model

where the ratio of the model and error MSE's yields the F statistic just discussed.

Additionally, remember that the adjusted R2 ($\bar{R}^2$), defined by

$$\bar{R}^2 = 1 - \frac{\sum e_t^2/(n-K)}{\sum (Y_t - \bar{Y})^2/(n-1)},$$

will only increase with the addition of a new variable if the t-statistic associated with the new variable is greater than 1 in absolute value. This result follows from the equation

$$\bar{R}^2_{New} - \bar{R}^2_{Old} = \frac{(n-1)\,SSE_{New}}{(n-K)(n-K-1)\,SST}\left[\left(\frac{\hat{\beta}_{New\_var} - 0}{s_{\hat{\beta}_{New\_var}}}\right)^2 - 1\right]$$

where the last term in the product is (t2 - 1), K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K+1 coefficients.

The Lagrangian Multiplier (LM) test can also be used to test this hypothesis:

$$LM = NR^2 \overset{a}{\sim} \chi^2(k-1)$$
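The F statistic for overall explanatory power and the LM statistic can both be computed from R² alone. The Python sketch below is illustrative; the function names are assumptions, and the sums of squares are the Model and Total values from the Stata printout for the first Anscombe data set (n = 11, k = 2).

```python
def f_overall(r2, n, k):
    """F statistic for Ho: all slope coefficients are zero, written in terms of R-squared."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

def lm_overall(r2, n):
    """LM statistic n * R-squared, asymptotically chi-square with k - 1 degrees of freedom."""
    return n * r2

def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - [SSE/(n - k)] / [SST/(n - 1)]."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

ssr, sst = 27.5100011, 41.2726916   # Model SS and Total SS from the printout
n, k = 11, 2

r2 = ssr / sst
F = f_overall(r2, n, k)
LM = lm_overall(r2, n)
r2_adj = adjusted_r2(r2, n, k)
```

The computed `F` and `r2_adj` should agree with the F(1, 9) and Adj R-squared entries in the Stata output up to rounding.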


2. Testing hypotheses involving individual βi's

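Recall that

$$\hat{\beta} \sim N\left(\beta;\ \sigma^2(X'X)^{-1}\right)$$

where

$$\sigma^2(X'X)^{-1} = \begin{pmatrix} \sigma^2_{\hat{\beta}_1} & \sigma_{\hat{\beta}_1\hat{\beta}_2} & \cdots & \sigma_{\hat{\beta}_1\hat{\beta}_k} \\ \sigma_{\hat{\beta}_2\hat{\beta}_1} & \sigma^2_{\hat{\beta}_2} & \cdots & \sigma_{\hat{\beta}_2\hat{\beta}_k} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{\hat{\beta}_k\hat{\beta}_1} & \sigma_{\hat{\beta}_k\hat{\beta}_2} & \cdots & \sigma^2_{\hat{\beta}_k} \end{pmatrix}$$

which can be estimated by

$$s^2(X'X)^{-1} = \begin{pmatrix} s^2_{\hat{\beta}_1} & s_{\hat{\beta}_1\hat{\beta}_2} & \cdots & s_{\hat{\beta}_1\hat{\beta}_k} \\ s_{\hat{\beta}_2\hat{\beta}_1} & s^2_{\hat{\beta}_2} & \cdots & s_{\hat{\beta}_2\hat{\beta}_k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{\hat{\beta}_k\hat{\beta}_1} & s_{\hat{\beta}_k\hat{\beta}_2} & \cdots & s^2_{\hat{\beta}_k} \end{pmatrix}$$

Hypotheses of the form H0: $\beta_i = \beta_i^0$ can be tested using the result

$$\frac{\hat{\beta}_i - \beta_i^0}{s_{\hat{\beta}_i}} \sim t(n-k).$$

The validity of this distributional result follows from

$$\frac{N(0,1)}{\sqrt{\chi^2(d)/d}} \sim t(d)$$

since

$$\frac{\hat{\beta}_i - \beta_i}{\sigma_{\hat{\beta}_i}} \sim N(0,1) \quad \text{and} \quad \frac{(n-k)s^2_{\hat{\beta}_i}}{\sigma^2_{\hat{\beta}_i}} \sim \chi^2(n-k).$$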
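The t statistic and the associated confidence interval can be reproduced from a coefficient and its standard error. The Python sketch below is illustrative: the function names are assumptions, the slope and standard error are those reported for x in the Stata printout of the first Anscombe data set, and the critical value for t(9) is an assumed table value rather than something computed here.

```python
def t_statistic(b_hat, b_null, se):
    """t statistic for Ho: beta_i = b_null."""
    return (b_hat - b_null) / se

def confidence_interval(b_hat, se, t_crit):
    """Two-sided confidence interval b_hat +/- t_crit * se."""
    return b_hat - t_crit * se, b_hat + t_crit * se

b_hat, se = 0.5000909, 0.1179055     # slope and standard error from the printout
T_CRIT_9 = 2.262157                  # assumed 97.5th percentile of t(9), from t tables

t_value = t_statistic(b_hat, 0.0, se)
ci_low, ci_high = confidence_interval(b_hat, se, T_CRIT_9)
```

The results should match the t column and the 95% confidence interval that Stata reports for the slope, up to rounding.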


3. Tests of hypotheses involving linear combinations of coefficients

A linear combination of the βi's can be written as

$$\sum_{i=1}^{k}\delta_i\beta_i = (\delta_1, \ldots, \delta_k)\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} = \delta'\beta.$$

We now consider testing hypotheses of the form H0: δ'β = γ.

Recall that

$$\hat{\beta} \sim N\left(\beta;\ \sigma^2(X'X)^{-1}\right);$$

therefore,

$$\delta'\hat{\beta} \sim N\left(\delta'\beta;\ \sigma^2\delta'(X'X)^{-1}\delta\right);$$

hence,

$$\frac{\delta'\hat{\beta} - \delta'\beta}{s_{\delta'\hat{\beta}}} = \frac{\delta'\hat{\beta} - \gamma}{\sqrt{s^2\,\delta'(X'X)^{-1}\delta}} \sim t(n-k).$$

The t-test of a hypothesis involving a linear combination of the coefficients involves running one regression and estimating the variance of $\delta'\hat{\beta}$ from s2(X'X)-1 to construct the test statistic.

4. More general tests

a. Introduction

We have considered tests of the overall explanatory power of the regression model (Ho: β2 = β3 = . . . βk = 0), tests involving individual parameters (e.g., Ho: β3 = 6), and testing the validity of a linear constraint on the coefficients (Ho: δ'β = γ). In this section we will consider how more general tests can be

performed. The testing procedures will be based on the Chow and Likelihood

ratio (LR) tests. The hypotheses may be of many different types and involve the

previous tests as special cases. Other examples might include joint hypotheses of

the form: Ho: β2 + 6 β5 = 4, β3 = β7 = 0. The basic idea is that if the hypothesis is

really valid, then goodness of fit measures such as SSE, R2 and log-likelihood

values (l) will not be significantly impacted by imposing the valid hypothesis in

estimation. Hence, the SSE, R2 or l values will not be significantly different for

constrained (via the hypothesis) and unconstrained estimation of the underlying

regression model. The tests of the validity of the hypothesis are based on

constructing test statistics, with known exact or asymptotic distributions, to

evaluate the statistical significance of changes in SSE, R2, or l .

Consider the model

y = X β + ε

and a hypothesis, Ho: g(β) = 0 which imposes individual and/or multiple

constraints on the β vector.

The Chow and likelihood ratio tests for testing Ho: g(β) = 0 can be

constructed from the output obtained from estimating the two following

regression models.

(1) Estimate the regression model y = Xβ + ε without imposing any

constraints on the vector β. Let the associated sum of square errors,

coefficient of determination, log-likelihood value and degrees of freedom


be denoted by SSE, R2, l , and (n - k).

(2) Estimate the same regression model where the β is constrained as

specified by the hypothesis (Ho: g(β) = 0) in the estimation process. Let

the associated sum of squared errors, R2, log-likelihood value and degrees

of freedom be denoted by SSE*, R2*, l * and (n - k)*, respectively.

b. Chow test

The Chow test is defined by the following statistic:

$$F = \frac{(SSE^* - SSE)/r}{SSE/(n-k)} \sim F(r, n-k)$$

where r = (n-k)* - (n-k) is the number of independent restrictions imposed on β by the hypothesis. For example, if the hypothesis was Ho: β2 + 6β5 = 4, β3 = β7 = 0, then the numerator degrees of freedom (r) is equal to 3. In applications where the SST is unaltered by imposing the restrictions, we can divide the numerator and denominator by SST to yield the Chow test rewritten in terms of the change in the R2 between the constrained and unconstrained regressions:

$$F = \left(\frac{R^2 - R^{2*}}{1 - R^2}\right)\left(\frac{n-k}{r}\right) \sim F(r, n-k).$$

Note that if the hypothesis (H0: g(β) = 0) is valid, then we would expect R2 (SSE) and R2* (SSE*) to not be significantly different from each other. Thus, it is only large values (greater than the critical value) of F which provide the basis for rejecting the hypothesis. Again, the R2 form of the Chow test is only valid if the dependent variable is the same in the constrained and unconstrained regression.

References:

(1) Chow, G. C., "Tests of Equality Between Subsets of Coefficients in Two Linear Regressions," Econometrica, 28(1960), 591-605.

(2) Fisher, F. M., "Tests of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository Note," Econometrica, 38(1970), 361-66.


c. Likelihood ratio (LR) test.

The LR test is a common method of statistical inference in classical

statistics. The motivation behind the LR test is similar to that of the Chow test

except that it is based on determining whether there has been a significant

reduction in the value of the log-likelihood value as a result of imposing the

hypothesized constraints on β in the estimation process. The LR test statistic is

defined to be twice the difference between the values of the constrained and

unconstrained log-likelihood values (2( l - l *)) and, under fairly general

regularity conditions, is asymptotically distributed as a chi-square with degrees of

freedom equal to the number of independent restrictions (r) imposed by the

hypothesis. This may be summarized as follows:

The LR test is more general than the Chow test and for the case of

independent and identically distributed normal errors, with known σ2, LR is equal

to LR = [SSE* - SSE]/σ2 .

Recall that s2 = SSE/(n - k) appears in the denominator of the Chow test statistic

and that for large values of (n-k), s2 is "close" to σ2; hence, we can see the

similarity of the LR and Chow tests. If σ2 is unknown, substituting the

concentrated log-likelihood function into LR yields

LR = 2 ( l - l *)

= n [ln (SSE*) - ln (SSE) ]

= n [ln (SSE* / SSE)].

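In summary, the LR test statistic and its asymptotic distribution are

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(r).$$

If the hypothesis Ho: β2 = β3 = . . . = βk = 0 is being tested in the classical

normal linear regression model, then SSE* = SST and LR can be rewritten in

terms of the R2 as follows:

$$LR = n\ln\left[\frac{1}{1-R^2}\right] = -n\ln(1-R^2) \overset{a}{\sim} \chi^2(k-1).$$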

In this case, the Chow test is identical to the F test for overall explanatory power

discussed earlier.
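The step behind this statement can be written out explicitly. This is a short worked derivation using only the definitions already given (SSE* = SST, so R2* = 0, and r = k - 1):

```latex
F = \frac{(SSE^* - SSE)/r}{SSE/(n-k)}
  = \frac{(SST - SSE)/(k-1)}{SSE/(n-k)}
  = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F(k-1,\ n-k)
```

The last equality follows from dividing the numerator and denominator by SST and using R2 = 1 - SSE/SST.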

Thus the Chow test and LR test are similar in structure and purpose. The

LR test is more general than the Chow test; however, its distribution is

asymptotically (not exact) chi-square even for non-normally distributed errors.

The LR test provides a unified method of testing hypotheses.
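For the normal linear model with unknown σ2, the two statistics are linked exactly: since SSE*/SSE = 1 + rF/(n - k), we have LR = n ln(1 + rF/(n - k)). The sketch below checks this identity; the sample size and sums of squares are hypothetical numbers used only for illustration.

```python
import math


def lr_stat(sse_restricted, sse_unrestricted, n):
    """LR = n * ln(SSE*/SSE) for the normal linear model with unknown sigma^2."""
    return n * math.log(sse_restricted / sse_unrestricted)


def lr_from_chow(f_stat, r, df_unrestricted, n):
    """Recover LR from the Chow F statistic using SSE*/SSE = 1 + r*F/(n - k)."""
    return n * math.log(1.0 + r * f_stat / df_unrestricted)


# Hypothetical inputs (not taken from the notes)
n, k, r = 30, 4, 2
sse_u, sse_r = 50.0, 65.0
df_u = n - k

f_stat = ((sse_r - sse_u) / r) / (sse_u / df_u)
lr_direct = lr_stat(sse_r, sse_u, n)
lr_via_f = lr_from_chow(f_stat, r, df_u, n)
print(f_stat, lr_direct, lr_via_f)
```

The F statistic would be compared with an F(r, n - k) critical value and LR with a chi-square(r) critical value.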

d. Applications of the Chow and LR tests:

(1) Model: yt = β1 + β2xt2 + β3xt3 + β4xt4 + εt

Ho: β2 = β3 = 0 (two independent constraints)

(a) Estimate yt = β1 + β2xt2 + β3xt3 + β4xt4 + εt

to obtain SSE = Σet2 = (n - 4)s2, R2,

$$\ell = -\frac{n}{2}\left[1 + \ln(2\pi) + \ln\left(\frac{SSE}{n}\right)\right],$$

and n - k = n - 4

(b) Estimate yt = β1 + β4xt4 + εt to obtain

SSE* = Σet*2 = (n - 2)s*2

SSE*, R2*, l * and (n-k)* = n - 2


(c) Construct the test statistics

$$\text{Chow} = \frac{(SSE^* - SSE)/[(n-k)^* - (n-k)]}{SSE/(n-k)} = \frac{(SSE^* - SSE)/2}{SSE/(n-4)} = \frac{R^2 - R^{2*}}{1 - R^2}\cdot\frac{n-4}{2} \sim F(2,\ n-4)$$

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(2).$$

(2) Tests of equality of the regression coefficients in two different regressions

models.

(a) Consider the two regression models

y(1) = X(1) β(1) + ε(1) n1 observations, k independent variables

y(2) = X(2) β(2) + ε(2) n2 observations, k independent variables

Ho: β(1) = β(2) (k independent restrictions)

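(b) Rewrite the model as

(1)' $$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} & 0 \\ 0 & X^{(2)} \end{pmatrix} \begin{pmatrix} \beta^{(1)} \\ \beta^{(2)} \end{pmatrix} + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}$$

Estimate (1)' using least squares and determine SSE, R2, l

and (n - k) = n1 + n2 - 2k.

Now impose the hypothesis that β(1) = β(2) = β and write (1)' as

(2)' $$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix} \beta + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}$$

Estimate (2)' using least squares to obtain the constrained

sum of squared errors (SSE*), R2*, l * and (n - k)* = n1 + n2 - k.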

(c) Construct the test statistics

$$\text{Chow} = \frac{(SSE^* - SSE)/[(n-k)^* - (n-k)]}{SSE/(n-k)} = \frac{R^2 - R^{2*}}{1 - R^2}\cdot\frac{n_1 + n_2 - 2k}{k} \sim F(k,\ n_1 + n_2 - 2k)$$

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(k).$$
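The procedure for testing equality of coefficients across two regressions can be sketched in a few lines. The example below assumes the simplest case of an intercept and one slope (k = 2); the helper names and the two small data sets are made up for illustration. The unconstrained SSE is the sum of the SSEs from the two separate regressions, and the constrained SSE* comes from the pooled regression.

```python
def ols_sse(x, y):
    """SSE from the least squares fit of y on an intercept and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def chow_two_samples(x1, y1, x2, y2, k=2):
    """Chow F statistic for Ho: beta(1) = beta(2), with k coefficients per regression."""
    sse_u = ols_sse(x1, y1) + ols_sse(x2, y2)   # separate fits (unconstrained)
    sse_r = ols_sse(x1 + x2, y1 + y2)           # pooled fit (constrained)
    df_u = len(x1) + len(x2) - 2 * k
    f_stat = ((sse_r - sse_u) / k) / (sse_u / df_u)
    return f_stat, sse_r, sse_u, df_u


# Hypothetical data for two groups (not from the notes)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [2.1, 3.9, 6.2, 7.8, 10.1]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [1.2, 1.9, 3.1, 4.2, 4.8]

f_stat, sse_r, sse_u, df_u = chow_two_samples(x1, y1, x2, y2)
print(f_stat, sse_r, sse_u, df_u)
```

The resulting statistic would be compared with an F(k, n1 + n2 - 2k) critical value.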

5. Testing Hypotheses using Stata

a. Stata reports the log likelihood values when the command

estat ic

follows a regression command; these values can be used in constructing LR tests.

b. Stata can also perform many tests based on t or Chow-type tests.

Consider the model

(1) Yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + εt

with the hypotheses:

(2) H1: β2 = 1

H2: β3 = 0

H3: β3 + β4 = 1

H4: β3β4 = 1

H5: β2 = 1 and β3 = 0

The Stata commands to perform tests of these hypotheses follow OLS

estimation of the unconstrained model.


reg Y X2 X3 X4

estimates the unconstrained model

test X2 = 1 (Tests H1)

test X3 = 0 (Tests H2)

test X3 + X4 = 1 (Tests H3)

testnl _b[X3]*_b[X4] = 1 (Tests H4. The "testnl" command is for testing nonlinear hypotheses. Coefficients must be referenced as "_b[varname]", with square brackets around the variable name, when testing nonlinear hypotheses)

test (X2 = 1) (X3 = 0) (Tests H5)

95% confidence intervals on coefficient estimates are automatically calculated in

Stata. To change the confidence level, use the “level” option as follows:

reg Y X2 X3 X4, level(90) (changes the confidence level

to 90%)


F. Stepwise Regression

Stepwise regression is a method for determining which variables might be

considered as being included in a regression model. It is a purely mechanical approach,

adding or removing variables in the model solely determined by their statistical

significance and not according to any theoretical reason. While stepwise regression can be

considered when deciding among many variables to include in a model, theoretical

considerations should be the primary factor for such a decision.

A stepwise regression may use forward selection or backward selection. Using

forward selection, a stepwise regression will add one independent variable at a time to see

if it is significant. If the variable is significant, it is kept in the model and another variable

is added. If the variable is not significant, or if a previously added variable becomes

insignificant, it is not included in the model. This process continues until no additional

variables are significant.

Stepwise regression using Stata

To perform a stepwise regression in Stata, use the following commands:

Forward:

stepwise, pe(#): reg dep_var indep_vars

stepwise, pe(#) lockterm1: reg dep_var (forced_in_vars) other_indep_vars

Backward:

stepwise, pr(#): reg dep_var indep_vars

stepwise, pr(#) lockterm1: reg dep_var (forced_in_vars) other_indep_vars

where the "#" in "pr(#)" is the significance level at which variables are removed (for example, 0.051), and the "#" in "pe(#)" is the significance level at which variables are entered or added to the model. If pr(#1) and pe(#2) are both included in a stepwise regression command, #1 must be greater than #2. Also, "dep_var" represents the dependent variable, the variables in parentheses (the "forced in" variables) are the independent variables which the user wishes to remain in the model no matter what their significance level may be, and the remaining independent variables are those which the stepwise regression will consider including or excluding. Forward and backward stepwise regression may yield different results.

G. Forecasting

Let yt = F(Xt, β) + εt

denote the stochastic relationship between the variable yt and the vector of variables Xt

where Xt = (xt1,..., xtk). β represents a vector of unknown parameters.

Forecasts are generally made by estimating the vector of parameters β (giving $\hat\beta$), determining the appropriate vector of explanatory variables Xt (giving $\hat X_t$), and then evaluating

$$\hat y_t = F(\hat X_t, \hat\beta).$$

The forecast error is $FE = y_t - \hat y_t$.

There are at least four factors which contribute to forecast error.


1. Incorrect functional form (This is an example of specification error and will be

discussed later.)

2. Existence of random disturbance (εt)

Even if the "appropriate" future value of Xt and true parameter values, β,

were known with certainty

$$FE = y_t - \hat y_t = y_t - F(X_t, \beta) = \varepsilon_t$$

$$\sigma^2_{FE} = \mathrm{Var}(FE) = \mathrm{Var}(\varepsilon_t) = \sigma^2.$$

In this case confidence intervals for yt would be obtained from

$$\Pr\left[F(X_t, \beta) - t_{\alpha/2}\,\sigma < y_t < F(X_t, \beta) + t_{\alpha/2}\,\sigma\right] = 1 - \alpha$$

which could be visualized as follows for the linear case:

[Figure: Yt plotted against Xt.]

3. Uncertainty about β

Assume F(Xt, β) = Xtβ in the model yt = F(Xt, β) + εt, then the predicted

value of yt for a given value of Xt is given by

$$\hat y_t = X_t\hat\beta,$$

and the variance of $\hat y_t$ (the sample regression line), $\sigma^2_{\hat y_t}$, is given by

$$\sigma^2_{\hat y_t} = X_t\,\mathrm{Var}(\hat\beta)\,X_t',$$

with the variance of the forecast error (actual y) given by:

$$\sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat y_t}.$$

Note that $\sigma^2_{FE}$ takes account of the uncertainty associated with the unknown regression line and the error term and can be used to construct confidence intervals for the actual value of Y rather than just the regression line. Unbiased sample estimators of $\sigma^2_{\hat y_t}$ and $\sigma^2_{FE}$ can be easily obtained by replacing σ2 with its unbiased estimator s2.

Confidence intervals for $E(Y_t \mid X_t)$, the population regression line:

$$\Pr\left[X_t\hat\beta - t_{\alpha/2}\,s_{\hat y_t} < E(Y_t \mid X_t) < X_t\hat\beta + t_{\alpha/2}\,s_{\hat y_t}\right] = 1 - \alpha$$

Confidence intervals for Yt:

$$\Pr\left[X_t\hat\beta - t_{\alpha/2}\,s_{FE} < Y_t < X_t\hat\beta + t_{\alpha/2}\,s_{FE}\right] = 1 - \alpha$$

[Figure: Yt plotted against Xt.]

4. A comparison of confidence intervals.

Some students have found the following table facilitates their understanding of the different confidence intervals for the

population regression line and actual value of Y. The column for the estimated coefficients is only included to compare

the organizational parallels between the different confidence intervals.

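Statistic
  (1) Estimated coefficients: $\hat\beta = (X'X)^{-1}X'Y$
  (2) Sample regression line: $\hat Y_t = X_t\hat\beta$ = predicted Y values corresponding to $X_t$
  (3) FE (forecast error): $FE = Y_t - \hat Y_t = Y_t - X_t\hat\beta$

Distribution
  (1) $\hat\beta \sim N\left(\beta,\ \sigma^2 (X'X)^{-1}\right)$
  (2) $\hat Y_t \sim N\left(X_t\beta,\ \sigma^2_{\hat Y_t}\right)$, where $\sigma^2_{\hat Y_t} = \sigma^2 X_t (X'X)^{-1} X_t'$
  (3) $FE \sim N\left(0,\ \sigma^2_{FE}\right)$, where $\sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat Y_t}$

t-stat
  (1) $1 - \alpha = \Pr\left(-t_{\alpha/2} < \dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} < t_{\alpha/2}\right) = \Pr\left(\hat\beta_i - t_{\alpha/2}\,s_{\hat\beta_i} < \beta_i < \hat\beta_i + t_{\alpha/2}\,s_{\hat\beta_i}\right)$
  (2) $1 - \alpha = \Pr\left(-t_{\alpha/2} < \dfrac{X_t\hat\beta - X_t\beta}{s_{\hat Y_t}} < t_{\alpha/2}\right) = \Pr\left(X_t\hat\beta - t_{\alpha/2}\,s_{\hat Y_t} < X_t\beta < X_t\hat\beta + t_{\alpha/2}\,s_{\hat Y_t}\right)$
  (3) $1 - \alpha = \Pr\left(-t_{\alpha/2} < \dfrac{FE - 0}{s_{FE}} < t_{\alpha/2}\right) = \Pr\left(FE - t_{\alpha/2}\,s_{FE} < 0 < FE + t_{\alpha/2}\,s_{FE}\right) = \Pr\left(X_t\hat\beta - t_{\alpha/2}\,s_{FE} < Y_t < X_t\hat\beta + t_{\alpha/2}\,s_{FE}\right)$

C.I.
  (1) $\beta_i$: $\left(\hat\beta_i - t_{\alpha/2}\,s_{\hat\beta_i},\ \hat\beta_i + t_{\alpha/2}\,s_{\hat\beta_i}\right)$
  (2) $X_t\beta$: $\left(X_t\hat\beta - t_{\alpha/2}\,s_{\hat Y_t},\ X_t\hat\beta + t_{\alpha/2}\,s_{\hat Y_t}\right)$
  (3) $Y_t$: $\left(X_t\hat\beta - t_{\alpha/2}\,s_{FE},\ X_t\hat\beta + t_{\alpha/2}\,s_{FE}\right)$

where $s_{\hat Y_t}$ is used to compute confidence intervals for the regression line ($E(Y_t) = X_t\beta$) and $s_{FE}$ is used in the calculation of confidence intervals for the actual value of Y. Recall that $s^2_{FE} = s^2 + s^2_{\hat Y_t}$; hence, $s^2_{FE} > s^2_{\hat Y_t}$ and the confidence intervals for Y are wider than for the population regression line.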

5. Uncertainty about X. In many situations the value of the independent variable also

needs to be predicted along with the value of y. Not surprisingly, a “poor” estimate of

Xt will likely result in a poor forecast for y. This can be represented graphically as

follows:

6. Hold out samples and a predictive test.

One way to explore the predictive ability of a model is to estimate the model on a

subset of the data and then use the estimated model to predict known outcomes which

are not used in the initial estimation.
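A hold-out exercise of this kind can be sketched as follows. This is only an illustration under simple assumptions: a model with an intercept and one explanatory variable, made-up data, and the last two observations held out; the function name is hypothetical.

```python
import math


def fit_simple_ols(x, y):
    """Least squares intercept and slope for y = b1 + b2*x + e."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return b1, b2


# Hypothetical data (not from the notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.1, 14.2, 15.8]

n_fit = 6                                   # estimation subsample size
b1, b2 = fit_simple_ols(x[:n_fit], y[:n_fit])

# Forecast errors FE = y - yhat for the held-out observations
forecast_errors = [yi - (b1 + b2 * xi) for xi, yi in zip(x[n_fit:], y[n_fit:])]
rmse = math.sqrt(sum(fe ** 2 for fe in forecast_errors) / len(forecast_errors))
print(forecast_errors, rmse)
```

Large hold-out forecast errors relative to the in-sample fit would suggest that the model predicts poorly outside the estimation sample.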

7. Example

$$\hat y_t = 10 + 2.5\,G_t + 6\,M_t = \hat\beta_1 + \hat\beta_2 G_t + \hat\beta_3 M_t$$

where yt, Gt, Mt denote GDP, government expenditure, and money supply.

Assume that

$$s^2 = 10, \qquad s^2(X'X)^{-1} = 10^{-3}\begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}.$$

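a. Calculate an estimate of GDP (y) which corresponds to

Gt = 100, Mt = 200, i.e., Xt = (1, 100, 200).

$$\hat y_t = X_t\hat\beta = (1,\ 100,\ 200)\begin{pmatrix} 10 \\ 2.5 \\ 6 \end{pmatrix} = 10 + 250 + 1200 = 1460.$$

b. Evaluate $s^2_{\hat y_t}$ and $s^2_{FE}$ corresponding to the Xt in question (a).

$$s^2_{\hat y_t} = X_t\left(s^2(X'X)^{-1}\right)X_t' = (1,\ 100,\ 200)\ 10^{-3}\begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}\begin{pmatrix} 1 \\ 100 \\ 200 \end{pmatrix} = 921.81$$

$$s_{\hat y_t} = 30.36$$

$$s^2_{FE} = s^2 + s^2_{\hat y_t} = 10 + 921.81 = 931.81$$

$$s_{FE} = 30.53$$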
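The arithmetic in this example can be reproduced with a few lines of code. The sketch below assumes the estimated covariance matrix s2(X'X)^-1 reads 10^-3 times the symmetric matrix with rows (10, 5, 2), (5, 20, 3), (2, 3, 15); variable names are illustrative.

```python
import math

beta_hat = [10.0, 2.5, 6.0]          # estimated coefficients
x_t = [1.0, 100.0, 200.0]            # (1, G_t, M_t)
s2 = 10.0                            # estimate of sigma^2

# Assumed reading of s^2 (X'X)^{-1}, before the 10^-3 scale factor
cov_scaled = [
    [10.0, 5.0, 2.0],
    [5.0, 20.0, 3.0],
    [2.0, 3.0, 15.0],
]

# Point forecast: yhat = X_t * beta_hat
y_hat = sum(b * x for b, x in zip(beta_hat, x_t))

# Variance of the sample regression line: X_t [s^2 (X'X)^{-1}] X_t'
s2_yhat = 1e-3 * sum(
    x_t[i] * cov_scaled[i][j] * x_t[j] for i in range(3) for j in range(3)
)

# Forecast error variance adds the disturbance variance
s2_fe = s2 + s2_yhat

s_yhat = math.sqrt(s2_yhat)
s_fe = math.sqrt(s2_fe)
print(y_hat, s2_yhat, s_yhat, s2_fe, s_fe)
```

The standard errors then enter the confidence intervals as the point forecast plus or minus the appropriate t value times s_yhat (regression line) or s_fe (actual value).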

8. Forecasting: basic Stata commands

a) The data file should include values for the explanatory variables

corresponding to the desired forecast period, say in observations n1 + 1 to n2.

b) Estimate the model using least squares

reg Y X1 . . . XK, [options]

c) Use the predict command, picking the names you want for the predictions, in this case yhat, e, sfe, and syhat (for $\hat Y$, $e$, $s_{FE}$, and $s_{\hat Y}$).

predict yhat, xb ← this option predicts Y

predict e, resid ← this option predicts the residuals (e)

predict sfe, stdf ← this option predicts the standard error of the forecast ($s_{FE}$)

predict syhat, stdp ← this option predicts the standard error of the prediction ($s_{\hat Y}$)

list y yhat sfe ← this command lists the indicated variables

These commands result in the calculation and reporting of $\hat Y$, $e$, $s_{FE}$, and $s_{\hat Y}$ for observations 1 through n2. The predictions will show up in the Data Editor of Stata under the variable names you picked (in this case, yhat, e, sfe and syhat). You may want to restrict the calculations to t = n1 + 1, ..., n2 by using

predict yhat if(_n> n1), xb

where “n1” is the numerical value of n1.

d) The variance of the predicted value can be calculated as follows:

$$s^2_{\hat y_t} = s^2_{FE} - s^2$$

H. PROBLEM SETS: MULTIVARIATE REGRESSION

Problem Set 3.1

Theory

OBJECTIVE: The objective of problems 1 & 2 is to demonstrate that the matrix equations and summation equations for the estimators and variances of the estimators are equivalent.

Remember $\sum_{t=1}^{N} X_t = N\bar X$, and don't get discouraged!

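1. BACKGROUND: Consider the model (1) Yt = β1 + β2 Xt + εt (t = 1, . . ., N) or equivalently,

(1)' $$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_N \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix}$$

(1)'' Y = Xβ + ε

The least squares estimator of $\beta = (\beta_1, \beta_2)'$ is $\hat\beta = (\hat\beta_1, \hat\beta_2)' = (X'X)^{-1}X'Y$.

If (A.1) - (A.5) (see class notes) are satisfied, then

$$\mathrm{Var}(\hat\beta) = \begin{pmatrix} \mathrm{Var}(\hat\beta_1) & \mathrm{Cov}(\hat\beta_1, \hat\beta_2) \\ \mathrm{Cov}(\hat\beta_2, \hat\beta_1) & \mathrm{Var}(\hat\beta_2) \end{pmatrix} = \sigma^2 (X'X)^{-1}$$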

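QUESTIONS: Verify the following: *Hint: It might be helpful to work backwards on parts c and e.

a. $X'X = \begin{pmatrix} N & N\bar X \\ N\bar X & \sum X_t^2 \end{pmatrix}$ and $X'Y = \begin{pmatrix} N\bar Y \\ \sum_{t=1}^{N} X_t Y_t \end{pmatrix}$

b. $\hat\beta_2 = \left(\sum X_t Y_t - N\bar X\bar Y\right) / \left(\sum X_t^2 - N\bar X^2\right)$

c. $\hat\beta_1 = \bar Y - \hat\beta_2\bar X$

d. $\mathrm{Var}(\hat\beta_2) = \sigma^2 / \left(\sum X_t^2 - N\bar X^2\right)$

e. $\mathrm{Var}(\hat\beta_1) = \sigma^2\left[\dfrac{1}{N} + \dfrac{\bar X^2}{\sum X_t^2 - N\bar X^2}\right] = \mathrm{Var}(\bar Y) + \bar X^2\,\mathrm{Var}(\hat\beta_2)$

f. $\mathrm{Cov}(\hat\beta_1, \hat\beta_2) = -\bar X\,\mathrm{Var}(\hat\beta_2)$

(JM II'-A, JM Stats)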

2. Consider the model: $Y_t = \beta X_t + \varepsilon_t$

a. Show that this model is equivalent to Y = Xβ + ε

where $Y = (Y_1, Y_2, \ldots, Y_n)'$, $X = (X_1, X_2, \ldots, X_n)'$, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$.

b. Using the matrices in 2(a), evaluate $(X'X)^{-1}X'Y$ and compare your answer with the results obtained in question 4 in Problem Set 1.1.

c. Using the matrices in 2(a), evaluate $\sigma^2(X'X)^{-1}$.

(JM II'-A)

Applied

3. Use the data in HPRICE1.RAW to estimate the model

price = β0 + β1sqrft + β2bdrms + u

where price is the house price measured in thousands of dollars, sqrft is the floorspace measured in square feet, and bdrms is the number of bedrooms.

a. Write out the results in equation form.

b. What is the estimated increase in price for a house with one more bedroom, holding square footage constant?

c. What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (b).

d. What percentage variation in price is explained by square footage and number of bedrooms?

e. The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling price for this house from the OLS regression line.

f. The actual selling price of the first house in the sample was $300,000 (so price = 300). Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house?


Problem Set 3.2

Theory

1. R2, Adjusted R2 ($\bar R^2$), F Statistic, and LR

The R2 (coefficient of determination) is defined by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where $SSE = \sum e_t^2$, $SST = \sum (Y_t - \bar Y)^2$, and $SSR = \sum (\hat Y_t - \bar Y)^2$.

Given that SST = SSR + SSE when using OLS,

a. Demonstrate that 0 ≤ R2 ≤ 1.

b. Demonstrate that n = k implies R2 = 1. (Hint: n = k implies that X is square. Be careful! Show $\hat Y = X\hat\beta = Y$.)

c. If an additional independent variable is included in the regression equation, will the R2 increase, decrease, or remain unaltered? (Hint: What is the effect upon SST, SSE?)

d. The adjusted R2, $\bar R^2$, is defined by

$$\bar R^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)}.$$

Demonstrate that

$$\frac{1-k}{n-k} \le \bar R^2 \le R^2 \le 1,$$

i.e., the adjusted R2 can be negative.

(Hint: $1 - \bar R^2 = \dfrac{SSE}{SST}\cdot\dfrac{n-1}{n-k} = \dfrac{n-1}{n-k}\,(1 - R^2)$.)

e. Verify that

$$LR = \frac{SSE^* - SSE}{\sigma^2} \quad \text{if } \sigma^2 \text{ is known}$$

$$LR = n\ln(SSE^*/SSE) \quad \text{if } \sigma^2 \text{ is unknown}$$

where SSE* denotes the restricted SSE.

f. For the hypothesis H0: β2 = . . . = βk = 0, verify that the corresponding LR statistic can be written as

$$LR = n\ln\left[\frac{1}{1-R^2}\right] = -n\ln(1-R^2).$$

FYI: The corresponding LM test statistic for this hypothesis can be written in terms of the coefficient of determination as $LM = NR^2$.

(JM II-B)

2. Demonstrate that

a. X'e = 0 is equivalent to the normal equations $X'X\hat\beta = X'Y$.

b. X'e = 0 implies that the sum of estimated error terms will equal zero if the regression equation includes an intercept.

Remember: $e = Y - \hat Y = Y - X\hat\beta$

(JM II-B)

Applied

3. The following model can be used to study whether campaign expenditures affect election

outcomes:

voteA = β0 + β1ln(expendA) + β2 ln(expendB) + β3 prtystrA + u

where voteA is the percent of the vote received by Candidate A, expendA and expendB are

campaign expenditures by Candidates A and B, and prtystrA is a measure of party

strength for Candidate A (the percent of the most recent presidential vote that went to A's

party).

i) What is the interpretation of β1?

ii) In terms of the parameters, state the null hypothesis that a 1% increase in A's

expenditures is offset by a 1% increase in B's expenditures.

iii) Estimate the model above using the data in VOTE1.RAW and report the results in

the usual form. Do A's expenditures affect the outcome? What about B's

expenditures? Can you use these results to test the hypothesis in part (ii)?

iv) Estimate a model that directly gives the t statistic for testing the hypothesis in part

(ii). What do you conclude? (Use a two sided alternative.) A possible approach: to test $H_0: \beta_1 + \beta_2 = D$, substitute $D - \beta_2$ for $\beta_1$ and simplify.

(Wooldridge C. 4.1)


4. Consider the data

t Output (Yt) Labor (Lt) Capital (Kt)

1 40.26 64.63 133.14

2 40.84 66.30 139.24

3 42.83 65.27 141.64

4 43.89 67.32 148.77

5 46.10 67.20 151.02

6 44.45 65.18 143.38

7 43.87 65.57 148.19

8 49.99 71.42 167.12

9 52.64 77.52 171.33

10 57.93 79.46 176.41

The Cobb Douglas Production function is defined by

(1) $Y_t = e^{\beta_1 + \beta_2 t}\, L_t^{\beta_3}\, K_t^{\beta_4}\, \varepsilon_t$

where (β2t) takes account of changes in output for any reason other than a change in Lt or Kt; εt denotes a random disturbance having the property that lnεt is distributed N(0, σ2). Labor's share (total wage receipts / total sales receipts) is given by β3 if β3 + β4 (the returns to scale) is equal to one. β2 is frequently referred to as the rate of technological change, $(dY_t/dt)/Y_t$ for fixed L and K.

Taking the natural logarithm of equation (1), we obtain

(2) $\ln Y_t = \beta_1 + \beta_2 t + \beta_3\ln(L_t) + \beta_4\ln(K_t) + \ln(\varepsilon_t)$.

If β3 + β4 is equal to 1, then equation (2) can be rewritten as

(3) $\ln(Y_t/K_t) = \beta_1 + \beta_2 t + \beta_3\ln(L_t/K_t) + \ln\varepsilon_t$.

a. Estimate equation (2) using the technique of least squares.

b. Corresponding to equation (2)

1) Test the hypothesis Ho: β2 = β3 = β4 = 0. Explain the implications of this

hypothesis. (95% confidence level)

2) perform and interpret individual tests of significance of β2, β3, and β4, i.e., test Ho: βi = 0 using α = .05.

3) test the hypothesis of constant returns to scale, i.e., Ho: β3 + β4 = 1, using

a. a t-test for general linear hypothesis, let restrictions δ= (0,0,1,1);

b. a Chow test;

c. a LR test.

c. Estimate equation (3) and test the hypothesis that labor’s share is equal to .75, i.e., β3 =

.75.

d. Re-estimate the model (equation 2) with the first nine observations and check to see if the actual

log(output) for the 10th observation lies in the 95% forecast confidence interval.

(JM II)

5. The translog production function corresponding to the previous problem is given by

$$\ln(Y_t) = \beta_1 + \beta_2 t + \beta_3\ln(L_t) + \beta_4\ln(K_t) + \beta_5(\ln L_t)^2 + \beta_6(\ln K_t)^2 + \beta_7\ln(L_t)\ln(K_t) + \ln(\varepsilon_t)$$

a. What restrictions on the translog production function result in a Cobb-Douglas production function?

b. Estimate the translog production function using the data in problem 4 and use the Chow and LR tests to determine whether it provides a statistically significant improved fit to the data, relative to the Cobb-Douglas function.

(JM II)

6. The transcendental production function corresponding to the data in problem 4 is defined by

$$Y = e^{\beta_1 + \beta_2 t + \beta_3 L + \beta_4 K}\, L^{\beta_5} K^{\beta_6}$$

a. What restrictions on the transcendental production function result in a Cobb-Douglas production function?

b. Estimate the transcendental production function using the data in problem 4 and use the Chow and LR tests to compare it with the Cobb-Douglas production function.

(JM II)


APPENDIX A

Some important derivatives:

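Let

$$X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{(symmetric: } a_{12} = a_{21}\text{)}$$

1. $\dfrac{d(a'X)}{dX} = \dfrac{d(X'a)}{dX} = a$

2. $\dfrac{d(X'AX)}{dX} = 2AX$

Proof of $\dfrac{d(X'a)}{dX} = a$

Note: a'X = X'a = a1x1 + a2x2

$$\frac{d(X'a)}{dX} = \begin{pmatrix} \partial(X'a)/\partial x_1 \\ \partial(X'a)/\partial x_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a$$

Proof of $\dfrac{d(X'AX)}{dX} = 2AX$

Note: $X'AX = a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + a_{22}x_2^2$

$$\frac{d(X'AX)}{dX} = \begin{pmatrix} \partial(X'AX)/\partial x_1 \\ \partial(X'AX)/\partial x_2 \end{pmatrix} = \begin{pmatrix} 2a_{11}x_1 + 2a_{12}x_2 \\ 2a_{21}x_1 + 2a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 2AX.$$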

APPENDIX B

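An unbiased estimator of σ2 is given by

$$s^2 = \frac{1}{n-k}\, y'\left(I - X(X'X)^{-1}X'\right)y = SSE/(n-k).$$

Proof: To show this, we need some results on traces:

$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$$

1) tr(I) = n

2) If A is idempotent, tr(A) = rank of A

3) tr(A+B) = tr(A) + tr(B)

4) tr(AB) = tr(BA) if both AB and BA are defined

5) tr(ABC) = tr(CAB)

6) tr(kA) = k tr(A)

Now, remember that

$$\hat\sigma^2 = \frac{1}{n}\, e'e \quad \text{and} \quad s^2 = \frac{1}{n-k}\, e'e,$$

$$e = y - X\hat\beta = y - X(X'X)^{-1}X'y = My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon,$$

where M = I - X(X'X)-1X', so that MX = 0.

Note that M is symmetric and idempotent (problem set R.2).

So

$$\hat\sigma^2 = \frac{1}{n}\, e'e = \frac{1}{n}\,\varepsilon'M'M\varepsilon = \frac{1}{n}\,\varepsilon'MM\varepsilon = \frac{1}{n}\,\varepsilon'M\varepsilon \quad \text{and} \quad s^2 = \frac{1}{n-k}\,\varepsilon'M\varepsilon.$$

Then

$$E(\hat\sigma^2) = \frac{1}{n}E(\varepsilon'M\varepsilon) = \frac{1}{n}E\left(\mathrm{tr}(\varepsilon'M\varepsilon)\right) = \frac{1}{n}E\,\mathrm{tr}(M\varepsilon\varepsilon') = \frac{1}{n}\mathrm{tr}\left(M\,E(\varepsilon\varepsilon')\right)$$

$$= \frac{1}{n}\mathrm{tr}(M\sigma^2 I) \quad \text{(because } \mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0,\ i \ne j\text{)}$$

$$= \frac{\sigma^2}{n}\mathrm{tr}(M) = \frac{\sigma^2}{n}\mathrm{tr}\left(I - X(X'X)^{-1}X'\right) = \frac{\sigma^2}{n}\left(n - \mathrm{tr}\left(X(X'X)^{-1}X'\right)\right)$$

$$= \frac{\sigma^2}{n}\left(n - \mathrm{tr}\left(X'X(X'X)^{-1}\right)\right) = \frac{\sigma^2}{n}\left(n - \mathrm{tr}(I_k)\right) = \frac{\sigma^2}{n}(n - k).$$

Hence $E(\hat\sigma^2) = \dfrac{n-k}{n}\,\sigma^2$, so $E(s^2) = \dfrac{n}{n-k}\,E(\hat\sigma^2) = \sigma^2$.

Therefore $\hat\sigma^2$ is biased, but s2 is unbiased.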

APPENDIX C

$\tilde\beta = AY = (X'X)^{-1}X'Y$ is BLUE.

Proof: Let $\tilde\beta_i = A_i Y$ where Ai denotes the ith row of the matrix A. Since the result will be symmetric for each βi (hence, for each Ai), denote Ai by a' where a is a (n by 1) vector.

The problem then becomes:

Min a'Ia where I is n x n

s.t. AX = I where X is n x k (for unbiasedness)

or min a'Ia

s.t. X'a = i where i is the ith column of the identity matrix.

Let $\mathcal{L} = a'Ia + \lambda'(X'a - i)$, which is the associated Lagrangian function, where λ is k x 1.

The necessary conditions for a solution are:

$$\frac{\partial\mathcal{L}}{\partial a'} = 2a'I + \lambda'X' = 0$$

$$\frac{\partial\mathcal{L}}{\partial\lambda'} = (X'a - i) = 0.$$

This implies

$$a' = (-1/2)\,\lambda'X'.$$

Now substitute a = (-1/2)Xλ into the expression for $\partial\mathcal{L}/\partial\lambda' = 0$ and we obtain

$$(-1/2)\,X'X\lambda = i$$

$$\lambda = -2\,(X'X)^{-1}\,i$$

$$a' = (-1/2)(-2)\,i'(X'X)^{-1}X' = i'(X'X)^{-1}X' = A_i$$

which implies

$$A = (X'X)^{-1}X';$$

hence, $\tilde\beta = (X'X)^{-1}X'y$.

III A 1

James B. McDonald

Brigham Young University 2/9/2010 IV. Miscellaneous Topics

A. Multicollinearity

1. Introduction

The least squares estimator of β in the model

y = Xβ + ε

is defined by

$\hat\beta$ = (X'X)-1X'y.

As long as the columns of the X matrix are linearly independent, (X'X)-1 exists and $\hat\beta$ can

be evaluated. If any one column of X can be expressed as a linear combination of the

remaining columns, the determinant |X'X| = 0 and (X'X)-1 is not defined.

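Consider the matrix

$$Cor(X) = \begin{pmatrix} Cor(X_1, X_1) & Cor(X_1, X_2) & \cdots & Cor(X_1, X_k) \\ Cor(X_2, X_1) & Cor(X_2, X_2) & \cdots & Cor(X_2, X_k) \\ \vdots & \vdots & & \vdots \\ Cor(X_k, X_1) & Cor(X_k, X_2) & \cdots & Cor(X_k, X_k) \end{pmatrix} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix}$$

where ρij = correlation (Xi, Xj). Recall that the determinant satisfies 0 ≤ |Cor(X)| ≤ 1.

One "polar" case is that in which the "independent" or exogenous variables are

orthogonal or uncorrelated with each other, i.e., Cor(X) = I; hence, |Cor(X)| = 1.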


Another polar case is the situation in which one exogenous variable can be written as a

linear combination of the remaining exogenous variables, e.g.,

Sales Revenuet = β1 + β2 (Sales of right ski boots) + β3 (Sales of left ski boots) + εt,

where xt2 = sales of right ski boots and xt3 = sales of left ski boots. In this case,

$$Cor(X) = \begin{pmatrix} 1 & Cor(X_2, X_3) \\ Cor(X_3, X_2) & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

and |Cor(X)| = 0.

While the extreme case of |Cor(X)| = 0 is not particularly common, instances in

which |Cor(X)| is small arise frequently and may produce some rather "strange" results. We

will define multicollinearity to exist whenever |Cor(X)| < 1. |Cor(X)| = 0 is referred to

as exact multicollinearity. Multicollinearity is not necessarily bad, but it may make it

difficult to accurately estimate the impact of individual variables on the expected value of

the dependent variable. The question of interest is generally not whether we have

multicollinearity, but what is the "degree" of multicollinearity, what are the associated

consequences, and what can be done about it? While multicollinearity can contribute to

imprecise estimates, it is not the only cause or explanation of imprecise estimation. In

summary, the impact of multicollinearity is that if two or more independent variables move

together, then it can be difficult to obtain precise estimates of the effects of the individual

variables, βi = ∂Ε(yt)/∂Xti.


2. A special case of two explanatory variables.

In order to illustrate some of the consequences of multicollinearity, consider the

following model:

(1) yt = β1 + β2xt2 + β3xt3 + εt t = 1,2, . . ., n.

Summing (1) over t and dividing by n we obtain

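(2) $\bar y = \beta_1 + \beta_2\bar x_2 + \beta_3\bar x_3 + \bar\varepsilon$

where $\bar y$, $\bar x_2$, $\bar x_3$, and $\bar\varepsilon$, respectively, denote the sample means of yt, xt2, xt3, and εt.

Subtracting (2) from (1) yields

(3) $\tilde y_t = \beta_2\tilde x_{t2} + \beta_3\tilde x_{t3} + \tilde\varepsilon_t$

where $\tilde y_t = y_t - \bar y$, $\tilde x_{t2} = x_{t2} - \bar x_2$, $\tilde x_{t3} = x_{t3} - \bar x_3$, and $\tilde\varepsilon_t = \varepsilon_t - \bar\varepsilon$.

The least squares estimators of β2 and β3 are given by (Appendix A.1)

(4) $\begin{pmatrix} \hat\beta_2 \\ \hat\beta_3 \end{pmatrix} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y$

where

$$\tilde X'\tilde X = \begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}, \qquad \tilde X'\tilde y = \begin{pmatrix} m_{2y} \\ m_{3y} \end{pmatrix},$$

$$m_{ij} = \sum_{t=1}^{n}\tilde x_{ti}\tilde x_{tj} = \sum_{t=1}^{n}(x_{ti} - \bar x_i)(x_{tj} - \bar x_j),$$

$$m_{iy} = \sum_{t=1}^{n}\tilde x_{ti}\tilde y_t = \sum_{t=1}^{n}(x_{ti} - \bar x_i)(y_t - \bar y),$$

and

(5) $\mathrm{Var}\begin{pmatrix} \hat\beta_2 \\ \hat\beta_3 \end{pmatrix} = \sigma^2(\tilde X'\tilde X)^{-1}.$

From equation (5) it can be shown that

(6) $\sigma^2_{\hat\beta_i} = \dfrac{\sigma^2}{n\,\mathrm{Var}(X_i)\,(1 - \rho_{23}^2)}$

(7) $t_{\hat\beta_i} = \dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}}$

where

$$\rho_{23}^2 = \frac{\left(\sum\tilde x_{t2}\tilde x_{t3}\right)^2}{\sum\tilde x_{t2}^2\,\sum\tilde x_{t3}^2} = \frac{\left[\sum(x_{t2} - \bar x_2)(x_{t3} - \bar x_3)\right]^2}{\sum(x_{t2} - \bar x_2)^2\,\sum(x_{t3} - \bar x_3)^2} = \text{Correlation}^2(X_2, X_3).$$

The confidence intervals for βi are given by

(8) $\hat\beta_i \pm t_{\alpha/2}\,s_{\hat\beta_i} = \hat\beta_i \pm t_{\alpha/2}\,\dfrac{s}{\left[n\,\mathrm{Var}(x_{ti})\,(1 - \rho_{23}^2)\right]^{1/2}}.$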

Equation (6) can be used to illustrate the point made earlier about multicollinearity

only being one of several factors which may impact estimator precision. From (6) we note

that (other things being equal) increasing the sample size (n), increasing the variance of the

variable whose coefficient is being estimated (Xi), reducing σ2, or reducing the square of the

correlation between the independent variables will increase the precision of our estimators,

i.e., reduce the variance of the estimator. A graphical analysis may be helpful.

In order to focus on the effect of multicollinearity on the variance of say $\hat\beta_2$, consider the ratio of $\tilde\sigma^2_{\hat\beta_2}$ with multicollinearity ($\rho_{23} \ne 0$) to $\sigma^2_{\hat\beta_2}$ without multicollinearity ($\rho_{23} = 0$). In other words, for different values of $\rho_{23}^2$, we calculate this ratio, which reflects how many times worse (greater) the variance is of an estimator subject to multicollinearity compared to one without. This ratio is equal to $1/(1 - \rho_{23}^2)$.

$\rho_{23}^2$        $\tilde\sigma^2_{\hat\beta_2} / \sigma^2_{\hat\beta_2}$
0            1
1/2          2
2/3          3
9/10         10
99/100       100
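The entries in this table follow directly from the ratio 1/(1 - ρ23²). A minimal sketch that reproduces them with exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def variance_ratio(rho2):
    """Ratio of Var(beta2_hat) with squared correlation rho2 to the ratio when rho2 = 0."""
    return 1 / (1 - rho2)


rho2_values = [Fraction(0), Fraction(1, 2), Fraction(2, 3), Fraction(9, 10), Fraction(99, 100)]
ratios = [variance_ratio(r2) for r2 in rho2_values]

for r2, ratio in zip(rho2_values, ratios):
    print(r2, ratio)
```

Using exact fractions avoids floating point noise, so the ratios come out as whole numbers.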

Note again that, other things being equal, the larger the correlation between the two independent variables in equation (1), the larger the variance of $\hat\beta_2$ and the less "precise" the estimator will be. The effect can be substantial. However, it is important to recall that multicollinearity is not the only factor having an impact on estimator precision as measured by $\sigma^2_{\hat\beta_2}$; see equation (6).

The following figure of the density of $\hat\beta_2$ for different values of $\rho_{23}$ (and hence $\sigma^2_{\hat\beta_2}$) will be useful in our discussion of the possible impact of multicollinearity.

[Figure: Density of $\hat\beta_2$.]

Recall that (i) the points of inflection on the normal density curve occur at µ ± σ, so that if we are testing the hypothesis Ho: β2 = 1,

(ii) $\Pr(-\sigma_{\hat\beta_2} < \hat\beta_2 - 1 \le \sigma_{\hat\beta_2}) = 0.68$

(iii) $\Pr(-2\sigma_{\hat\beta_2} < \hat\beta_2 - 1 < 2\sigma_{\hat\beta_2}) = 0.95$

(iv) $\Pr(\hat\beta_2 < 0) = \Pr\left(\dfrac{\hat\beta_2 - 1}{\sigma_{\hat\beta_2}} < \dfrac{-1}{\sigma_{\hat\beta_2}}\right) = \Pr\left(\dfrac{\hat\beta_2 - 1}{\sigma_{\hat\beta_2}} < -\dfrac{\sqrt{1 - \rho_{23}^2}}{\sigma/\sqrt{m_{22}}}\right)$

From (iv) we can evaluate the probability of $\hat\beta_2$ assuming the "wrong sign" for the case in which β2 = 1 for given m22 and σ. In the previous figure these probabilities are shown as the area to the left of the vertical dotted line. If σ2 = m22 (strictly for purposes of exposition), so that $\sigma_{\hat\beta_2} = 1/\sqrt{1 - \rho_{23}^2}$, the probability of an "incorrect" sign would be given in the following table.

[Figure legend: densities shown for $\sigma^2_{\hat\beta_2}$ = 0.5, 1.0, and 1.5.]

$\rho_{23}^2$        Probability of an incorrect sign
0            .16
1/2          .24
2/3          .28
9/10         .37
99/100       .46
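The probabilities in this table can be approximated with the standard normal distribution function. The sketch below assumes β2 = 1 and a standard deviation of the estimator equal to 1/sqrt(1 - ρ23²), i.e., the scale factor σ/sqrt(m22) is set to one for exposition; the function names are illustrative. The results agree with the table to within rounding.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_wrong_sign(rho2, beta2=1.0, scale=1.0):
    """Pr(beta2_hat < 0) when beta2_hat ~ N(beta2, scale^2 / (1 - rho2)).

    scale stands for sigma / sqrt(m22) and is assumed to be 1 here.
    """
    sd = scale / math.sqrt(1.0 - rho2)
    return normal_cdf(-beta2 / sd)


rho2_values = [0.0, 0.5, 2.0 / 3.0, 0.9, 0.99]
probs = [prob_wrong_sign(r2) for r2 in rho2_values]

for r2, p in zip(rho2_values, probs):
    print(round(r2, 3), round(p, 3))
```

As the squared correlation approaches one, the probability of a wrong sign approaches one half, because the sampling distribution becomes so dispersed that the true value of one is barely distinguishable from zero.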

Based on our previous discussion we note that increases in $\rho_{23}^2$ and "severe" multicollinearity can be associated with the following situations.

(1) The precision of estimation is reduced (Var($\hat\beta_i$) increases) so that it becomes difficult to accurately estimate individual effects of variables which move together.

(2) It was noted that the probability of obtaining estimates having the "wrong" sign increases as Corr2(x2,x3) increases.

(3) Note from (7) that as ρ23 → 1, the t-statistics get smaller; hence, based upon a strict adherence to a "t-criterion" for deleting variables, a variable may be deleted from an equation when that variable does have an effect. This is always a possibility in statistical inference, but with severe multicollinearity the confidence intervals can become so wide (see equation (8)) as to make it difficult to reject "almost any hypothesis." Recall that confidence intervals for βi are given by

$$\hat\beta_i \pm t_c\,\frac{s}{\left[n\,\mathrm{Var}(x_{ti})\,(1 - \rho_{23}^2)\right]^{1/2}}$$

for the case in which k = 3.

(4) Severe multicollinearity is frequently associated with "significant" F statistics and

"insignificant" t statistics for a group of variables which are expected to be important. The collective importance of a group of variables can be checked using a Chow test.

Huge F-statistics but small t-statistics? Likely diagnosis: multicollinearity


To visualize this situation consider the joint confidence intervals for β2 and β3 which might appear as

Note that the individual confidence intervals for β2 and β3 include 0; hence, we would not be able to reject the hypothesis that β2 or β3 = 0. The joint confidence interval for β2 and β3 does not include the origin; hence, the F statistic will be statistically significant. It is the high correlation between x2 and x3 that contributes to the elliptical shape of the joint confidence interval.

(5) Coefficient estimates may be extremely sensitive to the addition of more data.

(6) $|Corr(X)| = \begin{vmatrix} 1 & \rho_{23} \\ \rho_{23} & 1 \end{vmatrix} = 1 - \rho_{23}^2$ may be close to zero.

(7) Various pairwise correlations between the X's may be close to 1.

(8) Condition index (CI).

High pairwise correlations between explanatory variables are sufficient for multicollinearity problems, but are not necessary. Belsley, Kuh and Welsch (BKW) define a condition index

$$CI = \frac{\text{Maximum eigen value}}{\text{Minimum eigen value}}$$

where the eigen values correspond to the correlation matrix of the x's. BKW use a rule of thumb that multicollinearity is high if CI > 30.

Consider the condition index for the two polar cases in the introduction of this section.

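$$C_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad C_2 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

which have respective eigen values

(λ11, λ12) = (1,1) and (λ21, λ22) = (0, 2).

The corresponding condition indices are then

$$CI_1 = \frac{1}{1} = 1, \qquad CI_2 = \frac{2}{0}\ \text{(undefined), so } CI \to \infty \text{ as } |C| \to 0.$$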

We remind the reader that the CI merely provides a rule of thumb.

In problem number 3.1(1), the reader is asked to verify that the condition index

corresponding to the correlation matrix

$$C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

is given by

$$CI = \frac{1 + |\rho|}{1 - |\rho|}.$$

Note that CI increases as |ρ| increases and includes C1 and C2 as special cases.
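For the two-variable case the eigenvalues of the correlation matrix are 1 - |ρ| and 1 + |ρ|, so the condition index can be computed directly. The sketch below follows the ratio form of the index as written in these notes (some references define the index as the square root of this ratio); the function names are illustrative.

```python
def eigenvalues_2x2_corr(rho):
    """Eigenvalues (smallest, largest) of the correlation matrix [[1, rho], [rho, 1]]."""
    return 1.0 - abs(rho), 1.0 + abs(rho)


def condition_index(rho):
    """Ratio of the largest to the smallest eigenvalue; infinite under exact collinearity."""
    smallest, largest = eigenvalues_2x2_corr(rho)
    if smallest == 0.0:
        return float("inf")
    return largest / smallest


for rho in (0.0, 0.5, 0.9, 0.95, 1.0):
    print(rho, condition_index(rho))
```

The polar cases C1 (ρ = 0) and C2 (ρ = 1) correspond to an index of 1 and an unbounded index, respectively.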

3. Some results for the case of an arbitrary number of independent variables.

Consider the more general model

(9) Yt = β1 + β2Xt2 + β3Xt3 + . . . + βkXtk + εt.

Some of the results obtained in the previous section can be extended to the more general case as follows:

(10a) $\sigma^2_{\hat\beta_i} = \dfrac{\sigma^2}{n s_i^2 (1 - \rho_i^2)}$

(10b) $s^2_{\hat\beta_i} = \dfrac{s^2}{n s_i^2 (1 - \rho_i^2)}$

(10c) $t = \dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} = \dfrac{(\hat\beta_i - \beta_i)\,[n s_i^2 (1 - \rho_i^2)]^{1/2}}{s}$

where $s_i^2 = \sum_t (X_{ti} - \bar X_i)^2 / n$ and

$\rho_i^2$ = Correlation² (between Xi and all other independent variables)

= R² obtained from regressing Xi on other independent variables.

These results seem reasonable. In particular, the higher the correlation between an independent variable and the set of other independent variables, the less precise the associated coefficient estimator as measured by the variance. Again, we note that "multicollinearity" is only one factor contributing to poor estimator precision (large $\sigma^2_{\hat\beta_i}$). Large values of σ² and small N and small $s_i^2$ have the same impact.

The impact of multicollinearity as measured by pairwise correlations between

independent variables becomes much less clear. In particular, if cij is the correlation

between the ith and jth independent variable, it can be shown that

$$\frac{\partial \sigma^2_{\hat\beta_i}}{\partial c_{ik}} = -\frac{\sigma^2 (c^{ii})(c^{ik})}{N s_i^2} \qquad (11)$$

where $c^{st}$ denotes the (s,t)th element in the inverse of the correlation matrix. Consequently, the impact of an increase in the pairwise correlation between two variables upon estimator precision is indeterminate.

Finally, for a given "degree of multicollinearity," individual coefficient estimators may be statistically significant if the overall fit of the model ($\bar R^2$) is good enough. To be more specific,

(12) $\left| \dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \right| > t_{\alpha/2}$

if and only if

$\bar R^2 > 1 - \dfrac{(\hat\beta_i - \beta_i)^2\, N s_i^2 (1 - \rho_i^2)}{t_{\alpha/2}^2\, s_y^2}$

In other words, for any degree of multicollinearity, as measured by $\rho_i^2$, the estimate of βi will be statistically significant if the adjusted R² ($\bar R^2$) is large enough to satisfy the inequality in equation (12). This inequality can be easily derived by squaring both sides of the first inequality, replacing $s^2_{\hat\beta_i}$ by

$\dfrac{s^2}{n\,\mathrm{Var}(x_{ti})(1 - \rho_i^2)}$,

noting that

$\bar R^2 = 1 - \dfrac{SSE/(n-k)}{SST/(n-1)} = 1 - \dfrac{s^2}{s_y^2}$,

and manipulating the resulting expression. The second inequality in (12) can also be rewritten in terms of R².


4. Some proposed "solutions" to the multicollinearity problem

There have been numerous solutions proposed to circumvent the multicollinearity

problem. However, the basic problem with multicollinearity is that the variables

(exogenous) may be moving so closely together as to make it difficult to obtain accurate

estimates of individual effects and, consequently, each proposed technique has associated

problems. It should be mentioned that even for the case of severe (not perfect)

multicollinearity, least squares estimators are unbiased, minimum variance of all unbiased

estimators, consistent, and are asymptotically efficient as long as (A.1)-(A.5) are satisfied.

Some suggested solutions include:

(1) Obtain more data: If additional data had been available it would probably have been

used initially. One might try combining cross sectional and time series data. Panel

data often includes more variability and less collinearity among the variables.

(2) Principal components: Replace the "problem variables" with a smaller number of linear combinations of the deleted variables which "account for most of their explanatory power (variance)." This approach is associated with interpretational problems and may result in biased estimators.

(3) Delete a variable: The deletion of one of the variables which is "nearly" linearly related

to the other independent variables is a common practice, but may result in biased

estimators if it is an important variable.

(4) Impose constraints on the parameters: This approach is really a generalization of

(3) deleting a variable, i.e., βi = 0. However, there may be theoretical reasons for

imposing constraints on the parameters such as constant returns to scale in a production

function or no money illusion in demand equations. The validity of these constraints

could be investigated using a Chow or likelihood ratio test. Judge has shown that least

squares estimator which takes account of linear constraints is minimum variance among

estimators satisfying the constraint. If the constraint is not true, the estimator will be

biased and have variances equal to unconstrained least squares.


(5) Ridge Regression Techniques

A simple ridge regression estimator is given by the following

$\hat\beta(k) = (X'X + kI)^{-1}X'y.$

The ridge regression estimator will be biased (bias($\hat\beta(k)$) = $-k(X'X + kI)^{-1}\beta$), but the value of k is often selected to minimize MSE($\hat\beta(k)$), say at k*. Note that for k = 0 the ridge estimator is the OLS estimator of β, i.e., $\hat\beta(0) = \hat\beta$. It can be shown that

MSE($\hat\beta(k^*)$) ≤ MSE($\hat\beta(0)$).

The basis for selecting $\hat\beta(k^*)$ is motivated by considering the following figure.

In this case the OLS estimator is unbiased, but has a large variance relative to the biased ridge estimator. Recall that it can be shown that MSE($\hat\beta$) = var($\hat\beta$) + (bias($\hat\beta$))².

This figure suggests possible benefits by selecting a slightly biased estimator if there are significant reductions in variance. The MSE is often used to quantify this tradeoff. Ridge estimators are biased and the problem of statistical inference has not been worked out.
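To see how the ridge constant k changes the estimates when two regressors are nearly collinear, consider the following minimal Python sketch. The moment matrices are hypothetical numbers chosen only for illustration, the helper `ridge_2var` is not part of any library, and the sketch shows the shrinkage only (computing the MSE would require knowing the true β).

```python
def ridge_2var(xtx, xty, k):
    """Solve (X'X + kI) b = X'y for two regressors; k = 0 gives OLS."""
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix applied to X'y.
    return [(d * xty[0] - b * xty[1]) / det,
            (a * xty[1] - c * xty[0]) / det]

# Hypothetical moments for two standardized regressors with correlation 0.99.
xtx = [[1.0, 0.99], [0.99, 1.0]]
xty = [1.0, 0.9]

ols = ridge_2var(xtx, xty, 0.0)    # large, opposite-signed coefficients
ridge = ridge_2var(xtx, xty, 0.1)  # much smaller coefficients
print("OLS:  ", ols)
print("ridge:", ridge)
```

With these inputs a small k moves the estimates far from the OLS values, which is the variance reduction (bought at the cost of bias) that the text describes.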

[Figure: sampling distributions of the OLS estimator $\hat\beta(0)$ and the ridge estimator $\hat\beta(k^*)$ relative to β.]

5. PROBLEM SET 4.1

Multicollinearity

Theory 1. Prove that the condition index (C.I.) corresponding to the correlation matrix

$$C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \quad \text{is} \quad C.I. = \frac{1 + |\rho|}{1 - |\rho|}.$$

Hint: Use the quadratic formula from college algebra.

(JM III-A)

2. Prove and discuss equation (12) in the notes on collinearity. (Hint: this problem basically

involves algebraic manipulation, be patient). Based on the result in equation (12), you can see that statistical significance of individual estimators is retained for an arbitrary degree of multicollinearity if the explanatory power of the model is high enough.

(JM III-A 6)

Applied 3. Consider the following data:

Yt      Ct      Wt
1883    1749    2.36
1909    1756    2.39
1969    1814    2.47
2015    1867    2.52
2126    1943    2.65
2239    2047    2.81
2335    2127    2.93
2403    2164    3.01
2486    2256    3.12
2534    2315    3.18
2534    2328    3.70

where Yt, Ct, and Wt, respectively, denote income, consumption, and wage rates.

a. Estimate

(1) Ct = α1 + α2Yt + εt

(2) Ct = β1 + β2Wt + ε′t

(3) Ct = γ1 + γ2Yt + γ3Wt + ε″t

using the first ten observations. Also, estimate equation (3) for the entire data set (11 observations). Explain the results.

(JM III-A)

4. Refer to problem 4 from "HW 2.2: K-Variate Regression". Test the hypothesis that

β3 = β4 = 0 in equation (2) and reconcile the results with the results obtained based upon individual tests of significance for β3 and β4 using t-statistics.

(JM III-A)

5. Consider the following set of data:

Y     X2    X3
2     1     1
4     2     4
6     3     7
8     4     10
10    5     13
12    6     16
14    7     19
16    8     22
18    9     25
20    10    28

Discuss any problems associated with estimating β1, β2 and β3 in the model

Yt = β1 + β2Xt2 + β3Xt3 + εt.

(JM III-A)

6. In a study relating college grade point average (GPA) to time spent in various activities,

you distribute a survey to several students. The students are asked how many hours they

spend each week in four activities: studying, sleeping, working, and leisure. Any

activity is put into one of four categories, so that for each student, the sum of hours in the

four activities must be 168.

a. What problems will you encounter in estimating the model

GPA = α1 + α2 study + α3 sleep + α4 work + α5 leisure + εt

b. How could you reformulate the model so that its parameters have a useful interpretation? (Wooldridge, 3rd edition, problem 3.5)

7. A problem of interest to health officials (and others) is to determine the effects of

smoking during pregnancy on infant health. One measure of infant health is birth

weight: a birth weight that is too low can put an infant at risk for contracting various

illnesses. Since factors other than cigarette smoking that affect birth weight are likely to

be correlated with smoking, we should take those factors into account. For example,

higher income generally results in access to better prenatal care, as well as better

nutrition for the mother. An equation that recognizes this is

bwght = β0 + β1cigs + β2faminc + u

a) What do you think is the most likely sign for β2?

b) Do you think cigs and faminc are likely to be correlated? Explain why the

correlation might be positive or negative.

c) Now estimate the equation with and without faminc, using the data in BWGHT.RAW.

Report the results in equation form, including the sample size and R-squared.

Discuss your results, focusing on whether adding faminc substantially changes the

estimated effect of cigs on bwght. Is the estimated coefficient of β2 statistically

significant?


Appendix 1. Derivation of equation (4)

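$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$

$\bar y = \beta_1 + \beta_2 \bar x_2 + \beta_3 \bar x_3 + \bar\varepsilon$

$(y_t - \bar y) = \beta_2 (x_{t2} - \bar x_2) + \beta_3 (x_{t3} - \bar x_3) + (\varepsilon_t - \bar\varepsilon)$

$\tilde y_t = \beta_2 \tilde x_{t2} + \beta_3 \tilde x_{t3} + \tilde\varepsilon_t$

The $\tilde X$ matrix is given by

$$\tilde X = \begin{pmatrix} \tilde x_{12} & \tilde x_{13} \\ \tilde x_{22} & \tilde x_{23} \\ \tilde x_{32} & \tilde x_{33} \\ \tilde x_{42} & \tilde x_{43} \\ \vdots & \vdots \\ \tilde x_{n2} & \tilde x_{n3} \end{pmatrix}$$

and

$$\tilde X'\tilde X = \begin{pmatrix} \tilde x_{12} & \tilde x_{22} & \cdots & \tilde x_{n2} \\ \tilde x_{13} & \tilde x_{23} & \cdots & \tilde x_{n3} \end{pmatrix} \begin{pmatrix} \tilde x_{12} & \tilde x_{13} \\ \tilde x_{22} & \tilde x_{23} \\ \vdots & \vdots \\ \tilde x_{n2} & \tilde x_{n3} \end{pmatrix} = \begin{pmatrix} \sum \tilde x_{t2}^2 & \sum \tilde x_{t2}\tilde x_{t3} \\ \sum \tilde x_{t3}\tilde x_{t2} & \sum \tilde x_{t3}^2 \end{pmatrix} = \begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}$$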

Appendix 2. Derivation of equation (6)

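$$\begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}^{-1} = \frac{1}{m_{22}m_{33} - m_{23}^2} \begin{pmatrix} m_{33} & -m_{23} \\ -m_{32} & m_{22} \end{pmatrix}$$

$$\mathrm{Var}(\hat\beta_2) = \frac{\sigma^2 m_{33}}{m_{22}m_{33} - m_{23}^2} = \frac{\sigma^2}{(m_{22}m_{33} - m_{23}^2)/m_{33}} = \frac{\sigma^2}{m_{22} - m_{23}^2/m_{33}}$$

$$= \frac{\sigma^2}{m_{22} - m_{22}\,\dfrac{m_{23}^2}{m_{22}m_{33}}} = \frac{\sigma^2}{m_{22} - m_{22}\,\rho_{23}^2} = \frac{\sigma^2}{m_{22}(1 - \rho_{23}^2)} = \frac{\sigma^2}{\left(\sum \tilde x_{t2}^2\right)(1 - \rho_{23}^2)}$$

Similarly,

$$\mathrm{Var}(\hat\beta_3) = \frac{\sigma^2 m_{22}}{m_{22}m_{33} - m_{23}^2} = \frac{\sigma^2}{m_{33}(1 - \rho_{23}^2)} = \frac{\sigma^2}{\left(\sum \tilde x_{t3}^2\right)(1 - \rho_{23}^2)}$$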

James B. McDonald, Brigham Young University, 2/18/2010

IV. Miscellaneous Topics

B. Binary Variables (Dummy Variables)

Many variables which we may want to include in an econometric model may not be quantitative (measurable), but rather are qualitative in nature. For example, an individual will be a homeowner, or will not; will be married or not. Such characteristics may have a bearing on an individual's behavior, but are not quantifiable. One way to include the effect of such characteristics is to introduce binary or dummy variables. For example, let the binary variable Dt indicate whether a given individual is married or not by defining Dt = 0 if the tth individual is single and Dt = 1 if the tth individual is married.

We now consider several models which make use of dummy variables, discuss the dummy variable trap, indicate some interesting generalizations, and investigate applications of these techniques to several problems in economics.

1. Models with binary explanatory variables

a. An example: the relationship between salary and a college degree

Let Yt = annual salary of the tth person in the sample,

D1t = 1 if the tth person is a college graduate
    = 0 otherwise,

D2t = 1 if the tth person isn't a college graduate
    = 0 otherwise.

Note that D2t = 1 - D1t.

Consider the following two models which can be used to study the impact of a college degree on annual salary.

Model 1: Yt = α1 + α2D1t + εt

Model 2: Yt = β1D1t + β2D2t + εt.

The coefficients in the two representations have different interpretations as summarized in the following table.

                                    Model 1      Model 2
E(Yt | college graduate)            α1 + α2      β1
E(Yt | not a college graduate)      α1           β2

In the model with one fewer dummy variable than categories (model 1; categories = college graduate, not a college graduate) the coefficient of the binary variable represents the expected difference or differential between the income levels associated with the state of the included dummy variable and the state (benchmark) associated with the deleted dummy variable, i.e.,

α2 = E(Yt | college graduate) - E(Yt | not a college graduate).

The coefficients in the representation which includes the same number of binary variables as categories (model 2) represent the expected income level associated with each category.

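b. Estimation:

Assume that we have a total of n observations with the first n1 (n1 + n2 = n) having college degrees. The two different models can be written in matrix notation as

Model 1:

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

or Y = Xα + ε

Model 2:

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

or Y = X*β + ε.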

The least squares estimators of the vectors α and β are given by

$$\hat\alpha = (X'X)^{-1}X'Y = \begin{pmatrix} \hat\alpha_1 \\ \hat\alpha_2 \end{pmatrix} = \begin{pmatrix} \bar Y_2 \\ \bar Y_1 - \bar Y_2 \end{pmatrix}$$

and

$$\hat\beta = (X^{*\prime}X^{*})^{-1}X^{*\prime}Y = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} \bar Y_1 \\ \bar Y_2 \end{pmatrix}$$

where $\bar Y_1$ and $\bar Y_2$, respectively, denote the sample mean income for those having college degrees and those without a degree. Note that these are sample estimates (sample means) of the population means.
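The link between the two parameterizations and the group means can be checked on a small example. The Python sketch below uses made-up salary figures; the helper names `mean` and `ols_simple` are hypothetical and the code is only meant to illustrate the result stated above.

```python
def mean(values):
    return sum(values) / len(values)

def ols_simple(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical salaries (in $1000s); d1 = 1 marks a college graduate.
salary = [62.0, 58.0, 71.0, 40.0, 45.0, 38.0, 41.0]
d1 = [1, 1, 1, 0, 0, 0, 0]

grads = [s for s, d in zip(salary, d1) if d == 1]
nongrads = [s for s, d in zip(salary, d1) if d == 0]

# Model 1: intercept plus one dummy.
alpha1, alpha2 = ols_simple(salary, d1)
# Model 2: two dummies, no intercept. X*'X* is diagonal, so OLS gives group means.
beta1, beta2 = mean(grads), mean(nongrads)

print(alpha1, alpha2)  # nongraduate mean, graduate differential
print(beta1, beta2)    # graduate mean, nongraduate mean
```

In this example the Model 1 intercept coincides with the nongraduate mean and the Model 1 slope with the difference between the two group means.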

c. Dummy Variable Trap

Consider the model

Yt = γ1 + γ2D1t + γ3D2t + εt

or in matrix form

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$

Y = X**γ + ε

The least squares estimators of γ, if they exist, are given by

$\hat\gamma = (X^{**\prime}X^{**})^{-1}X^{**\prime}Y.$

Note that

$$X^{**\prime}X^{**} = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ 1 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} n & n_1 & n_2 \\ n_1 & n_1 & 0 \\ n_2 & 0 & n_2 \end{pmatrix};$$

hence, the first column is equal to the sum of the second and third columns and

$|X^{**\prime}X^{**}| = 0.$

Therefore, $(X^{**\prime}X^{**})^{-1}$ and the vector $\hat\gamma$ are not defined. Note that this problem could be detected by noting that the first column in X** is equal to the sum of the second and third columns.

The dummy variable trap corresponds to including an intercept in a model in which the same number of dummy variables have been included as categories for the qualitative characteristic. The dummy variable trap can be thought of as resulting in perfect multicollinearity.

Two approaches to avoiding the dummy variable trap are:

(1) use an intercept and one fewer dummy variable than categories or

(2) include the same number of dummy variables as categories (with only one characteristic), but delete the intercept.
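The singularity behind the dummy variable trap can be verified directly for a small design matrix. In the Python sketch below the group sizes are hypothetical and the helpers `det3` and `cross_product` are written out by hand for illustration rather than taken from a library.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cross_product(x):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(x[0])
    return [[sum(row[r] * row[c] for row in x) for c in range(k)]
            for r in range(k)]

n1, n2 = 3, 4  # hypothetical numbers of graduates and nongraduates
# Columns: intercept, D1, D2 (all three included, which is the trap).
x_trap = [[1, 1, 0]] * n1 + [[1, 0, 1]] * n2

xtx = cross_product(x_trap)
print(xtx)        # rows (n, n1, n2), (n1, n1, 0), (n2, 0, n2)
print(det3(xtx))  # zero determinant, so the inverse does not exist
```

Dropping either the intercept column or one of the dummy columns removes the exact linear dependence, which corresponds to the two approaches listed above.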

d. Generalizations

There are numerous ways in which dummy variables can be advantageously used in formulating econometric models. Several qualitative characteristics can be modeled in the same equation with or without quantitative variables. If several qualitative characteristics are to be included in a model as explanatory variables, an intercept and one fewer dummy variable than categories should be included for each qualitative characteristic. Interaction terms (products of binary variables) can be included. The dependent variable can be chosen to be a binary variable in applications such as selecting good loan applicants or in determining which income tax returns to audit. Alternative approaches to using dummy variables as dependent variables are available and a few will be discussed in Section 2 (III.B.2).

e. Some examples and precautionary comments

(1) Consumption behavior in wartime (or other unique time periods)

Define Zt = 1 if t corresponds to wartime and 0 otherwise.

Indicate how to model each of the following situations.

[Figure: three panels, labeled (1), (2), and (3), showing consumption functions with intercepts and slopes labeled β1 and β2.]

where Ct and Yt denote consumption and income in period t. Case (1) corresponds to a model with different intercepts and a common slope, (2) a common intercept and different slopes, and (3) the possibility of different intercepts and slopes.

It can be shown that using dummy variables to estimate the intercept(s) and slope(s) is more efficient than running separate regressions in cases (1) and (2) but is equivalent to running separate regressions for case (3).
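One possible way to write the three cases with the wartime dummy Zt is sketched below. The shift parameters δ1 and δ2 are names introduced here for illustration and do not appear in the original notes; other equivalent parameterizations are possible.

```latex
% Case (1): different intercepts, common slope
C_t = \beta_1 + \delta_1 Z_t + \beta_2 Y_t + \varepsilon_t

% Case (2): common intercept, different slopes
C_t = \beta_1 + \beta_2 Y_t + \delta_2 (Z_t Y_t) + \varepsilon_t

% Case (3): different intercepts and different slopes
C_t = \beta_1 + \delta_1 Z_t + \beta_2 Y_t + \delta_2 (Z_t Y_t) + \varepsilon_t
```

Under this parameterization, δ1 is the wartime shift in the intercept and δ2 the wartime shift in the slope, so tests of δ1 = 0 or δ2 = 0 ask whether wartime behavior differs from behavior in other periods.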

(2) Interaction Terms

The use of binary variables in regression models takes account of "additive" effects. For example, consider the model

Salary = β1 + β2(income) + β3(gender) + β4(race)

where

Gender = 1 female
       = 0 otherwise

Race = 1 minority
     = 0 otherwise.

β3 and β4, respectively, measure the additive impact on salaries of being a woman and a member of a minority. If the data suggest that there is an extra impact (positive or negative) of being a woman and a minority, this can be modeled using an interaction term Z = (Gender)(Race) by estimating the model

Salary = β1 + β2(income) + β3(Gender) + β4(Race) + β5Z.

β5 could be tested for statistical significance. A similar approach could be taken to allow gender, race, and interaction effects to impact the slope.

(3) The Ratchet Effect

This example does not use dummy variables, but illustrates how imaginative use of data can be profitably utilized. Let Yt* = highest income level experienced. Consider the following figures.

The consumption function depicted in the first figure can be estimated from the following equation

Ct = βYt + γ(Yt* - Yt).

Note that for periods in which there is "growth" (not just recovery) Yt = Yt* and Ct = βYt, and during a recession or associated recovery Yt* is fixed and is greater than Yt and Ct = γYt* + (β - γ)Yt. In order to test to see if aggregate behavioral differences exist during growth periods as compared with recession or recovery periods the hypothesis H0: γ = 0 could be tested.
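Constructing the ratchet regressor only requires the running maximum of income. The Python sketch below uses a made-up income series, and the function name `previous_peak` is hypothetical; it assumes the highest income level experienced includes the current period, which is consistent with Yt = Yt* during growth.

```python
def previous_peak(income):
    """Running maximum: the highest income level experienced up to period t."""
    peaks, highest = [], float("-inf")
    for y in income:
        highest = max(highest, y)
        peaks.append(highest)
    return peaks

# Hypothetical income series with a recession in periods 4 and 5.
income = [100, 105, 110, 104, 108, 112]

peak = previous_peak(income)
gap = [p - y for p, y in zip(peak, income)]  # the regressor (Y*_t - Y_t)

print(peak)
print(gap)
```

The gap is zero in growth periods and positive during the recession and recovery, so regressing Ct on Yt and this gap gives the estimates of β and γ used in the test of H0: γ = 0.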

(4) A Precautionary Note

Consider the problem of modeling the impact of education upon salary where education for each individual is reported as being (a) high school (HS) or less, (b) having attended college (BS), (c) Master's degree (MS), or (d) having a Ph.D. (PhD).

The level of education might be measured in several ways, three of which might be (E1, E2 or E3):

        E1    E2    E3
HS      1     12    Number of years attending school
BS      2     16
MS      3     18
PhD     4     20

E1 assigns an index to the categories (assuming a monotonic relationship), E2 is a rough measure of the number of years of school, and E3 assumes a linear relationship between the dependent variable and the number of years of school.

Alternatively, binary variables could be used which allow differentiated impacts for different degrees. To explore this approach further, let

D1 = 1 HS
   = 0 otherwise
D2 = 1 BS
   = 0 otherwise
D3 = 1 MS
   = 0 otherwise
D4 = 1 PhD
   = 0 otherwise

Now consider the four models for relating salary to the level of education:

Model 1. St = α1 + α2E1t + ξt

Model 2. St = β1 + β2E2t + ηt

Model 3. St = γ1 + γ2E3t + ψt

Model 4. St = δ1 + δ2D2t + δ3D3t + δ4D4t + εt

These formulations have very different implications for the estimated marginal benefit of obtaining a higher degree or an additional year of school. These results are summarized in the next table.

Marginal Benefit of an Additional Degree*

        Model 1    Model 2    Model 4
BS      α2         4β2        δ2
MS      α2         2β2        δ3 - δ2
PhD     α2         2β2        δ4 - δ3

*Model 3 assigns a constant marginal expected value of γ2 to each additional year of school at all educational levels.

Note that only model 4 allows for differentiated returns to degrees. These returns can even be negative. If δ2 and δ3 - δ2 are positive and δ4 - δ3 is negative, this suggests that expected salaries are higher for individuals having a BS or MS rather than the lower degree, but that the expected salary for those with PhDs is lower than salaries of those with an MS. Model 1 implies a constant marginal benefit for attaining each additional degree. Also note that in models 1, 2, and 4 the marginal benefit of additional years of schooling in each formulation is zero unless there is a change in group membership (additional degree is earned).

The formulation associated with Model 1 implies that the marginal benefit is linear in the education variable. The estimates also depend upon how the groups are numbered. For example, suppose the variable had been defined as

        E1*
HS      1
PhD     2
BS      3
MS      4

This would suggest that the marginal benefit of a Ph.D. over having not gone past high school is the same as the expected benefit of having an MS degree instead of stopping at a BS degree.

We need to be very careful about the implications of the adopted specification. Some representations of the impact of marital status on dependent variables are subject to the previously mentioned issues. Introducing different binary variables for different categories allows the greatest flexibility. We may also want to allow for nonlinear relationships between variables such as wealth, regressing personal income or wealth on age and (age)² to take account of a life cycle effect.

2. Models with binary dependent variables or limited dependent variables

a. Introduction

Consider models in which one might want to explain

(1) when there will be a default on a loan (Y = 1) or no default (Y = 0)

(2) whether a tax return has been filed by someone who has misrepresented their financial position (Y = 1) or accurately reflects the situation (Y = 0)

(3) the market share of a firm (0 ≤ Y ≤ 1)

These are known as limited dependent variable problems. Amemiya (1981) has an excellent survey paper in the Journal of Economic Literature.

In each case the dependent variable (Y) in the function

Y = f(X; β) + ε

is constrained in value.

Numerous approaches have been adopted for this problem and these include regression analysis, linear probability models, discriminant analysis, and limited dependent models.

b. Linear Probability Model (LPM)

Let yt = α + βXt + εt, where

yt = 1 if the first option is chosen
   = 0 otherwise,

xt = vector of values of attributes (independent variable(s)),

εt = independently distributed random variable with a zero mean.

Implications of the LPM:

• E(yt) = Xtβ

Now let Pt = Prob(yt = 1) and Qt = 1 - Pt = Prob(yt = 0), so that

E(yt) = 1 · Prob(yt = 1) + 0 · Prob(yt = 0)
      = 1 · Pt + 0 · Qt
      = Pt

Thus the regression equation describes the probability that the first choice is made. The vector β measures the effect of a unit change in the explanatory variables on the probability of choosing the first alternative. OLS can be used to estimate the LPM; however, there is some question about the appropriateness of OLS in this model. To appreciate the reasons for this concern, note the following:

εt = yt - Xtβ

• Since y can only assume the values of 0 or 1, εt can't be distributed normally.

Further, E(εt) = Pt(1 - Xtβ) + (1 - Pt)(-Xtβ), and if E(εt) = 0 this implies

Pt = Xtβ and

(1 - Pt) = 1 - Xtβ.

Now to find the variance of the error term εt:

• Var(εt) = E(εt²) = (1 - Xtβ)²Pt + (-Xtβ)²(1 - Pt)
          = (1 - Xtβ)²(Xtβ) + (Xtβ)²(1 - Xtβ)
          = (1 - Xtβ)(Xtβ)

which shows that the variance of the error depends on the independent variables and, by definition, is heteroskedastic. One possible solution to this problem is to use weighted least squares.

• Another problem with the LPM is that of prediction:

Note that with the linear probability model there is a chance that predicted values for yt may lie outside the interval [0, 1]. One possible solution is to set all predictions greater than 1 equal to 1 and all predictions less than 0 equal to zero. However, these observations present a problem in running weighted least squares.
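The two difficulties with the LPM noted above can be seen in a few lines of Python. The fitted values below are hypothetical and the helper names are introduced only for this sketch.

```python
def lpm_error_variance(p):
    """Var(eps_t) = (X_t beta)(1 - X_t beta) when P_t = X_t beta."""
    return p * (1.0 - p)

def clip_to_unit_interval(p):
    """Set fitted probabilities above 1 to 1 and below 0 to 0."""
    return min(1.0, max(0.0, p))

# Hypothetical OLS fitted values from a linear probability model.
fitted = [-0.05, 0.20, 0.50, 0.90, 1.10]

variances = [lpm_error_variance(p) for p in fitted]
clipped = [clip_to_unit_interval(p) for p in fitted]
# Weights 1/variance for weighted least squares need a strictly positive variance.
usable = [v > 0 for v in variances]

print(variances)
print(clipped)
print(usable)
```

Fitted values outside [0, 1] imply a negative variance, and clipping them to 0 or 1 implies a zero variance, so in either case the usual weight 1/variance is not available for those observations.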

c. Qualitative Response Models

(1) Introduction

Another possibility for binary or limited dependent variables is to use constrained estimation. Discriminant analysis is still another approach. Since observed values for Yt are constrained to the interval (0, 1), functional forms F(Xt) which are constrained to the interval (0, 1) can be selected. This quite naturally suggests using cumulative probability distributions for F(Xt).

F(Xt) = Pt

This possibility admits many alternative models:

$$P_t = \Pr(Y_t = 1 \mid X_t) = F(X_t\beta; \theta) = \int_{-\infty}^{X_t\beta} f(s; \theta)\, ds$$

where f(s; θ) denotes a "well behaved" probability density function with distributional parameters θ. F(Xtβ; θ) is the corresponding cumulative distribution function evaluated at Xtβ, which is sometimes referred to as the score. Two models which have been widely used are the standard normal and logistic models:

              f(s; θ)                                   $F(z) = \int_{-\infty}^{z} f(s; \theta)\, ds$

Normal        $\dfrac{e^{-s^2/2}}{\sqrt{2\pi}}$         $\displaystyle\int_{-\infty}^{z} \dfrac{e^{-s^2/2}}{\sqrt{2\pi}}\, ds$

Logistic      $\dfrac{e^{-s}}{(1 + e^{-s})^2}$          $\dfrac{1}{1 + e^{-z}}$

These two distributions are only two of many which could have been used, but currently dominate this literature and are respectively known as probit (based on the normal) and logit (based on the logistic) models.
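The two densities and distribution functions in the table can be evaluated with the Python standard library. This is a minimal sketch; the function names are chosen here for illustration, and the normal CDF uses the standard identity Φ(z) = (1 + erf(z/√2))/2.

```python
import math

def normal_pdf(s):
    """Standard normal density."""
    return math.exp(-s * s / 2.0) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_pdf(s):
    """Logistic density e^{-s} / (1 + e^{-s})^2."""
    return math.exp(-s) / (1.0 + math.exp(-s)) ** 2

def logistic_cdf(z):
    """Logistic CDF 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

# Both CDFs map any score into (0, 1), unlike the linear probability model.
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, normal_cdf(z), logistic_cdf(z))
```

Comparing the two columns at scores far from zero shows the thicker tails of the logistic distribution.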

(2) Estimation

The estimation of limited dependent models depends upon the model or density selected and the nature of the data:

(a) Yt = 0 or 1 and (b) 0 < Yt < 1.

If we have data based on discrete choices, then we have the case

(a) Yt = 0 or 1.

The likelihood function in this case is given by

$$L(\beta, \theta; Y) = \prod_{t=1}^{n} P_t^{Y_t} (1 - P_t)^{1 - Y_t} = \prod_{t=1}^{n} F(x_t\beta; \theta)^{Y_t} \left(1 - F(x_t\beta; \theta)\right)^{1 - Y_t}$$

and the log likelihood function is

$$\ell(\beta, \theta; Y) = \sum_{t=1}^{n} \left[ Y_t \ln F(x_t\beta; \theta) + (1 - Y_t) \ln\left(1 - F(x_t\beta; \theta)\right) \right].$$

This expression is maximized over the parameters β and θ to obtain maximum likelihood estimators. This procedure can be quite involved if the expression for the cumulative distribution is complicated. Recall that

$$F(x_t\beta; \theta) = \Pr(z \le x_t\beta) = \int_{-\infty}^{x_t\beta} f(s; \theta)\, ds$$

where θ denotes unknown distributional parameters. Any pdf could be selected in the previous framework. The predicted impact of a change in the explanatory variables depends on the pdf as

$$\frac{\partial \Pr(Y_t = 1 \mid X_t)}{\partial X_{it}} = f(X_t\beta)\, \beta_i.$$

Thus, the βi coefficients alone do not provide estimates of the marginal impact of a change in Xt on Pr(Yt = 1 | Xt).
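For the logit case, the log likelihood and the marginal effects described above can be written out in a few lines. The Python sketch below uses a tiny made-up data set; the function names are hypothetical, and it only evaluates the log likelihood at given parameter values rather than maximizing it.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, xs, ys, cdf=logistic_cdf):
    """Sum over t of Y_t ln F(x_t beta) + (1 - Y_t) ln(1 - F(x_t beta))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = cdf(sum(b * xi for b, xi in zip(beta, x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def logit_marginal_effects(beta, x):
    """f(x beta) * beta_i, using the logistic density f = F(1 - F)."""
    p = logistic_cdf(sum(b * xi for b, xi in zip(beta, x)))
    return [p * (1.0 - p) * b for b in beta]

# Hypothetical data: each row is (constant, one explanatory variable).
xs = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
ys = [0, 0, 1, 1]

print(log_likelihood([0.0, 0.0], xs, ys))    # all probabilities equal to 0.5
print(log_likelihood([-2.0, 1.0], xs, ys))   # a better-fitting parameter vector
print(logit_marginal_effects([-2.0, 1.0], [1.0, 2.0]))
```

The marginal effect of the explanatory variable at the chosen point is smaller in absolute value than its coefficient, which illustrates why the βi alone do not measure the marginal impact.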

Stata commands for estimating limited dependent variables models. As noted earlier, the two most commonly used pdf's in qualitative response models are the normal and logistic distributions, with the corresponding qualitative response models being referred to as the probit and logit models, which can be estimated in most common econometric software packages. Some useful Stata commands in working with binary variables are given below:

• To create dummy variables in Stata, use the "gen" command as follows:

    gen dummy_var = exp

where exp is an expression that categorizes the dummy_var as a 0 or 1. For example, to take a continuous variable on income and create a dummy variable where a 0 represents "less than $50,000 annually" and a 1 represents "$50,000 or more annually," use the following command:

    gen income_dummy = income >= 50000

• The probit model can be estimated using Stata with the command

    probit Y X1 X2, options

The maximum likelihood estimates of β1, β2, β3 and log likelihood values will be reported. The marginal impact of changes in the explanatory variables on the predictions (f(Xtβ)βi) rather than βi can be obtained by using the command

    dprobit Y X1 X2, options

A prediction matrix can be printed using the command:

    estat classification, cutoff(#)

The elements on the main diagonal are the number of correct predictions and the off diagonal elements indicate the number of misses.

                  Observed
Predicted       D        ~D
    +           M11      M12
    –           M21      M22

The cutoff option, for example

    estat classification, cutoff(.5)

specifies the value at which an observation has a predicted positive outcome. The default cutoff point is 0.5.

• Similar logit results can be obtained using the command

    logit Y X1 X2, options

• Prediction matrices for the LPM can be obtained as follows

    reg y X's
    predict yhat
    gen predy = yhat > .5
    tabulate y predy

( b) Limited dependent variables models where 0 < Yt < 1

I f we have a di scr et e choi ce model wi t h gr ouped dat a or a

model wi t h t he dependent var i abl e s t r i ct l y bet ween 0 and 1,

al t er nat i ve es t i mat i on t echni ques ar e avai l abl e.

One appr oach i s t o use

m

v = p

t

t

t v

t = number choos i ng t he f i r s t r esponse i n t he

t th gr oup

mt

= number i n t he t th gr oup

F- 1

( Pt

) = Xtβ or

F- 1

( Yt

) = Xtβ

If F is known, then regression techniques can be employed to estimate the vector β. Recall that the probit model is based upon the normal cumulative distribution function

$$F(x_t\beta) = \int_{-\infty}^{x_t\beta} \frac{e^{-s^2/2}}{\sqrt{2\pi}}\,ds.$$

The Logit model is based upon the logistic distribution function

$$F(x_t\beta) = \frac{1}{1 + e^{-x_t\beta - \varepsilon_t}}.$$

The probit model involves rather complicated estimation and there is no compelling reason that the normal should be used. The Logit has thicker tails, but approximates the probit model.

The Logit model is particularly well suited for grouped data or other situations in which $0 < Y_t = F(X_t\beta) < 1$. This can be seen by solving

$$\frac{1}{1 + e^{-x_t\beta - \varepsilon_t}} = Y_t$$

for $x_t\beta + \varepsilon_t$, which yields

$$Z_t = F^{-1}(Y_t) = \ln\left(\frac{Y_t}{1 - Y_t}\right) = x_t\beta + \varepsilon_t.$$

Regression techniques can be directly used to obtain estimators of β where the dependent variable ($Z_t = \ln(Y_t/(1 - Y_t))$) is regressed on the $X_t$'s. Note that $Y_t \neq 0$ or 1 in this representation.
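The log-odds regression can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the grouped proportions are generated exactly from hypothetical values β0 = -2 and β1 = 0.5 with no error term, and the helper names `logit` and `ols_line` are invented for this sketch. Because the data are noise-free, regressing $Z_t$ on $X_t$ should return the generating coefficients.

```python
import math


def logit(y):
    """Log-odds transform Z = ln(Y / (1 - Y)); undefined at Y = 0 or Y = 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.log(y / (1.0 - y))


def ols_line(x, z):
    """Intercept and slope from a simple least squares regression of z on x."""
    n = len(x)
    xbar = sum(x) / n
    zbar = sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return zbar - slope * xbar, slope


# Hypothetical grouped data: proportions generated from b0 = -2, b1 = 0.5.
true_b0, true_b1 = -2.0, 0.5
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * xi))) for xi in x]

z = [logit(yi) for yi in y]          # transformed dependent variable
b0_hat, b1_hat = ols_line(x, z)      # regression of Z on X
print(b0_hat, b1_hat)
```

The `ValueError` guard reflects the restriction noted above: the transform cannot be applied to observations with Y equal to 0 or 1.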

3. PROBLEM SET 4.2

Dummy/Binary variables

Problems 1, 2, 3, and 4 deal with binary independent variables, including the use of interaction terms. Problems 5 and 6 focus on modeling binary dependent variables.

Theory

1. Suppose you collect data from a survey on wages, education, experience, and gender. In addition you ask for information about marijuana usage. The original question is: "On how many occasions last month did you smoke marijuana?"

a) Write an equation that would allow you to estimate the effects of marijuana usage on wage, while controlling for other factors. You should be able to make statements such as, "Smoking marijuana five more times per month is estimated to change wage by x%."

b) Write a model that would allow you to test whether drug usage has different effects on wages for men and women, while controlling for other variables. How would you test that there are no differences in the effects of drug usage for men and women? You may want to model the impact of interactions.

c) Suppose you think it is better to measure marijuana usage by putting people into one of four categories: nonuser, light user (1-5 times per month), moderate user (6-10 times per month), and heavy user (more than 10 times per month). Now write a model that allows you to estimate the effects of marijuana usage on wage, while controlling for other variables and avoiding the dummy variable trap.

d) Using the model in part (c), explain in detail how to test the null hypothesis that marijuana usage has no effect on wage. Be very specific and include a careful listing of degrees of freedom.

e) What are some potential problems with drawing causal inference using the survey data you collected?

(Wooldridge 7.8)

Applied

2. The file TRAFFIC2.RAW contains data on traffic accidents in California from 1981 to 1989, with each month being a separate observation. You suspect that California traffic accidents (listed in the data file as variable totacc) may be correlated with the month of the year.

a) Run a regression that shows the effect of the month on the number of traffic accidents. Does it appear that seasonal adjustment is appropriate when monitoring the number of California traffic accidents? Justify.

b) You may have noticed that the data did not include the variable jan, so that the number of dummy variables would be one less than the number of classifications. Insert a variable jan and set jan = 1 for January observations (i.e., when all other month variables equal zero). What estimation problems are there with having the same number of dummy variables as classifications? Estimate this regression and compare your results with the results of part (a).

(RST)

3. Consider the following data on the length of employment and associated salary level.

Employee   Salary   Years Employed
    1        425          1
    2        480          3
    3        905         20
    4        520          5
    5        505          4
    6        540         15
    7        380          6
    8        440          2
    9        420          1
   10        405          4
   11        650         10

The salary figures are reviewed by employee numbers 1 and 7 and they note that employee numbers 1, 2, 7, 9, and 10 are members of a minority group and they claim that there is evidence of discrimination in the salary structure. Analyze this assertion.

(JM IIIB-4)

4. Consider the following models:

a. $Consump = \alpha_1 + \alpha_2\,Income + \alpha_3\,Wealth + \alpha_4\,(Income)(Wealth) + \varepsilon$

where Consump denotes consumption expenditures in dollars and Income and Wealth are measured in dollars.

(1) Evaluate the marginal propensity to consume ($\partial Consump / \partial Income$).

(2) What is the interpretation of $\alpha_4$?

b. $Wage = \beta_1 + \beta_2\,Female + \beta_3\,Race + \beta_4\,(Female)(Race) + \beta_5\,Education + \beta_6\,Experience + \varepsilon$

where Wage represents the hourly wage in dollars, Education measures years of education beyond high school, Experience is job experience measured in years, and Female and Race are binary variables with Female = 1 for female employees and Race = 1 for non-white and non-Hispanic employees.

(1) What is the interpretation of each of the following parameters: $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6$?

(2) What joint hypothesis could be tested to check for gender or racial discrimination?

(3) How could the model be modified to allow the possibility of different annual increases in the hourly wage rate for females?

5. Consider the following hypothetical data (adapted from Gujarati, p. 473). Y is a binary variable (Y = 1 if the family owns a home, 0 otherwise) and X is family income in thousands of dollars.

Family   Y    X      Family   Y    X
   1     0    8        21     1   22
   2     1   16        22     1   16
   3     1   18        23     0   12
   4     0   11        24     0   11
   5     0   12        25     1   16
   6     1   19        26     0   11
   7     1   20        27     1   20
   8     0   13        28     1   18
   9     0    9        29     0   11
  10     0   10        30     0   10
  11     1   17        31     1   17
  12     1   18        32     0   13
  13     0   14        33     1   21
  14     1   20        34     1   20
  15     0    6        35     0   11
  16     1   19        36     0    8
  17     1   16        37     0   17
  18     0   10        38     1   16
  19     0    8        39     0    7
  20     1   18        40     1   17

a. Fit a linear probability model (LPM)

$Y = \beta_1 + \beta_2 X + \varepsilon$

to the data and investigate the predictive ability of the estimated model.

b. Fit probit and logit models to this same data set and compare the prediction results. Include the prediction matrices.

For probit or logit models of the form

y = β0 + β1x1 + β2x2 + . . . + βkxk

Stata uses the commands:

probit y x1 x2 . . . xk
logit y x1 x2 . . . xk

In order to print the prediction matrix using a .5 threshold use the command

estat classification, cutoff(.5)

c. Compare the forecasting ability of the three models (LPM, probit, and logit) corresponding to a cutoff value of .3. Use the command

estat class, cutoff(.3)

d. Compare the marginal impact of a change in income on the likelihood of homeownership using the three models.

6. Let grad be a dummy variable for whether a student-athlete at a large university graduates in five years. Let hsGPA and SAT be high school grade point average and SAT score, respectively. Let study be the number of hours spent per week in an organized study hall. Suppose that, using data on 420 student-athletes, the following logit model is obtained:

$$\hat{P}(grad = 1 \mid hsGPA, SAT, study) = \Lambda(-1.17 + .24\,hsGPA + .00058\,SAT + .073\,study)$$

where $\Lambda(z) = \exp(z)/(1 + \exp(z)) = F(X_t\beta)$ is the cdf for the logit model. Holding hsGPA fixed at 3.0 and SAT fixed at 1,200, compute the estimated difference in the graduation probability for someone who spent 10 hours per week in study hall and someone who spent 5 hours per week.

(Wooldridge, 4th edition, problem 17.2)



IV. Miscellaneous Topics

C. Lagged Variables

Individuals frequently respond to a change in independent variables with a time lag. Consequently, economic models describing individual behavior as well as models which attempt to represent the relationships between aggregated variables will often include lagged independent variables or lagged dependent variables. We first consider models which include lagged independent variables (distributed lag models) and then investigate models containing lagged dependent variables (autoregressive models). Distributed lag and autoregressive models provide an attempt to model dynamic behavior.

1. Lagged Independent Variables - Distributed Lag Models

a. Distributed lag models are of the form:

$$y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_s x_{t-s} + u_t$$

where $\partial y_t/\partial x_t = \beta_0$ denotes the immediate impact of a change in x on y, and $\partial y_t/\partial x_{t-i} = \beta_i$ denotes the impact of a change in x on y after i periods. Thus, the $\beta_i$'s indicate the distributional (over time) impact of x on y.

(1) Distributed lag models can be estimated using least squares if n (sample size) > number of coefficient parameters (s + 2 = # lags + 2, the 2 accounting for $\delta$ and $\beta_0$), and least squares yields BLUE if $u_t \sim NID(0, \sigma^2)$.

(2) Several possible problems can arise in distributed lag models: (a) how many lags should be used (s = ?), (b) the degrees of freedom (n - k) = n - 2s - 2 (n - s usable observations less s + 2 coefficients) may be small for large lags (s), and (c) a serious multicollinearity problem can arise if the x's are strongly intercorrelated, with the corresponding estimates of the $\beta_i$ being very erratic.

b. Alternative Estimation Procedures: An alternative estimation procedure which has been proposed to "circumvent" the impact of possible multicollinearity is to impose some "reasonable" pattern on the $\beta_i$'s in the estimation procedure. Ideally, the validity of these hypothesized constraints would be tested. Two of the most commonly encountered patterns for the $\beta_i$'s are the Koyck scheme and Almon polynomial weights. The Koyck model assumes that the $\beta_i$'s decline geometrically and the Almon formulation assumes that the pattern in the $\beta_i$'s can be modeled by a polynomial in "i". We will first discuss the Koyck model, then the Almon procedure, and then consider an application of these procedures to estimating the relationship between sales and advertising expenditure.

(1) Koyck Scheme

Model: $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + u_t$

Koyck suggested that the $\beta_i$ be approximated by

$$\beta_i = \beta_0\lambda^i.$$

The Koyck weights ($\beta_i$) decline geometrically for $0 < \lambda < 1$.

We now derive an equation which can be used in estimating the Koyck formulation of distributed lag coefficients with geometrically declining weights. This derivation is done in two ways: (1) using a linear operator and (2) using algebraic manipulations. Let $Lx_t = x_{t-1}$, $L^2x_t = x_{t-2}$, etc.

(1) Substituting the Koyck expression for $\beta_i$ into the distributed lag model yields

$$y_t = \delta + \beta_0\sum_{i=0}^{\infty}(\lambda L)^i x_t + u_t$$

or

$$y_t = \delta + \left(\frac{\beta_0}{1 - \lambda L}\right)x_t + u_t.$$

Multiplying both sides of this equation by $(1 - \lambda L)$, and noting that the lag of the constant $\delta$ is $\delta$ itself, yields

$$y_t - \lambda y_{t-1} = (1 - \lambda L)y_t = (1 - \lambda)\delta + \beta_0 x_t + u_t - \lambda u_{t-1}$$

or

$$y_t = \delta(1 - \lambda) + \beta_0 x_t + \lambda y_{t-1} + u_t - \lambda u_{t-1}.$$

Note that this equation can be estimated by regressing $y_t$ on $x_t$ and $y_{t-1}$.

(2) Another way to derive the estimating equation for the Koyck distributed lag model without the lag operator (L) is as follows:

Substitute $\beta_j = \beta_0\lambda^j$ into the equation for the distributed lag model to obtain

$$y_t = \delta + \beta_0 x_t + \beta_0\lambda x_{t-1} + \beta_0\lambda^2 x_{t-2} + \dots + u_t.$$

Now replace t by "t - 1" in this equation and multiply by λ:

$$\lambda y_{t-1} = \delta\lambda + \beta_0\lambda x_{t-1} + \beta_0\lambda^2 x_{t-2} + \dots + \lambda u_{t-1}.$$

Subtract these two equations to obtain

$$y_t - \lambda y_{t-1} = \delta(1 - \lambda) + \beta_0 x_t + u_t - \lambda u_{t-1}$$

or

$$y_t = \delta(1 - \lambda) + \beta_0 x_t + \lambda y_{t-1} + v_t$$

where $v_t = u_t - \lambda u_{t-1}$, and this estimating equation is the same as obtained in (1).

Note: (a) The assumption of a Koyck weighting scheme reduces the number of parameters to be estimated to 3 ($\delta$, $\lambda$, $\beta_0$).

(b) If the $u_t$'s in the original model are independently distributed, then the error $v_t$ in the last representation of the model is autocorrelated (it is a first-order moving average of the $u_t$'s) and the equation contains a lagged dependent variable, which poses special estimation problems and will be considered later.
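The equivalence between the geometric distributed lag and the Koyck estimating equation can be checked numerically. The Python sketch below is an illustration only: the parameter values and the series `x` are hypothetical, the errors are set to zero, and x is treated as zero before the start of the sample so that the infinite sum becomes finite.

```python
def distributed_lag(x, delta, beta0, lam, t):
    """y_t = delta + sum_i beta0 * lam**i * x_{t-i}, with x = 0 before the sample and u_t = 0."""
    return delta + sum(beta0 * lam ** i * x[t - i] for i in range(t + 1))


# Hypothetical parameter values and regressor series, for illustration only.
delta, beta0, lam = 10.0, 0.4, 0.6
x = [5.0, 7.0, 6.0, 9.0, 8.0, 10.0, 12.0, 11.0]

# Left-hand side: y_t built directly from the geometric lag weights.
y = [distributed_lag(x, delta, beta0, lam, t) for t in range(len(x))]

# Right-hand side: the Koyck form delta(1 - lam) + beta0 * x_t + lam * y_{t-1}.
koyck = [delta * (1 - lam) + beta0 * x[t] + lam * y[t - 1] for t in range(1, len(x))]

max_gap = max(abs(a - b) for a, b in zip(y[1:], koyck))
print(max_gap)
```

With nonzero errors the same algebra leaves the composite disturbance $u_t - \lambda u_{t-1}$, which is the source of the autocorrelation mentioned in note (b).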

(2) Almon Polynomial Distributed Lags

The Almon polynomial distributed lag formulation is one of the most widely used in practice. We begin with a model with a finite number of lags:

Model: $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_s x_{t-s} + u_t.$

The Almon weighting scheme is defined by:

$$\beta_j = f(j) = a_0 + a_1 j + \dots + a_p j^p, \qquad j = 0, 1, \dots, s$$

s = # of lags = # of β's - 1
p = degree of polynomial.

Polynomials are extremely flexible and can be used to approximate any continuous function as accurately as desired by selecting p to be large enough.

The corresponding estimating equation can be obtained by substituting f(j) for $\beta_j$ into the distributed lag model, collecting terms involving the $a_i$'s and then estimating the $a_i$'s using least squares. Given estimates for the $a_i$'s, corresponding estimates of the $\beta_j$'s can be obtained from the estimated f(j). By using such a specification we are estimating (p + 2) parameters ($\delta, a_0, \dots, a_p$) rather than (s + 2) parameters ($\delta, \beta_0, \dots, \beta_s$). If p (the degree of the polynomial defining the weights) is smaller than s (the maximum lag), then the Almon weighting scheme results in fewer parameters needing to be estimated. In general p is usually selected to be rather small (2, 3, 4).

To perform this estimation procedure in Stata, generate the polynomial variables (the "$z_i$'s"), run the regression of the dependent variable on the polynomial variables, and then recover the $\beta_j$'s from the estimation. For example, the following code will estimate the previous model with three lags (s=3) using a second order (p=2) polynomial to describe the pattern of the $\beta_i$'s:

*generate the polynomial variables

gen z0 = X+X[_n-1]+X[_n-2]+X[_n-3]

gen z1 = X[_n-1]+X[_n-2]*2+X[_n-3]*3

gen z2 = X[_n-1]+X[_n-2]*4+X[_n-3]*9

*regress the Y variable on the polynomial variables

reg Y z0 z1 z2

estat ic

*recover the betas

scalar b0 = _b[z0]

scalar b1 = _b[z0]+_b[z1]+_b[z2]

scalar b2 = _b[z0]+_b[z1]*2+_b[z2]*4

scalar b3 = _b[z0]+_b[z1]*3+_b[z2]*9

*display the betas

display b0, b1, b2, b3
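For s = 3 and p = 2, the substitution behind the z variables in the code above can be written out explicitly. This is a short worked sketch of the algebra under the Almon assumption $\beta_j = a_0 + a_1 j + a_2 j^2$, not a replacement for the appendix treatment.

```latex
\begin{aligned}
y_t &= \delta + \sum_{j=0}^{3} \left(a_0 + a_1 j + a_2 j^2\right) x_{t-j} + u_t \\
    &= \delta
       + a_0 \underbrace{\left(x_t + x_{t-1} + x_{t-2} + x_{t-3}\right)}_{z_{0t}}
       + a_1 \underbrace{\left(x_{t-1} + 2x_{t-2} + 3x_{t-3}\right)}_{z_{1t}}
       + a_2 \underbrace{\left(x_{t-1} + 4x_{t-2} + 9x_{t-3}\right)}_{z_{2t}}
       + u_t, \\[4pt]
\beta_0 &= a_0, \qquad
\beta_1 = a_0 + a_1 + a_2, \qquad
\beta_2 = a_0 + 2a_1 + 4a_2, \qquad
\beta_3 = a_0 + 3a_1 + 9a_2 .
\end{aligned}
```

The three bracketed sums correspond to `z0`, `z1`, and `z2`, and the last line corresponds to the scalars `b0` through `b3`.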

The mathematical details behind these transformations are illustrated in the first section of the appendix. This estimation procedure is automated by such programs as SAS and SHAZAM. For example, the SHAZAM command to estimate the previous model with three lags (s=3) using a second order (p=2) polynomial to describe the pattern of the $\beta_i$'s is given by:

OLS Y X(0.3,2)

This command will not only estimate the $a_i$'s, but will also generate the $\beta_i$'s. However, many calculations are going on in the background. The related details and distributional details are summarized in the appendix "A Few Details for the Almon Distributed Lag."

Examples:

The Almon estimators have a smaller variance than the least squares estimator, whether the assumption of a polynomial lag is valid or not. If the assumption is incorrect the Almon estimator is biased and inconsistent [cf. Schmidt & Sickles, IER (October 1975); Schmidt & Waud, JASA (March 1973)].

TESTING the Almon scheme

$$H_0: \beta_j = f(j) = a_0 + a_1 j + \dots + a_p j^p, \qquad j = 0, 1, \dots, s$$

can be performed using LR or Chow tests to compare the Almon and OLS results.

c. A Review and Application of Distributed Lag Models to Estimating the Relationship Between Sales and Advertising

In many situations the economic agents whose behavior is being modeled don't react immediately or completely to changes in the economic environment. Instead, the adjustment may be gradual and take place over several periods of time. The delay may be due to habit persistence, the cost of frequent changes, the delay in gathering data or other technological, institutional or behavioral factors. Well-known examples would include the response of such macroeconomic variables as GDP or prices to unexpected changes in the money supply, government spending or the tax system. Advertising has also been shown to have an impact on sales which generally lasts for more than one period of time.

Distributed lag models provide a convenient descriptive model of situations in which changes in an independent variable may have an impact which lasts for several time periods.

A simple example of such a model is given by

$$S_t = \delta + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \beta_k A_{t-k} + \varepsilon_t$$

where $S_t$ and $A_t$ represent sales and advertising expenditure during the t-th time period. In this model "δ" represents the level of sales which would take place without any advertising. The impact of advertising can be readily determined. An increase in advertising of one unit would be expected to increase sales by $\beta_0$ during the same period. Sales in the next period would increase by $\beta_1$ units. Similarly, the impact on sales after k time periods is given by $\beta_k$.

The "distributed lag" effect of advertising on sales might be visually represented as follows:

[Figure 2. Distributed lag coefficients ($\beta_i$ plotted against the lag i)]

This figure corresponds to the case in which increased advertising has an immediate impact on sales, the impact increases for two periods, then declines and then there is no impact after four periods. An alternative scenario might be where advertising has the greatest impact on sales in the same time period, followed by a gradually declining impact. This could be represented as in Figure 3.

[Figure 3. Declining distributed lag coefficients ($\beta_i$ plotted against the lag i)]

Distributed lag models are extremely flexible in terms of admissible behavior. However, this flexibility can lead to estimation problems. In principle, least squares estimates of the coefficients are the minimum variance estimators of all unbiased estimators of the coefficients in distributed lag models under the standard assumptions associated with the model.

In practice, several difficulties are encountered. In order to illustrate these problems, assume that monthly observations on sales and advertising for three years are available. In order to estimate the distributed impact of advertising on sales, we might consider estimating the model:

$$S_t = \delta + \beta_0 A_t + \beta_1 A_{t-1} + \dots + \beta_{12} A_{t-12} + \varepsilon_t.$$

This specification contains 14 unknown parameters (coefficients) and requires observations on each of the variables, i.e., $S_t, A_t, A_{t-1}, \dots, A_{t-12}$. These data are reported in the Table in the Appendix labeled "Sales and Advertising Data." In order to have an observation for each variable including $A_{t-12}$, the first twelve observational values on sales must be deleted, with the first usable time period corresponding to t = 13. Hence, the usable sample size is reduced from 36 to 24 by the inclusion of the 12 lagged variables for advertising. The degrees of freedom associated with this model are 10 (usable sample size - number of coefficients to be estimated = 24 - 14). In fact, if 17 lags had been included, the usable sample size (36 - 17 = 19) would be equal to the number of coefficients to be estimated (17 + 2 = 19) and the degrees of freedom would be zero.
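The degrees-of-freedom arithmetic in this example can be sketched as a small Python helper. The function name `lag_regression_df` is invented for this illustration; it assumes one regressor with s lags plus an intercept, so that each added lag costs one observation and one coefficient.

```python
def lag_regression_df(n_obs, n_lags):
    """Return (usable sample size, number of coefficients, degrees of freedom).

    Assumes the model has an intercept, the current value of one regressor,
    and n_lags lagged values of that regressor.
    """
    usable = n_obs - n_lags          # the first n_lags observations are lost
    n_coef = n_lags + 2              # intercept + current value + n_lags lags
    return usable, n_coef, usable - n_coef


# Three years of monthly data, as in the sales and advertising example.
for s in (12, 17):
    print(s, lag_regression_df(36, s))
```

The last element equals n - 2s - 2, which is why long lag lengths exhaust a short sample so quickly.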

Another problem arises when the explanatory variable is associated with a trend over time. If the trend is approximately linear, then multicollinearity between the current and lagged values of the explanatory variables may make it difficult to accurately estimate individual parameter coefficients. The pairwise correlations of lagged advertising are given in the following table:

Table 2
Pairwise Correlations of Lagged Advertising

           A      A(-1)   A(-2)   A(-3)   . . .   A(-12)
A          1      .874    .866    .859    . . .   .892
A(-1)             1       .874    .855    . . .   .896
A(-2)                     1       .863    . . .   .839
A(-3)                             1       . . .
  .                                                 .
  .                                                 .
A(-12)                                            1

Each of these situations (low degrees of freedom and multicollinearity) can result in unreliable estimates of the distributed lag coefficients ($\beta_i$).

OLS estimation (demonstration using Stata):

As a case in point, if we regress sales on advertising expenditure for the current and previous twelve months using the commands:

. tsset t
. reg S A A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12
or
. reg S A A1-A12
. estat ic      (reports the corresponding log-likelihood value)

where each of the Aj has been generated by adding the lag operator "l." in front of the variable

. gen A1 = l.A
. gen A2 = l.A1
…
. gen A12 = l.A11

We then obtain


-------------+------------------------------ F( 13, 10) = 3.51

Model | 8029.73337 13 617.671797 Prob > F = 0.0268


-------------+------------------------------ Adj R-squared = 0.5864

Total | 9790.5 23 425.673913 Root MSE = 13.269

------------------------------------------------------------------------------

S | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

A | .4270829 .2063794 2.07 0.065 -.0327592 .8869249

A1 | .0015484 .2161103 0.01 0.994 -.4799754 .4830721

A2 | .1026181 .1849852 0.55 0.591 -.3095545 .5147907

A3 | .1387561 .1593701 0.87 0.404 -.2163427 .4938549

A4 | -.0324424 .1771302 -0.18 0.858 -.427113 .3622282

A5 | -.0431555 .1744989 -0.25 0.810 -.4319632 .3456522

A6 | .2148685 .1721424 1.25 0.240 -.1686887 .5984256

A7 | .114542 .1544704 0.74 0.475 -.2296396 .4587236

A8 | -.1045846 .1490156 -0.70 0.499 -.436612 .2274427

A9 | -.2443856 .1460974 -1.67 0.125 -.5699108 .0811397

A10 | -.1016249 .173713 -0.59 0.572 -.4886817 .2854318

A11 | -.0571411 .2020959 -0.28 0.783 -.5074388 .3931567

A12 | .0085637 .20028 0.04 0.967 -.4376881 .4548154

_cons | 478.7293 18.94364 25.27 0.000 436.5202 520.9383

------------------------------------------------------------------------------

Log-likelihood value = -85.6

Note: lags can also be created in Stata using the command:

. gen A1 = A[_n-1]

The following figure shows the corresponding OLS estimates of the $\beta_i$.

[Figure 4. Distributed Lag Coefficients (No Constraints): OLS estimates of $\beta_i$]

The estimator volatility, large standard errors and small t-statistics for the estimated OLS β's suggest a multicollinearity problem. Neither the pattern nor the signs of the $\beta_i$'s are consistent with a reasonable explanation of the impact of advertising on sales.

The most common approach for dealing with these problems is to assume that the $\beta_i$'s follow a "reasonable" pattern which is described by fewer parameters. The associated model is estimated and used in analyzing the impact of the variable in question. Clearly, the advantages of this approach are conditional upon the accuracy of the assumptions made about the $\beta_i$'s, and these assumptions should be tested. The Koyck distributed lag and polynomial distributed lag models will be applied.

KOYCK DISTRIBUTED LAGS:

If the model builder is willing to assume that the impact of the independent variable (advertising) on the dependent variable (sales) declines geometrically over time, the Koyck model can provide a reasonable possibility. In this model the coefficients are assumed to be of the form

$$\beta_i = \lambda^i\beta_0, \qquad i = 1, 2, \dots$$

This can be visually represented (for two different values of λ) as

[Figure: Koyck coefficients $\beta_i$ plotted against the lag i for λ = 0.6 and λ = 0.9]
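As a numerical counterpart to the geometric pattern for λ = 0.6 and λ = 0.9, the Python sketch below lists the first few Koyck weights and the fraction of the total effect $\beta_0/(1 - \lambda)$ that has arrived after a given number of periods. The normalization β0 = 1 and the function names are assumptions made for this illustration.

```python
def koyck_weights(beta0, lam, n_periods):
    """Return the weights beta0 * lam**i for i = 0, ..., n_periods - 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return [beta0 * lam ** i for i in range(n_periods)]


def share_realized(lam, k):
    """Fraction of the long-run effect beta0 / (1 - lam) delivered in periods 0 through k.

    Follows from the finite geometric sum: (1 - lam**(k + 1)) / (1 - lam),
    divided by 1 / (1 - lam).
    """
    return 1.0 - lam ** (k + 1)


beta0 = 1.0  # hypothetical normalization
for lam in (0.6, 0.9):
    weights = koyck_weights(beta0, lam, 6)
    print(lam, [round(w, 3) for w in weights], round(share_realized(lam, 4), 3))
```

A value of λ closer to one spreads the response over many more periods, which is the difference between the two curves.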

The Koyck assumption implies that

$$\frac{\partial S_t}{\partial A_{t-i}} = \beta_i = \lambda^i\beta_0, \qquad i = 1, 2, \dots,$$

i.e., a change of one unit of advertising will have an immediate impact ($\beta_0$) on sales and will continue to affect sales thereafter, but at an exponentially declining rate. In other words, sales will be influenced by not only current advertising, but all past values of advertising.

Rewriting the distributed lag model and substituting for the Koyck coefficients yields

$$S_t = a + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \varepsilon_t$$

$$\phantom{S_t} = a + \beta_0 A_t + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \dots + \varepsilon_t.$$

Notice that by assuming that the coefficients follow a Koyck model, only three coefficients (a, $\beta_0$ and λ) need be estimated. This representation can be written in a form which facilitates estimation by replacing t by t - 1, and multiplying by λ to yield:

(ORIGINAL) $S_t = a + \beta_0 A_t + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \dots + \varepsilon_t$

(MODIFIED) $\lambda S_{t-1} = a\lambda + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \dots + \lambda\varepsilon_{t-1}.$

Subtracting the "modified representation" from the "original representation" yields

$$S_t - \lambda S_{t-1} = a - a\lambda + \beta_0 A_t + \varepsilon_t - \lambda\varepsilon_{t-1}$$

or equivalently,

$$S_t = a(1 - \lambda) + \beta_0 A_t + \lambda S_{t-1} + \varepsilon_t - \lambda\varepsilon_{t-1}.$$

This is the form we have previously discussed, which can be estimated using least squares with the Stata commands

tsset t
gen S1 = S[_n-1]
reg S A S1

With the following Stata output:

-------------+------------------------------   F(  2,    32) =   77.63
       Model |  21128.4531     2  10564.2265   Prob > F      =  0.0000

-------------+------------------------------   Adj R-squared =  0.8184
       Total |  25483.1429    34  749.504202   Root MSE      =  11.666
------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           A |   .3732443   .0621284     6.01   0.000     .2466929    .4997957
          S1 |   .1628443    .128893     1.26   0.216    -.0997022    .4253907
       _cons |   407.1455   63.71989     6.39   0.000     277.3523    536.9387
------------------------------------------------------------------------------

The estimated intercept in this model corresponds to $a(1 - \lambda)$; hence,

a = 407.145/(1 - .1628) = 486.32.

The distributed lag coefficients can be easily recovered from the equation

$$\beta_i = \beta_0\lambda^i = (.3732)(.1628)^i;$$

therefore, the immediate impact of a one dollar increase in advertising is estimated to be $\beta_0$ = .3732, with subsequent increases in sales estimated to be (.0608, .0099, .0016, .0003, .0000) for the first through the fifth periods. The long run impact of a one dollar increase in advertising is obtained from the following:

  immediate:               $\beta_0$
+ lag one period:          $\beta_0\lambda$
+ lag two periods:         $\beta_0\lambda^2$
  ⋮
  Total Long Run Impact    $\beta_0/(1 - \lambda)$ = .446
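The recovery of the structural quantities from the Koyck regression can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, using the coefficient estimates as rounded in the text (.3732, .1628, 407.145); the variable names are chosen for this illustration.

```python
# Rounded estimates from the regression of S on A and lagged S reported in the text.
beta0 = 0.3732      # coefficient on A (immediate impact)
lam = 0.1628        # coefficient on S1 (the Koyck lambda)
intercept = 407.145 # estimated a(1 - lambda)

a = intercept / (1.0 - lam)                      # implied intercept of the lag model
lag_coefs = [beta0 * lam ** i for i in range(6)]  # beta_0, ..., beta_5
long_run = beta0 / (1.0 - lam)                   # sum of the infinite geometric series

print(round(a, 2))
print([round(b, 4) for b in lag_coefs])
print(round(long_run, 3))
```

Because λ is small here, nearly all of the long-run effect arrives in the first two periods.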

Several comments need to be made. First, it is very important to test for autocorrelation. The least squares estimators will be biased and inconsistent if the model contains lagged dependent variables and autocorrelated random disturbances. Estimation techniques have been developed which yield consistent estimators in this case, but will not be discussed here. Least squares applied to an equation with a lagged dependent variable and uncorrelated errors will yield biased, but consistent estimators. Secondly, if it is felt that the assumption that the impact of the independent variable begins declining immediately is too restrictive, this can be relaxed. The Koyck procedure can be modified to correspond to declining weights after an arbitrary transition period.

POLYNOMIAL DISTRIBUTED LAGS:

As indicated earlier, polynomial distributed lag models provide one of the most common approaches to distributed lag models. The basic idea is to approximate the desired form for the $\beta_i$'s with a polynomial which is described by fewer parameters than the original $\beta_i$'s in the model. In practice, p is rarely chosen to be larger than two or three, i.e., the $\beta_i$'s follow a quadratic or cubic form. As an example, if p = 2, the $\beta_i$'s are completely described by three parameters ($a_0, a_1, a_2$) in the equation:

$$\beta_i = a_0 + a_1 i + a_2 i^2.$$

Consequently, the model

$$S_t = a + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \beta_s A_{t-s} + \varepsilon_t$$

only involves the parameters ($a, a_0, a_1, a_2$) regardless of the number of lags (s) included in the equation. Once $a_0, a_1, a_2$ are estimated, the corresponding estimates of $\beta_i$ can be obtained from

$$\beta_i = a_0 + a_1 i + a_2 i^2,$$

i.e.,

$\beta_0 = a_0$
$\beta_1 = a_0 + a_1 + a_2$
$\beta_2 = a_0 + 2a_1 + 4a_2$, etc.

Also note that specifying the $\beta_i$'s to be quadratic allows considerable flexibility.

[Figure 5. Quadratic Distributed Lags (four panels, each plotting $\beta_i$)]

Stata Example

As an example of estimating polynomial distributed lag coefficients, we estimate the distributed lag impact of advertising on sales using polynomial distributed lags with the following Stata commands (where s = 12 and p = 2):

* z0 is the sum of current and lagged advertising (lags 0 through 12)
gen z0 = A
forvalues j = 1/12 {
    replace z0 = z0 + A[_n-`j']
}

* index `i' should range up to the order of the polynomial (p)
forvalues i = 1/2 {
    gen z`i' = 0
    forvalues j = 1/12 {
        replace z`i' = z`i' + A[_n-`j']*`j'^`i'
    }
}

* regress S on the p+1 transformed variables
reg S z0 z1 z2

* recover the betas from the coefficients of the zi's
* (beta0 will be the same as a0, the coefficient of z0)
forvalues i = 0/12 {
    scalar b`i' = _b[z0] + _b[z1]*`i' + _b[z2]*`i'^2
    display "beta`i' = " b`i'
}


. reg s z0 z1 z2

      Source |        SS     df         MS        Number of obs =      24
-------------+-------------------------------     F(3, 20)      =   14.37
       Model |   6688.46      3     2229.49       Prob > F      =  0.0000
    Residual |   3102.04     20      155.10       R-squared     =  0.6832
-------------+-------------------------------     Adj R-squared =  0.6356
       Total |    9790.5     23      425.67       Root MSE      =  12.454

-------------------------------------------------------
           s |      Coef.   Std. Err.       t    P>|t|
-------------+-----------------------------------------
          z0 |   .2366588   .1137905     2.08    0.051
          z1 |  -.0611558   .0432326    -1.41    0.173
          z2 |   .0032403   .0032659     0.99    0.333
       _cons |     484.40      15.95    30.36    0.000
-------------------------------------------------------

. estat ic

-----------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)    df       AIC       BIC
-------------+---------------------------------------------------------
           . |     24   -106.1879   -92.39565     4  192.7913  197.5035
-----------------------------------------------------------------------

The polynomial distributed lag coefficients can then be obtained from the equation

$$\beta_i = a_0 + a_1 i + a_2 i^2 = .2366 - .0612\, i + .0032\, i^2.$$

The resulting coefficients are given below:

     i     β_i
     0    .237
     1    .179
     2    .127
     3    .082
     4    .044
     5    .012
     6   -.014
     7   -.033
     8   -.045
     9   -.051
    10   -.051
    11   -.044
    12   -.031
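As a quick check of the arithmetic, the mapping from the polynomial parameters to the lag weights can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pdl_weights` and the variable `a_hat` are hypothetical, and the numbers are the coefficients on z0, z1, z2 from the regression output reported in these notes.

```python
# Coefficients on z0, z1, z2 as reported in the regression output (a0, a1, a2).
a_hat = [0.2366588, -0.0611558, 0.0032403]


def pdl_weights(a, s):
    """Return [beta_0, ..., beta_s] with beta_i = a[0] + a[1]*i + a[2]*i**2 + ..."""
    return [sum(a_j * i ** j for j, a_j in enumerate(a)) for i in range(s + 1)]


# s = 12 lags, as in the Stata example.
betas = pdl_weights(a_hat, 12)
for i, b in enumerate(betas):
    print(f"beta_{i:<2d} = {b: .3f}")
```

With these estimates the fitted quadratic is positive through lag 5 and negative from lag 6 onward, so the sign of each weight is worth checking against any hand-computed table.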

The $\beta_i$'s (polynomial distributed lag model) can be illustrated as in Fig. 6.

Figure 6. Polynomial Distributed Lag Coefficient ($\beta_i$ plotted against the lag $i$)

The results from these three techniques (OLS, Koyck, PDL) are summarized in Figure 7.

Figure 7. Alternative Estimates of Distributed Lag Effects ($\beta_i$ plotted against the lag $i$ for the OLS distributed lag, the Koyck distributed lag, and the polynomial distributed lag)

Note that the distributed lag coefficients associated with the Koyck and polynomial models decline, though at different rates. The polynomial distributed lag model suggests that the impact of advertising isn't statistically significant beyond three or four months. The estimated weights from the Koyck model "die out" even more quickly. This is in sharp contrast to the weights which were estimated without any constraints (OLS). The advantage of the alternatives to unconstrained estimation should be apparent. The related literature contains a discussion of many alternatives. The methodology is similar to that already discussed: (1) specify a "form for the $\beta_i$'s" which reduces the number of parameters to be estimated; (2) these new parameters are then estimated and the corresponding $\beta$'s obtained.

The reader may want to gain experience by estimating some alternative specifications. It would be instructive to consider the sensitivity of polynomial distributed lag $\beta_i$'s to the number of lags, the degree of the underlying polynomial, as well as assumptions about end points. The reader might also demonstrate that if we assume the effect of advertising doesn't begin to decay exponentially until period two (rather than in the first period), the relevant model can be written as

$$S_t = a(1 - \lambda) + \lambda S_{t-1} + \beta_0 A_t + (\beta_1 - \lambda\beta_0) A_{t-1} + \varepsilon_t - \lambda\varepsilon_{t-1}$$

where $\beta_i = \lambda^{i-1}\beta_1$ for $i = 1, 2, \dots$ Estimate this model and compare the results with those obtained using the Koyck model. The consistency of the polynomial distributed lag model specification with the unconstrained estimates can be easily tested using a likelihood ratio test.


2. Lagged Dependent Variables - Autoregressive model

Autoregressive models include lagged values of dependent variables, can be viewed as being dynamic models, and link different time periods. We first interpret and summarize the statistical properties of OLS estimators of autoregressive models. The coefficients in these models have important "dynamic" interpretations concerning comparative static results. Finally, we show that the famous partial and adaptive expectations models can be expressed as autoregressive models.

a. Interpreting the coefficients in autoregressive models. A model is said to be dynamic if values of the dependent variable from the current and previous time periods are included in the same equation. The inclusion of lagged dependent variables presents several problems to the econometrician. In order to discuss some of these problems, consider the following autoregressive model:

$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1} + \varepsilon_t$$

where $Y_t$ and $I_t$ denote some aggregate measures of production and investment.

(1) Properties of estimators and statistical inference

If the $\varepsilon_t$'s are independent of each other (i.e., A.4), then least squares estimators of $\alpha, \beta, \gamma$, $(\hat\alpha, \hat\beta, \hat\gamma)$, will be biased, but consistent; whereas, if the $\varepsilon_t$ are serially correlated, $\hat\alpha, \hat\beta, \hat\gamma$ will be biased and inconsistent. In neither case will the t and F statistics be appropriate (more on this in another section). The properties of least squares estimators can be compactly summarized as in the following table:

Properties of Least Squares

                                   Residuals
                                   Uncorrelated       Correlated
    No Lagged Dependent Variable   unbiased           unbiased
                                   consistent         consistent
                                   efficient          not efficient

    Lagged Dependent Variable      biased             biased
                                   consistent         inconsistent
                                   not efficient      not efficient

Thus it is important to test for autocorrelation. The D.W. can be used for models without lagged dependent variables and Durbin's h test or Breusch-Godfrey test can be used for autoregressive models. (See the discussion of autocorrelation in section IV of the notes.)

(2) Interpretation of coefficients

For notational simplicity delete $\varepsilon_t$ from the previous equation and consider

$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1}.$$

The derivative

$$\frac{\partial Y_t}{\partial I_t} = \beta$$

is referred to as the impact multiplier for this model and is not what is generally referred to as "the investment multiplier." The impact multiplier measures the change in $Y_t$ during the same period as $I_t$ changes.

We note that since

$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1}$$

it follows that

$$Y_{t-1} = \alpha + \beta I_{t-1} + \gamma Y_{t-2};$$

hence,

$$Y_t = \alpha + \beta I_t + \gamma(\alpha + \beta I_{t-1} + \gamma Y_{t-2}) = \alpha(1 + \gamma) + \beta[I_t + \gamma I_{t-1}] + \gamma^2 Y_{t-2}.$$

Continuing this process we obtain, provided $|\gamma| < 1$,

$$Y_t = \alpha(1 + \gamma + \gamma^2 + \dots) + \beta[I_t + \gamma I_{t-1} + \gamma^2 I_{t-2} + \dots] = \frac{\alpha}{1 - \gamma} + \beta I_t + \beta\gamma I_{t-1} + \beta\gamma^2 I_{t-2} + \beta\gamma^3 I_{t-3} + \dots$$

What total effect will a change in $I_t$ have on $Y_t, Y_{t+1}, \dots$? When $\Delta I_t = 1$,

$$\Delta Y_t = \beta$$

$$\Delta Y_{t+1} = \beta\gamma$$

$$\Delta Y_{t+2} = \beta\gamma^2$$

$$\vdots$$

$$\text{Total impact} = \beta(1 + \gamma + \gamma^2 + \dots) = \frac{\beta}{1 - \gamma}.$$

The two period cumulative multiplier is given by $\beta + \beta\gamma$, the three period by $\beta + \beta\gamma + \beta\gamma^2$, and so on.

The long run investment multiplier is given by $\beta/(1 - \gamma)$. The long-run multiplier can be interpreted in two ways: (1) the cumulative (over time) change in Y corresponding to a one time increase in investment expenditure; or (2) the increase in long-run equilibrium Y corresponding to a sustained increase in investment expenditure. These two interpretations are represented in the following figure.

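Figure: Impact of change in investment. Left panel: one period change; right panel: sustained change. Each panel plots $Y_t$ and $I$ against $t$ for $\Delta I = 1$, with the long-run change marked as $\Delta Y = \beta/(1 - \gamma)$.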
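The impact, cumulative, and long-run multipliers described above can be tabulated directly from $\beta$ and $\gamma$. The following is a minimal Python sketch, not part of the original notes; the function name `multipliers` and the illustrative values $\beta = 0.5$, $\gamma = 0.6$ are assumptions chosen only for the example.

```python
def multipliers(beta, gamma, horizon):
    """Multipliers for Y_t = alpha + beta*I_t + gamma*Y_{t-1} after a unit change in I.

    Returns (period effects, cumulative multipliers, long-run multiplier).
    """
    if not abs(gamma) < 1:
        raise ValueError("the long-run multiplier requires |gamma| < 1")
    # Effect k periods after the change: beta * gamma**k (k = 0 is the impact multiplier).
    period_effects = [beta * gamma ** k for k in range(horizon)]
    cumulative = []
    running_total = 0.0
    for effect in period_effects:
        running_total += effect
        cumulative.append(running_total)
    long_run = beta / (1 - gamma)
    return period_effects, cumulative, long_run


# Hypothetical values, for illustration only.
effects, cumulative, long_run = multipliers(beta=0.5, gamma=0.6, horizon=4)
print("impact multiplier:      ", effects[0])
print("two-period cumulative:  ", cumulative[1])
print("long-run multiplier:    ", long_run)
```

The cumulative multipliers rise toward the long-run value $\beta/(1 - \gamma)$ as the horizon grows, which is the first of the two interpretations given in the text.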

b. Some common autoregressive models

(1) Partial adjustment model

Optimal: The optimal value of $y_t$, $y_t^*$, is a function of $x_t$:

$$y_t^* = \alpha + \beta x_t + u_t$$

Adjustment mechanism:

$$y_t - y_{t-1} = \gamma(y_t^* - y_{t-1}), \qquad 0 < \gamma \le 1$$

Note: (1) $\gamma = 1$ corresponds to complete adjustment.

(2) This adjustment mechanism is consistent with the minimization of costs, $c_t$, where

$$c_t = \alpha(y_t - y_t^*)^2 + \beta(y_t - y_{t-1})^2.$$

The first term is the cost of being out of equilibrium and the second is the cost of change, where $y_{t-1}$ and $y_t^*$ are given.

Combining the basic equation and adjustment mechanism yields

$$y_t = \alpha\gamma + \beta\gamma x_t + (1 - \gamma) y_{t-1} + \gamma u_t$$

which can be estimated using OLS.
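Because the estimating equation has coefficients $\alpha\gamma$, $\beta\gamma$, and $(1 - \gamma)$, the structural parameters can be backed out from the OLS estimates. A minimal Python sketch follows; the function name `partial_adjustment_params` and the numerical inputs are hypothetical and serve only to illustrate the algebra.

```python
def partial_adjustment_params(const, coef_x, coef_ylag):
    """Recover (alpha, beta, gamma) from y_t = const + coef_x*x_t + coef_ylag*y_{t-1}.

    Uses const = alpha*gamma, coef_x = beta*gamma, coef_ylag = 1 - gamma.
    """
    gamma = 1.0 - coef_ylag
    if not 0 < gamma <= 1:
        raise ValueError("estimates imply gamma outside (0, 1]")
    return {
        "gamma": gamma,            # speed of adjustment
        "alpha": const / gamma,    # intercept of the optimal relation
        "beta": coef_x / gamma,    # long-run response of y to x
    }


# Hypothetical OLS estimates, for illustration only.
params = partial_adjustment_params(const=2.0, coef_x=0.3, coef_ylag=0.6)
print(params)
```

Here the coefficient on $x_t$ is the short-run response, while the recovered $\beta$ is the response once adjustment is complete.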

(2) Adaptive Expectations Model. This model relaxes the assumption that the dependent variable depends only on the current level of the independent variable. Let $x_t^*$ denote the "expected" level of $x_t$ and assume the dependent variable immediately adjusts to $x_t^*$.

Basic Relationship:

$$y_t = \alpha + \beta x_t^* + u_t$$

Adjustment Mechanism:

$$x_t^* - x_{t-1}^* = \delta(x_t - x_{t-1}^*), \qquad 0 < \delta \le 1$$

$\delta = 1$ corresponds to complete adjustment.

Combining these expressions yields

$$y_t = \alpha\delta + \beta\delta x_t + (1 - \delta) y_{t-1} + (u_t - (1 - \delta) u_{t-1}).$$

Note the similarity and differences between the forms for the Koyck, partial adjustment, and adaptive expectations models.

(3) Partial Adjustment and Adaptive Expectations Model

Basic Relationship (optimal $y$ as a function of expected $x$):

$$y_t^* = \alpha + \beta x_t^*$$

Adjustment Mechanisms:

$$y_t - y_{t-1} = \gamma(y_t^* - y_{t-1}) + u_t, \qquad 0 < \gamma \le 1$$

$$x_t^* - x_{t-1}^* = \delta(x_t - x_{t-1}^*), \qquad 0 < \delta \le 1$$

Combining these expressions yields

$$y_t = \alpha\gamma\delta + \beta\gamma\delta x_t + [(1 - \delta) + (1 - \gamma)] y_{t-1} - (1 - \delta)(1 - \gamma) y_{t-2} + (u_t - (1 - \delta) u_{t-1}).$$

c. Estimation of Autoregressive models

Consider the model

$$y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 x_t + \varepsilon_t$$

with the following assumptions for the error term.

Assumption I. $\varepsilon_t \sim NID(0, \sigma^2)$, where NID stands for independently and identically distributed as $N(0, \sigma^2)$.

Assumption II. $\varepsilon_t = u_t - \lambda u_{t-1}$ (Koyck)

    a. $u_t \sim NID(0, \sigma_u^2)$

    b. $u_t = \rho u_{t-1} + \eta_t$, $|\rho| < 1$, $\eta_t \sim NID(0, \sigma_\eta^2)$

Assumption III. $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$, $u_t \sim NID(0, \sigma_u^2)$

(1) Assumption I. Least squares estimators of $\beta = (\beta_1, \beta_2, \beta_3)$ will be biased, but consistent.

(a) Remember that OLS estimators are unbiased and consistent in the presence of autocorrelation, but are no longer minimum variance estimators.

(b) The presence of lagged dependent variables results in least squares estimators which are biased, but are still consistent.

(c) The presence of autocorrelation and lagged dependent variables implies that least squares estimators will be biased and inconsistent. This situation arises with assumptions II and III. Hence, estimators other than least squares estimators need to be developed for the case of lagged dependent variables and autocorrelation.

(d) The inclusion of lagged dependent variables biases the value of the Durbin Watson statistic towards 2 and therefore the standard interpretation of D.W. is not valid.

The h-test has been proposed as a test for autocorrelation in this case:

$$h = \hat\rho \left[ \frac{n}{1 - n\,\widehat{\mathrm{Var}}(\text{coef. est. of } y_{t-1})} \right]^{1/2}$$

The asymptotic distribution of h is

$$h \sim N(0, 1).$$

There are two main problems with this test:

(i) The h test is not valid if $n\,\widehat{\mathrm{Var}}(\text{coef. est. of } y_{t-1}) > 1$, since the term under the square root is then negative.

(ii) N(0, 1) seems to yield a poor fit to the distribution of h for frequently encountered sample sizes. Some have argued that the use of $d_U$ and $4 - d_U$ to define critical regions appears to provide more accurate results. $d_U$ corresponds to the upper limit for a Durbin Watson test statistic, which will be discussed later.

( ) 1ttt211tt Y1CC −− ε−ε+γβ+λ−β=λ−

Figure: number line for the D.W. statistic marking the points $d_U$, 2, and $4 - d_U$.
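The h statistic and its validity condition can be sketched as follows. This is an illustrative Python fragment, not the notes' own code: the function name `durbin_h` and the inputs are hypothetical, and $\hat\rho$ is approximated by $1 - DW/2$, a common approximation that is assumed here rather than taken from the text.

```python
import math


def durbin_h(rho_hat, n, var_coef_ylag):
    """Durbin's h = rho_hat * sqrt(n / (1 - n * Var(coef. of y_{t-1}))).

    Returns None when n * Var(.) >= 1, where the statistic is not defined.
    """
    denom = 1.0 - n * var_coef_ylag
    if denom <= 0:
        return None
    return rho_hat * math.sqrt(n / denom)


# Hypothetical inputs: D.W. = 1.6, n = 50, estimated variance of the
# coefficient on the lagged dependent variable = 0.01.
dw = 1.6
rho_hat = 1.0 - dw / 2.0           # assumed approximation to rho_hat
h = durbin_h(rho_hat, n=50, var_coef_ylag=0.01)
print("h =", h)                    # compare with N(0, 1) critical values

# With a larger variance the test breaks down (problem (i) in the text).
print(durbin_h(rho_hat, n=50, var_coef_ylag=0.03))
```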

Other tests for the presence of autocorrelation in a model with lagged dependent variables are available. For example, the Breusch-Godfrey and Ljung-Box tests can be modified to apply to autoregressive models. The Breusch-Godfrey test can be applied by regressing the OLS $e_t$'s on the lagged $y$'s and the lagged $e_t$'s as implied by the model (autoregressive and number of autoregression or moving average errors) and testing for the collective explanatory power of the coefficients of the lagged errors using an F-test.

A brief treatment of estimation in the case of II or III is reported in the appendix.


D. Causality or Exogeneity

The existence of a relationship does not imply that either variable causes the other variable. There is an extensive literature on what it means for X to cause Y or for X to be exogenous to Y. A related concept is Granger causality. X is said to not Granger-cause Y if the conditional distribution of Y, given lagged Y and lagged X, is equal to the conditional distribution of Y, given lagged Y. Alternatively, lagged X's do not help explain current levels of Y. A test of whether X Granger-causes Y can be performed as follows:

(1) Estimate the following model:

$$y_t = a + b_1 y_{t-1} + \dots + b_p y_{t-p} + c_1 x_{t-1} + \dots + c_p x_{t-p} + \varepsilon_t.$$

(2) Test the joint hypothesis $H_0: c_1 = \dots = c_p = 0$ (X does not Granger-cause Y) using an F test. A "large" F statistic provides evidence that X Granger-causes Y.
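The F statistic in step (2) can be computed from the sums of squared residuals of the unrestricted model (with the lagged x's) and the restricted model (without them). The Python sketch below assumes both regressions have already been run on the same n usable observations; the function name `granger_f` and the numbers are hypothetical.

```python
def granger_f(ssr_restricted, ssr_unrestricted, p, n):
    """F statistic for H0: c_1 = ... = c_p = 0.

    The unrestricted model has 2p + 1 coefficients (intercept, p lags of y,
    p lags of x), so the denominator degrees of freedom are n - (2p + 1).
    """
    df_denom = n - (2 * p + 1)
    if df_denom <= 0:
        raise ValueError("not enough observations for the unrestricted model")
    numerator = (ssr_restricted - ssr_unrestricted) / p
    denominator = ssr_unrestricted / df_denom
    return numerator / denominator, (p, df_denom)


# Hypothetical sums of squared residuals, for illustration only.
f_stat, dof = granger_f(ssr_restricted=120.0, ssr_unrestricted=100.0, p=2, n=45)
print("F =", f_stat, "with degrees of freedom", dof)
```

The statistic is compared with the critical value of the F distribution with (p, n - 2p - 1) degrees of freedom.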


APPENDIX-- PDL MODELS

1. "A Few Details for the Almon Distributed Lag."

Consider the problem of estimating an Almon distributed lag model with $p = 2$ and $s = 3$, so we have a 2nd degree polynomial with 3 lags. The $\beta_i$'s can be expressed in terms of the $a_j$'s (recall: $\beta_i = a_0 + a_1 i + a_2 i^2$) as

$$\beta_0 = a_0$$

$$\beta_1 = a_0 + a_1 + a_2$$

$$\beta_2 = a_0 + 2a_1 + 4a_2$$

$$\beta_3 = a_0 + 3a_1 + 9a_2.$$

Substituting these expressions for $\beta_i$ into the original distributed lag model yields:

$$y_t = \alpha + a_0 x_t + (a_0 + a_1 + a_2) x_{t-1} + (a_0 + 2a_1 + 4a_2) x_{t-2} + (a_0 + 3a_1 + 9a_2) x_{t-3} + u_t$$

$$= \alpha + a_0(x_t + x_{t-1} + x_{t-2} + x_{t-3}) + a_1(x_{t-1} + 2x_{t-2} + 3x_{t-3}) + a_2(x_{t-1} + 4x_{t-2} + 9x_{t-3}) + u_t.$$

For a more general case, assume $p = 3$ and $s = 10$.

$s = 10$: $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_{10} x_{t-10} + u_t$

$p = 3$: $\beta_i = a_0 + a_1 i + a_2 i^2 + a_3 i^3$

$$\beta_0 = a_0$$

$$\beta_1 = a_0 + a_1 + a_2 + a_3 = \sum_{j=0}^{3} a_j$$

$$\beta_2 = a_0 + 2a_1 + 2^2 a_2 + 2^3 a_3 = \sum_{j=0}^{3} a_j 2^j$$

$$\vdots$$

$$\beta_{10} = a_0 + 10a_1 + 10^2 a_2 + 10^3 a_3 = \sum_{j=0}^{3} a_j 10^j$$

Again, after substituting for $\beta_i$, we obtain

$$y_t = \delta + a_0 x_t + \Big(\sum_j a_j\Big) x_{t-1} + \Big(\sum_j a_j 2^j\Big) x_{t-2} + \dots + \Big(\sum_j a_j 10^j\Big) x_{t-10} + u_t.$$


Rearranging terms we obtain

$$y_t = \delta + a_0(x_t + x_{t-1} + \dots + x_{t-10}) + a_1(x_{t-1} + 2x_{t-2} + \dots + 10x_{t-10}) + a_2(x_{t-1} + 2^2 x_{t-2} + \dots + 10^2 x_{t-10}) + a_3(x_{t-1} + 2^3 x_{t-2} + \dots + 10^3 x_{t-10}) + u_t$$

$$= \delta + a_0 \sum_{i=0}^{10} x_{t-i} + a_1 \sum_{i=1}^{10} i\, x_{t-i} + a_2 \sum_{i=1}^{10} i^2 x_{t-i} + a_3 \sum_{i=1}^{10} i^3 x_{t-i} + u_t.$$

Defining $z_{tj} = \sum_{i=0}^{10} i^j x_{t-i}$ (with $0^0 = 1$ for $j = 0$), we can estimate the $a_j$ (and hence the $\beta_i$) by obtaining estimates of

$$y_t = \delta + a_0 z_{t0} + a_1 z_{t1} + a_2 z_{t2} + a_3 z_{t3} + u_t$$
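The construction of the transformed regressors $z_{tj}$ can be sketched in Python as below. This is only an illustration: the function name `almon_z` is hypothetical, and the short series is the first five advertising values from Table 1 of the problem set, used with $s = 3$ and $p = 2$ so the sums can be checked by hand.

```python
def almon_z(x, s, p):
    """Return rows [z_t0, ..., z_tp] for t = s, ..., len(x) - 1,
    where z_tj = sum over i = 0..s of i**j * x[t - i] (Python gives 0**0 == 1)."""
    rows = []
    for t in range(s, len(x)):
        rows.append([sum(i ** j * x[t - i] for i in range(s + 1))
                     for j in range(p + 1)])
    return rows


# First five advertising observations A_1, ..., A_5 from Table 1.
adv = [73, 94, 88, 103, 104]
z = almon_z(adv, s=3, p=2)
for row in z:
    print(row)
```

The first s observations are lost because their lags are unavailable, which is why a series of 36 observations with s = 12 leaves 24 usable rows.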

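with

$$\widehat{\mathrm{Var}}\begin{pmatrix} \hat\delta \\ \hat a_0 \\ \vdots \\ \hat a_3 \end{pmatrix} = \sigma_u^2 (Z'Z)^{-1}.$$

Now since

$$\begin{pmatrix} \delta \\ \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_{10} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1^0 & 1^1 & 1^2 & 1^3 \\ 0 & 2^0 & 2^1 & 2^2 & 2^3 \\ 0 & 3^0 & 3^1 & 3^2 & 3^3 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 10^0 & 10^1 & 10^2 & 10^3 \end{pmatrix} \begin{pmatrix} \delta \\ a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} = C \begin{pmatrix} \delta \\ a_0 \\ \vdots \\ a_3 \end{pmatrix},$$

then

$$\widehat{\mathrm{Var}}\begin{pmatrix} \hat\delta \\ \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_{10} \end{pmatrix} = \sigma_u^2\, C (Z'Z)^{-1} C'.$$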


PROBLEM SET 4.3: LAGGED VARIABLES

Applied problems

1. Replicate the results in the applications of OLS, Koyck, and PDL models to estimate the relationship between sales and advertising expenditures reported in the notes. The data are available in the file hw3_3_table1.txt. In particular,

(a) estimate

$$S_t = a + \beta_0 A_t + \dots + \beta_{12} A_{t-12} + \varepsilon_t$$

using
    (1) OLS
    (2) Koyck lags (report $\lambda$, $\alpha$, $\beta_0$)
    (3) Polynomial distributed lags, order = 2

(b) Compare the distributed lag coefficients with OLS.

(c) Test the PDL specification against the OLS using a Chow and LR test.

(d) Re-estimate the model using a polynomial distributed lag with order = 3 and test whether the differences between p = 2 and p = 3 are statistically significant.

(e) (Bonus) Estimate a modified Koyck model which declines geometrically after the first lag.

Hint: replicate the commands contained in the PDL section of the class notes. The TA will be a great resource.


(JM III-C)

Table 1

Sales and Advertising

     t    S_t    A_t  A_t-1  A_t-2  A_t-3  A_t-4  ...  A_t-12
     1    521     73
     2    515     94     73
     3    533     88     94     73
     4    531    103     88     94     73
     5    544    104    103     88     94     73
     6    528     73    104    103     88     94
     7    537    121     73    104    103     88
     8    541    134    121     73    104    103
     9    531    102    134    121     73    104
    10    535     79    102    134    121     73
    11    527    119     79    102    134    121
    12    517    118    119     79    102    134
    13    547    145    118    119     79    102           73
    14    560    128    145    118    119     79           94
    15    557    145    128    145    118    119           88
    16    548    191    145    128    145    118          103
    17    543    159    191    145    128    145          104
    18    580    169    159    191    145    128           73
    19    564    162    169    159    191    145          121
    20    581    181    162    169    159    191          134
    21    557    170    181    162    169    159          102
    22    575    183    170    181    162    169           79
    23    585    205    183    170    181    162          119
    24    568    185    205    183    170    181          118
    25    569    200    185    205    183    170          145
    26    551    173    200    185    205    183          128
    27    586    243    173    200    185    205          145
    28    581    215    243    173    200    185          191
    29    559    210    215    243    173    200          159
    30    594    229    210    215    243    173          169
    31    593    227    229    210    215    243          162
    32    579    249    227    229    210    215          181
    33    609    265    249    227    229    210          170
    34    602    257    265    249    227    229          183
    35    617    253    257    265    249    227          205
    36    601    239    253    257    265    249          185


2. In Example 11.4 (Wooldridge p. 389) it may be expected that the expected value of the return at time t is a quadratic function of $return_{t-1}$. To check this possibility, use the data in NYSE.RAW to estimate

$$return_t = \beta_0 + \beta_1\, return_{t-1} + \beta_2\, return_{t-1}^2 + u_t$$

(a) Report the results in standard form.

(b) State and test the null hypothesis that $E(return_t \mid return_{t-1})$ does not depend on $return_{t-1}$. (Hint: There are two restrictions to test here.)

(c) Drop $return_{t-1}^2$ from the model, but add the interaction term $return_{t-1} \cdot return_{t-2}$. Now test the efficient markets hypothesis ($\beta_1 = \beta_2 = 0$).

(d) What do you conclude about predicting weekly stock returns based on past stock returns? (Wooldridge C. 11.3)



IV. Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

A. Introductory Comments, B. Nonnormality of errors, C. Nonzero mean of errors, D.

Generalized Regression Model, E. Heteroskedasticity, F. Autocorrelation, G. Panel Data, H.

Stochastic X’s, I. Measurement Error, J. Specification Error

A. Introductory Comments

The Classical Normal Linear Regression Model is defined by:

$$y = X\beta + \varepsilon$$

where

(A.1) $\varepsilon$ is distributed normally

(A.2) $E(\varepsilon_t) = 0$ for all t

(A.3) $\mathrm{Var}(\varepsilon_t) = \sigma^2$ for all t

(A.4) $\mathrm{Cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \ne s$

(A.5) The X's are nonstochastic and $\lim_{n \to \infty} \left( X'X / n \right) = \Sigma_X$ is nonsingular.

Recall that assumptions (A.1)-(A.4) can be written more compactly as

$$\varepsilon \sim N[0, \Sigma = \sigma^2 I].$$

In section (II') we demonstrated that under assumptions (A.1)-(A.5) the least squares estimator ($\hat\beta$), the maximum likelihood estimator ($\overset{\Delta}{\beta}$), and the best linear unbiased estimator ($\tilde\beta$) are identical, i.e.,

$$\hat\beta = \tilde\beta = \overset{\Delta}{\beta} = (X'X)^{-1} X'y$$

and

$$\hat\beta \sim N[\beta;\ \sigma^2 (X'X)^{-1}].$$

Additionally, we proved that the least squares estimator $\hat\beta$ (hence $\tilde\beta$ and $\overset{\Delta}{\beta}$) are

• unbiased estimators

• minimum variance of all unbiased estimators

• consistent

• asymptotically efficient.

In this section we will demonstrate that the statistical properties of $\hat\beta$ are crucially dependent upon the validity of assumptions (A.1)-(A.5). The associated discussion will proceed by dropping one assumption at a time and considering the consequences. First, we will drop (A.1) and then (A.2). This will be followed by considering the generalized regression model which can be viewed as a generalized model which includes heteroskedasticity (violation of (A.3)), autocorrelation (violation of (A.4)), and the classical normal linear regression model as special cases. In Sections G, H, and I we will consider the implications of violating (A.5), the existence of measurement error, and presence of specification error (guessing the wrong model).

B. The Random Disturbances are not distributed normally, but (A.2)-(A.5) are valid.

An inspection of the derivation of the least squares estimator $\hat\beta$ reveals that the deduction is independent of any of the assumptions (A.1)-(A.5); hence,

$$\hat\beta = (X'X)^{-1} X'y$$

is still the correct formula for the least squares estimator of $\beta$ in the model

$$y = X\beta + \varepsilon$$

regardless of the assumptions about the distribution of $\varepsilon$. However, it should be mentioned that the statistical properties of $\hat\beta$ are very sensitive to the assumptions about the distribution of $\varepsilon$.

Similarly, we note that the BLUE of $\beta$ is invariant with respect to the assumptions about the underlying probability density function of $\varepsilon$ as long as (A.2)-(A.5) are valid. In this case we can conclude that

$$\hat\beta = \tilde\beta = (X'X)^{-1} X'y$$

and both $\hat\beta$ and $\tilde\beta$ will be

• unbiased

• minimum variance of all linear unbiased estimators (not necessarily of all unbiased estimators, since the Cramer Rao lower bound depends upon the density of the residuals)

• consistent.

However, standard t and F tests and confidence intervals are not necessarily valid for nonnormally distributed residuals.

The distribution of $\hat\beta$ will depend on the distribution of $\varepsilon$, which determines the distribution of $y$ ($y = X\beta + \varepsilon$) and hence the distribution of $\hat\beta$ and $\tilde\beta$ ($\hat\beta = \tilde\beta = (X'X)^{-1} X'y$).

Let's consider the MLE of $\beta$. Recall that the first step in the derivation of the MLE of $\beta$ is to define the likelihood function, for independent and identically distributed observations,

$$L = f(y_1; \beta) \cdots f(y_n; \beta)$$

which requires a knowledge of the distribution of the random disturbances and could not be defined otherwise. MLE are generally efficient. Least squares estimators will be efficient if $f(y; \beta)$ is normal. However, least squares need not be efficient if the residuals are not distributed normally. For example, if $\varepsilon$ is distributed as a Laplace with A.2-A.5 holding, OLS will be consistent and BLUE, but not efficient.

Consider the case in which the density function of the random disturbances is the Laplace or double exponential defined by

$$f(\varepsilon; \lambda) = \frac{e^{-|\varepsilon|/\lambda}}{2\lambda}, \qquad -\infty < \varepsilon < \infty$$

which can be graphically depicted as

Figure: the Laplace density $f(\varepsilon_t)$.

This density has thicker tails than the normal and is more peaked at 0. The associated likelihood function is defined by

$$L = f(y_1; \beta, \lambda) \cdots f(y_n; \beta, \lambda) = \frac{e^{-|y_1 - X_1\beta|/\lambda}}{2\lambda} \cdots \frac{e^{-|y_n - X_n\beta|/\lambda}}{2\lambda}$$

where $X_t = (1, x_{t2}, \dots, x_{tk})$ and $\beta' = (\beta_1, \dots, \beta_k)$. The log likelihood function is given by

$$\ell = \ln L = -\sum_{t=1}^{n} |y_t - X_t\beta| / \lambda - n \ln(2\lambda).$$

The MLE of $\beta$ in this case will minimize the sum of the absolute value of the errors

$$\sum_t |y_t - X_t\beta|$$

and is sometimes called the "least lines," minimum absolute deviations (MAD), least absolute deviation (LAD), or least absolute error (LAE) estimator; whereas, the least squares estimator of $\beta$ minimizes the sum of squared errors

$$\sum_t (y_t - X_t\beta)^2$$

and will not be the MLE estimator $\overset{\Delta}{\beta}$ in this case. For the linear regression model with Laplace error terms, $\overset{\Delta}{\beta}$ (LAD) will be unbiased, consistent, and asymptotically efficient. The following table compares and contrasts the relative performance of OLS and LAD estimators for the two different error distributions, the normal and Laplace.

Variance-covariance matrices of the OLS and LAD estimators

    Estimator \ error distribution    Normal                     Laplace
    OLS                               $\sigma^2 (X'X)^{-1}$      $\sigma^2 (X'X)^{-1}$
    LAD                               $2\sigma^2 (X'X)^{-1}$     $\frac{\sigma^2}{2} (X'X)^{-1}$

From this table we can see that the variance of LAD estimators is twice that of the corresponding OLS estimators for normal errors, but is half the OLS variance for Laplace errors. Recall that the Laplace pdf has thicker tails than the normal; hence, in the presence of outliers LAD may be preferred to OLS. LAD estimators can be obtained using the Stata command

    qreg y X's
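The contrast between the two criteria is easiest to see in the intercept-only model, where the least squares fit is the sample mean and the LAD fit is the sample median. The Python sketch below uses a small made-up sample with one outlier; the data and helper names are hypothetical and are not taken from the notes.

```python
import statistics


def sum_abs(data, c):
    """Sum of absolute errors when every observation is fitted by the constant c."""
    return sum(abs(y - c) for y in data)


def sum_sq(data, c):
    """Sum of squared errors when every observation is fitted by the constant c."""
    return sum((y - c) ** 2 for y in data)


# Made-up sample with a single large outlier.
data = [2.0, 3.0, 4.0, 5.0, 100.0]

ols_fit = statistics.mean(data)     # minimizes the sum of squared errors
lad_fit = statistics.median(data)   # minimizes the sum of absolute errors

print("OLS (mean):  ", ols_fit, " sum of squares =", sum_sq(data, ols_fit))
print("LAD (median):", lad_fit, " sum of |errors| =", sum_abs(data, lad_fit))
```

The outlier pulls the least squares fit far from the bulk of the data while leaving the LAD fit unchanged, which is the sense in which LAD may be preferred under thick-tailed errors.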

The exercise set considers a generalized error (GED) distribution which includes both the normal and double exponential or Laplace as special cases. Consequently, least squares and LAD estimators are special cases of MLE of the GED distribution.

In the past, the functional form of the distribution of the residuals has rarely been investigated. This is changing and could be investigated by comparing the distribution of $\varepsilon_t$ with the normal.

Various tests have been proposed to investigate the validity of the normality assumption. These tests take different forms. One class of tests is based on examining the skewness or kurtosis of the distribution of the estimated residuals.

The skewness coefficient

$$\gamma_1 = \frac{E(\varepsilon^3)}{(\sigma^2)^{3/2}}$$

can be estimated by

$$\hat\gamma_1 = \frac{\sum_{t=1}^{n} e_t^3 / n}{\left( \sum_{t=1}^{n} e_t^2 / n \right)^{3/2}}$$

and has an asymptotic distribution

$$N(0, 6/n).$$

Similarly, the excess kurtosis coefficient

$$\gamma_2 = \frac{E(\varepsilon^4)}{(\sigma^2)^2} - 3$$

can be estimated by

$$\hat\gamma_2 = \frac{\sum_t e_t^4 / n}{\left( \sum_t e_t^2 / n \right)^2} - 3$$

and has an asymptotic distribution

$$N(0, 24/n)$$

for normally distributed residuals. These two results provide the basis for constructing "t-type" tests of whether the sample skewness or kurtosis is consistent with the assumption of normally distributed residuals.

The Jarque-Bera test provides a joint test of a symmetric distribution for the residual with kurtosis of three. The test statistic is defined by

$$JB = n \left[ \frac{(\text{skewness})^2}{6} + \frac{(\text{excess kurtosis})^2}{24} \right]$$

and has an asymptotic Chi square distribution with two degrees of freedom. The distribution of JB follows from its being equal to the sum of squares of two asymptotically independent standard normal variables.
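The skewness, excess kurtosis, and JB formulas above translate directly into code. The following Python sketch is only an illustration: the function name `jarque_bera` is hypothetical, the residuals are made up, and the formulas assume the residuals have mean zero (as OLS residuals do when the model includes an intercept).

```python
def jarque_bera(resid):
    """Return (skewness, excess kurtosis, JB) for a list of mean-zero residuals."""
    n = len(resid)
    m2 = sum(e ** 2 for e in resid) / n
    m3 = sum(e ** 3 for e in resid) / n
    m4 = sum(e ** 4 for e in resid) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n * (skewness ** 2 / 6.0 + excess_kurtosis ** 2 / 24.0)
    return skewness, excess_kurtosis, jb


# Made-up residuals, for illustration only.
resid = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, ex_kurt, jb = jarque_bera(resid)
print("skewness =", skew, " excess kurtosis =", ex_kurt, " JB =", jb)

# Normality is rejected at the 5% level when JB exceeds the
# chi-square(2) critical value of 5.99.
print("reject normality at 5%:", jb > 5.99)
```

With so few observations the asymptotic chi-square approximation is not reliable; the example is meant only to show the arithmetic.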

Chi-square goodness of fit tests have also been proposed which are based upon comparing the histogram of estimated residuals with the normal distribution.

These test statistics and others are available as output in such programs as Stata, SAS, or SHAZAM. The Stata commands are given below.

To test for statistically significant departures of skewness and kurtosis from the normal, the commands are:

    reg y X's
    predict resid, res
    sum resid, detail
    sktest resid

The output from sktest also includes the calculation of a Jarque-Bera-like test, along with the associated p-values. The exact test statistics differ from those outlined above, but are similar in structure and test the same hypotheses. (D'Agostino, Belanger, and D'Agostino, American Statistician, 1990, pp. 316-321)

To perform a Chi-square test in Stata, you must first install the "csgof" command by typing

    findit csgof

and then installing the command and help files.

The Kolmogorov-Smirnov test is based upon the distribution of the maximum vertical distance between the cumulative histogram and the cumulative distribution of the hypothesized distribution. James Ramsey's program SEA (Specification Error Analysis) enables one to perform such a test. This can also be performed in Stata using the command "ksmirnov".

An alternative approach is to consider general distribution functions which include many of the common alternative specifications such as the normal as special cases. The first problem in the problem set illustrates this approach. Five other distributions which might also be considered are the generalized t, skewed generalized t, t, EGB2, and Inverse Hyperbolic Sine distributions. Estimation procedures exist which perform well for non-normal distributions. Some of these are referred to as robust, M, semiparametric, or partially adaptive estimators which accommodate very flexible underlying distributions. Kernel estimators provide another approach to this problem which are nonparametric in that they are independent of a distributional assumption. Using some of these alternative estimators, the hypothesis of normally distributed residuals can also be tested using the LR, Wald, or Rao or Lagrangian multiplier tests.

C. $\varepsilon \sim N(\mu, \Sigma = \sigma^2 I)$, i.e., drop (A.2)

The least squares estimator of $\beta$ is given by

$$\hat\beta = (X'X)^{-1} X'y.$$

The expected value of $\hat\beta$ is given as follows:

$$E(\hat\beta) = (X'X)^{-1} X' E(y)$$

$$= (X'X)^{-1} X' (X\beta + E(\varepsilon))$$

$$= (X'X)^{-1} X'X\beta + (X'X)^{-1} X'\boldsymbol{\mu}$$

$$= \beta + (X'X)^{-1} X'\boldsymbol{\mu}$$

with the second term representing the bias, which appears to suggest that all of the least squares estimators in the vector $\hat\beta$ are biased. However, if $E(\varepsilon_t) = \mu$ for all t, then

$$\boldsymbol{\mu} = \begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix} = \mu \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$$

and, since the column of ones is the first column of X, it can be shown that

$$(X'X)^{-1} X'\boldsymbol{\mu} = \mu\, (X'X)^{-1} X' \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} \mu \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

and only the estimator of the intercept is biased. If an error distribution has a nonzero mean, this gets included in the intercept term and separate estimates of $\beta_1$ and $\mu$ can't be obtained.

More general violations of (A.2) such as a nonzero, non-constant mean can lead to biased estimators of the intercept and slope coefficients.

Figure: the line $\beta_1 + \beta_2 X_t$ with the nonzero error mean $\mu$ marked.

12I V
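The result that a constant nonzero mean biases only the intercept can be checked numerically. The following is a minimal sketch, assuming a simple regression with an intercept; the data values and the helper name `ols_simple` are hypothetical. Because the least squares estimator is linear in y, applying it to E(y) gives E(β̂) directly.

```python
def ols_simple(x, y):
    """Closed-form OLS for y = b1 + b2*x; returns (b1, b2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

# Hypothetical design and parameters (not from the notes).
x = [1.0, 2.0, 4.0, 7.0, 9.0]
beta1, beta2, mu = 3.0, 0.5, 2.0

# E(y_t) = beta1 + beta2*x_t + mu when E(eps_t) = mu for all t.
expected_y = [beta1 + beta2 * xi + mu for xi in x]

# OLS applied to E(y) returns E(beta_hat), since beta_hat is linear in y.
e_b1, e_b2 = ols_simple(x, expected_y)
bias = (e_b1 - beta1, e_b2 - beta2)
print("bias of intercept and slope:", bias)
```

The intercept absorbs µ while the slope is unaffected, which is why separate estimates of β1 and µ cannot be recovered.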

D. Generalized Normal Linear Regression Model

1. Introduction

In many economic applications either (A.3) or (A.4) is violated, i.e.,

Heteroskedasticity: $\mathrm{Var}(\varepsilon_t) = \sigma_t^2$, which is not equal to a common $\sigma^2$ for all t

Autocorrelation: $\mathrm{Cov}(\varepsilon_t, \varepsilon_s) \neq 0$ for some $t \neq s$

For situations in which either or both autocorrelation and heteroskedasticity exist,

\[ \mathrm{Var}(\varepsilon) = \Sigma \neq \sigma^2 I, \]

the model can be written more generally as

y = Xβ + ε

(A.1) - (A.4) ε ~ N(0, Σ)

(A.5) Same as before

This model is referred to as the generalized normal linear regression model and includes the classical normal linear regression model as a special case, i.e., when Σ = σ²I.

The unknown parameters in the generalized regression model are the β's, $\beta = (\beta_1, \ldots, \beta_k)$, and the

\[ \frac{n(n+1)}{2} = n + \frac{n(n-1)}{2} \]

independent parameters in the symmetric matrix Σ. In general it is not possible to estimate Σ unless some simplifying assumptions are made. For example, with the case of heteroskedasticity alone

\[ \Sigma = \begin{pmatrix} \mathrm{Var}(\varepsilon_1) & & 0 \\ & \ddots & \\ 0 & & \mathrm{Var}(\varepsilon_n) \end{pmatrix} \]

or for autocorrelation alone

\[ \Sigma = \begin{pmatrix} \sigma^2 & \mathrm{Cov}(\varepsilon_1, \varepsilon_2) & \cdots & \mathrm{Cov}(\varepsilon_1, \varepsilon_n) \\ \vdots & \ddots & & \vdots \\ & & & \sigma^2 \end{pmatrix} \]

and for the classical normal linear regression model

\[ \Sigma = \begin{pmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{pmatrix} \]

2. Estimators of β

a. Least squares estimation

\[ SSE = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta \]

\[ \frac{\partial SSE}{\partial \beta} = -2X'y + 2X'X\beta \]

Setting this derivative equal to zero and solving yields:

\[ 2X'X\hat\beta = 2X'y \]

\[ \hat\beta = (X'X)^{-1}X'y \]

b. Maximum likelihood estimation

\[ L(y; \beta, \Sigma) = \frac{e^{-(1/2)(y - X\beta)'\Sigma^{-1}(y - X\beta)}}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \]

\[ \ell = \ln L = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta) \]

\[ = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}\left(y'\Sigma^{-1}y - 2\beta'X'\Sigma^{-1}y + \beta'X'\Sigma^{-1}X\beta\right) \]

\[ \frac{d\ell}{d\beta} = -\frac{1}{2}\left(-2X'\Sigma^{-1}y + 2X'\Sigma^{-1}X\beta\right) \]

Setting this derivative equal to 0 and solving implies

\[ (X'\Sigma^{-1}X)\,\overset{\Delta}{\beta} = X'\Sigma^{-1}y \]

which are referred to as the modified normal equations. The solution of these equations

\[ \overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \]

is the maximum likelihood estimator of β.

c. Best linear unbiased estimator

Linearity condition: $\tilde\beta = Ay$ where A is a k x n matrix of unknown constants.

Unbiased condition: Select A so that $E(\tilde\beta) = \beta$, which requires $E(\tilde\beta) = AE(y) = AX\beta \Rightarrow AX = I$.

Minimum variance condition: Select A so that $E(\tilde\beta) = \beta$ and $\mathrm{Var}(\tilde\beta)$ is a minimum. Let $\mathrm{Var}(\tilde\beta_k) = a_k'\Sigma a_k$ where $a_k'$ is the kth row of the matrix A. The minimization problem is to min $a_k'\Sigma a_k$ s.t. $X'a_k = i_k$ (where $i_k$ is the kth column of the identity matrix). Collecting the columns $a_k$ in the n x k matrix $a = A'$:

\[ \ell = a'\Sigma a + \lambda'(X'a - I) \]

\[ \frac{\partial \ell}{\partial a} = 2\Sigma a + X\lambda = 0 \]

\[ \frac{\partial \ell}{\partial \lambda} = X'a - I = 0, \text{ so } AX = I \]

The first condition gives

\[ a = -\frac{1}{2}\Sigma^{-1}X\lambda . \]

Now from X'a = I, we substitute for a and have:

\[ -\frac{1}{2}X'\Sigma^{-1}X\lambda = I \]

\[ \lambda = -2(X'\Sigma^{-1}X)^{-1} I \]

\[ \Rightarrow a = \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1} I \]

\[ a' = I'(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} \]

so $A = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$ and

\[ \tilde\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y . \]

We observe that the BLUE and MLE of β are identical, but different from the least squares estimator of β.

3. Distribution of $\hat\beta$, $\tilde\beta$, and $\overset{\Delta}{\beta}$.

For the Classical Normal Linear Regression Model (ε ~ N(0, σ²I))

\[ \hat\beta = \tilde\beta = \overset{\Delta}{\beta} = (X'X)^{-1}X'y \sim N\left(\beta;\ \sigma^2(X'X)^{-1}\right) \]

For the Generalized Regression Model (ε ~ N(0, Σ)) we have

\[ \hat\beta = (X'X)^{-1}X'y = A_1 y \]

and

\[ \tilde\beta = \overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y = A_2 y \]

Making use of the useful theorem

If $y \sim N[\mu_y; \Sigma_y]$, then $z = Ay \sim N[\mu_z = A\mu_y;\ \Sigma_z = A\Sigma_y A']$,

we obtain

\[ \hat\beta \sim N[A_1 X\beta;\ A_1\Sigma A_1'] \sim N\left[\beta;\ (X'X)^{-1}X'\Sigma X(X'X)^{-1}\right] \]

\[ \tilde\beta = \overset{\Delta}{\beta} \sim N[A_2 X\beta;\ A_2\Sigma A_2'] \sim N\left[\beta;\ (X'\Sigma^{-1}X)^{-1}\right]. \]

Note that $\hat\beta$, $\tilde\beta$, and $\overset{\Delta}{\beta}$ are unbiased estimators of β, but

\[ \mathrm{Var}(\hat\beta_i) \geq \mathrm{Var}(\tilde\beta_i) = \mathrm{Var}(\overset{\Delta}{\beta}_i). \]

Also note that for the case Σ = σ²I, these results include the following as a special case

\[ \hat\beta = \tilde\beta = \overset{\Delta}{\beta} \sim N\left[\beta,\ \sigma^2(X'X)^{-1}\right]. \]

4. Consequences of using least squares formulas when Var(ε) = Σ ≠ σ²I

\[ \hat\beta = (X'X)^{-1}X'y \quad \text{and} \quad \mathrm{Var}(\hat\beta) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma^2(X'X)^{-1} \]

a. $\hat\beta$ is an unbiased and consistent estimator of β.

b. $\hat\beta$ is not efficient, $\mathrm{Var}(\hat\beta_i) \geq \mathrm{Var}(\tilde\beta_i)$.

c. The use of $\sigma^2(X'X)^{-1}$ will frequently result in serious underestimates of $\mathrm{Var}(\hat\beta)$. *Associated forms of t and F statistics are no longer valid. However, robust measures of the actual standard errors can be used to construct “t-statistics” which are asymptotically valid.

d. Predictions of $y_t$ based on OLS will yield larger sampling variation than could be obtained using alternative techniques. See the next section for more detail.

5. Predictions in the generalized regression model:

Goldberger (JASA, 1962) demonstrated that the best linear unbiased prediction of $y_t$ in period n + h, h periods in the future, is given by

\[ \hat y_n(h) = \hat y_{n+h} = X_{n+h}\overset{\Delta}{\beta} + W'\Sigma^{-1}e \]

where

\[ \overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \]

\[ e = y - X\overset{\Delta}{\beta} \]

\[ W = E(\varepsilon_{n+h}\,\varepsilon). \]

Therefore the predictions for OLS or MLE may have sampling variances which are larger than could be obtained using the Goldberger technique.

Note:

a. If the ε's are uncorrelated then

\[ W = E\left[\varepsilon_{n+h}\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}\right] = E\begin{pmatrix} \varepsilon_{n+h}\varepsilon_1 \\ \vdots \\ \varepsilon_{n+h}\varepsilon_n \end{pmatrix} = 0 \]

and the best linear unbiased predictor of $y_t$ in period n+h is

\[ \hat y_{n+h} = X_{n+h}\overset{\Delta}{\beta} \]

b. If there is correlation between the random disturbances, then the best linear unbiased predictor may differ from our BLUE of the deterministic component $X_{n+h}\beta$. The adjustment, $W'\Sigma^{-1}e$, would “correct” for the existence of correlation between the random disturbances.

6. Alternative methods of obtaining BLUE or MLE of β by transforming data or

using Generalized Least Squares (GLS).

The discussion in this section provides motivation for the way MLE can be performed in regression programs. Consider the generalized regression model:

y = Xβ + ε, ε ~ N(0, Σ)

Transform the model (and data) by premultiplying by a transformation matrix T, i.e.,

[Ty] = [TX]β + [Tε]

If we select a transformation matrix T such that

Tε ~ N(0, TΣT' = σ²I),

then it follows that

TΣT' = σ²I (transformed error terms Tε satisfy (A.1) - (A.4)),

\[ \Sigma = \sigma^2 T^{-1}(T')^{-1} \quad \text{or} \quad \Sigma^{-1} = \sigma^{-2}T'T. \]

Applying least squares to the transformed data, we obtain

\[ \hat\beta_T = [(TX)'TX]^{-1}[X'T'Ty] = (X'T'TX)^{-1}(X'T'Ty) \]

which yields the maximum likelihood estimator of β, i.e.,

\[ \hat\beta_T = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \]

In other words, applying least squares to an appropriately transformed

regression model will yield MLE of β. These estimators are sometimes

referred to as generalized least squares (GLS) estimators of β .
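The equivalence between least squares on transformed data and the GLS formula can be illustrated with a small numeric sketch. It assumes a simple regression with a diagonal Σ whose standard deviations σ_t are known; the data, the σ_t values, and the helper names `ols_two_columns` and `gls_diagonal` are hypothetical and chosen only for illustration.

```python
def ols_two_columns(c1, c2, y):
    """OLS of y on two regressor columns c1, c2 (no additional intercept).
    Solves the 2x2 normal equations (Z'Z) b = Z'y by Cramer's rule."""
    s11 = sum(a * a for a in c1)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s22 = sum(b * b for b in c2)
    t1 = sum(a * v for a, v in zip(c1, y))
    t2 = sum(b * v for b, v in zip(c2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

def gls_diagonal(x, y, sig):
    """Direct evaluation of (X'S^-1 X)^-1 X'S^-1 y with S = diag(sig_t^2), X = [1, x]."""
    w = [1.0 / s ** 2 for s in sig]
    s11 = sum(w)
    s12 = sum(wi * xi for wi, xi in zip(w, x))
    s22 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t1 = sum(wi * yi for wi, yi in zip(w, y))
    t2 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Hypothetical data and assumed-known standard deviations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.8, 4.1, 7.5, 5.2]
sig = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]

# T = diag(1/sigma_t): every column is transformed, including the column of ones.
ones_star = [1.0 / s for s in sig]
x_star = [xi / s for xi, s in zip(x, sig)]
y_star = [yi / s for yi, s in zip(y, sig)]

b_transformed = ols_two_columns(ones_star, x_star, y_star)
b_gls = gls_diagonal(x, y, sig)
print(b_transformed, b_gls)
```

The two coefficient pairs agree up to rounding. Multiplying T by any nonzero constant leaves the estimates unchanged, because the constant cancels in (X'T'TX)^{-1}X'T'Ty.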


7. Robust estimates of the standard errors of the OLS estimator

As we noted earlier, if Σ ≠ σ²I,

\[ \mathrm{Var}(\hat\beta_{OLS}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma^2(X'X)^{-1} \]

and the OLS “standard errors” reported by most computer programs, which are based on $s^2(X'X)^{-1}$, will be inappropriate for constructing t-statistics. White (1980, Econometrica, pp. 817-838) and Newey-West (1987, Econometrica, 703-708) outline how to obtain consistent estimators of the correct $\mathrm{Var}(\hat\beta_{OLS})$ for the cases of heteroskedasticity and autocorrelation. These procedures are programmed into many econometric packages.

In Stata

. for heteroskedasticity: reg dep_var rhs_vars, robust

or

. for autocorrelation: newey dep_var rhs_vars, lag(#)

where (#) is the maximum number of lags to consider in the autocorrelation structure. Typing “lag(0)” is the same as using the “reg …, robust” command above.
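As an illustration of the heteroskedasticity case, the sandwich formula can be evaluated by replacing Σ with diag(e_t²), where e_t are the OLS residuals. The sketch below does this for the slope of a simple regression. It is the unscaled version of White's estimator, so it is an assumption that it will differ slightly from packaged output, since packages typically apply a finite-sample degrees-of-freedom adjustment. The data and the function name are hypothetical.

```python
import math

def slope_standard_errors(x, y):
    """Return (b1, b2, residuals, conventional SE of b2, White-type SE of b2)
    for the simple regression y = b1 + b2*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    d = [xi - xbar for xi in x]                      # deviations of x from its mean
    sxx = sum(di * di for di in d)
    b2 = sum(di * yi for di, yi in zip(d, y)) / sxx  # slope = sum(d_t * y_t) / Sxx
    b1 = ybar - b2 * xbar
    e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    s2 = sum(ei * ei for ei in e) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)            # from s^2 (X'X)^-1
    # Slope element of (X'X)^-1 X' diag(e_t^2) X (X'X)^-1
    se_white = math.sqrt(sum((di * ei) ** 2 for di, ei in zip(d, e))) / sxx
    return b1, b2, e, se_conventional, se_white

# Hypothetical data whose scatter widens as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 2.8, 4.4, 4.1, 6.9, 5.0, 9.6, 6.1]
b1, b2, e, se_conv, se_white = slope_standard_errors(x, y)
print("slope:", b2, "conventional SE:", se_conv, "robust SE:", se_white)
```

Dividing the slope by either standard error gives the corresponding “t-statistic”; only the robust version remains asymptotically valid when the variances differ across observations.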


E. Heteroskedasticity (Violation of (A.3))

1. Introduction

In certain applications the researcher may find that the assumption

\[ \mathrm{Var}(y_t) = \mathrm{Var}(\varepsilon_t) = \sigma^2 \text{ for all } t \]

appears to be inconsistent with the data and model under consideration. This problem can arise in a number of contexts. For example, if the data are obtained by combining cross-sectional and time series data where different sample sizes are involved, one might expect the averages (or totals) associated with the largest sample size to have a different variance than observations associated with the smallest sample size. Another example of heteroskedasticity might arise in an analysis of expenditure patterns ($C_t$) corresponding to different income levels ($Y_t$) in budget studies.

[Figure: consumption plotted against income around a line with intercept $\beta_1$ and slope $\beta_2$.]

In this example we note that there appears to be greater variation in consumption levels associated with higher income levels than for lower levels. This might arise because individuals with higher incomes can make more discretionary purchases than those with lower incomes who spend most of their income on necessities. This situation could be modeled as

\[ C_t = \beta_1 + \beta_2 Y_t + \varepsilon_t \]

(A.1), (A.2), (A.3)': $\varepsilon_t \sim N(0, \sigma_t^2)$

(A.4) $\mathrm{Cov}(\varepsilon_t, \varepsilon_s) = 0$, $t \neq s$

(A.5) Same as before.

More generally the heteroskedastic model can be modeled as

y = Xβ + ε

(A.1)' ε ~ N[0, Σ]

(A.5) The X's are nonstochastic and $\lim_{n\to\infty}\left(\frac{X'X}{n}\right)^{-1}$ is nonsingular

where

\[ \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n^2 \end{pmatrix}. \]

As noted in the previous section, if Σ ≠ σ²I (any of the variances are unequal), least squares estimators will not be equal to the MLE or BLUE of β. Least squares estimators will still be unbiased and consistent, but will not be minimum variance nor asymptotically efficient, and the standard statistical tests based on least squares are invalid. For this reason it is important to test for the existence of heteroskedasticity.

2. Test for Heteroskedasticity

The basic idea behind all of these tests is to determine whether there appears to be any systematic behavior of the variances of the errors. The first test, the Goldfeld-Quandt test, groups the data and tests for equality of the variances of the different groups. Many of the other tests use the squared OLS residual ($e_t^2$) as a proxy for $\sigma_t^2$ and search for systematic relationships between $e_t^2$ and other variables.

a. Goldfeld-Quandt Test

The null hypothesis to be investigated is

\[ H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 \]

A common test for heteroskedasticity is the Goldfeld-Quandt test.

(1) Divide the data into three groups (roughly equal sizes, $n_1 + n_2 + n_3 = n$).

(2) Run separate regressions on groups I and III. Let $s_I^2$ and $s_{III}^2$ represent the corresponding estimators of σ².

(3) Under the null hypothesis of homoskedasticity,

\[ \frac{s_{III}^2}{s_I^2} \sim F(n_3 - k,\ n_1 - k) \]

*Place the larger s² in the numerator.

Under the null hypothesis one would expect $s_{III}^2 / s_I^2$ to be fairly close to one, and large differences from one would provide the basis for rejecting the null hypothesis. This is an exact test. A disadvantage of the test arises in cases in which many regressors are involved and a natural ordering may not be obvious to form the three groups.
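The three steps can be sketched for a simple regression as follows. This is a minimal illustration, assuming the observations are ordered by the single regressor x; the data and function names are hypothetical, and the resulting ratio would still have to be compared with a tabulated F critical value.

```python
def sse_line(pairs):
    """Sum of squared OLS residuals from regressing y on a constant and x."""
    n = len(pairs)
    xbar = sum(px for px, _ in pairs) / n
    ybar = sum(py for _, py in pairs) / n
    sxx = sum((px - xbar) ** 2 for px, _ in pairs)
    sxy = sum((px - xbar) * (py - ybar) for px, py in pairs)
    b2 = sxy / sxx
    return sum((py - ybar - b2 * (px - xbar)) ** 2 for px, py in pairs)

def goldfeld_quandt(x, y, n1, n3, k=2):
    """Return (F ratio, (numerator df, denominator df)).
    Groups I and III are the first n1 and last n3 observations after sorting by x."""
    pairs = sorted(zip(x, y))
    s2_first = sse_line(pairs[:n1]) / (n1 - k)
    s2_last = sse_line(pairs[-n3:]) / (n3 - k)
    # Place the larger variance estimate in the numerator.
    if s2_last >= s2_first:
        return s2_last / s2_first, (n3 - k, n1 - k)
    return s2_first / s2_last, (n1 - k, n3 - k)

# Hypothetical data: the scatter around the line grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [3.1, 3.9, 5.1, 5.9, 7.0, 8.2, 8.8, 10.5, 9.0, 13.0, 10.5, 15.5]
stat, df = goldfeld_quandt(x, y, n1=4, n3=4)
print("F ratio:", stat, "degrees of freedom:", df)
```

With only one regressor the ordering is unambiguous; with many regressors the choice of the ordering variable is the difficulty noted above.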

b. The Park test (Glejser test) can be thought of as being based upon using $e_t$ as a proxy for $\sigma_t$ and then investigating relationships of the form

\[ e_t = f(X_t) \quad \text{or} \quad e_t^2 = g(X_t). \]

[Figure: density of F(n3 - k, n1 - k) showing the "Fail to Reject H0" and "Reject H0" regions.]

Various forms for the functions f( ) and g( ) have been considered. The null hypothesis of homoskedasticity is tested by investigating whether the X's in $f(X_t)$ or $g(X_t)$ have any collective explanatory power. Statistically significant explanatory power of the $X_t$ would provide the basis for rejecting the assumption of homoskedasticity. The exact validity of F tests is questionable, with their use being based on asymptotic considerations. Recall that the $e_t$'s are correlated even if the $\varepsilon_t$'s are uncorrelated.

c. The White test [Econometrica, 1980, pp. 817-38]. Hal White suggests regressing $e_t^2$ on all of the explanatory variables, their squares, and cross products and then testing for the collective explanatory power of the regressors. The rationale for this test is that the hypothesis $\sigma_t^2 = f(X_t)$ is being investigated with $e_t^2$ as a proxy for $\sigma_t^2$ and using a second order Taylor Series approximation for the function $f(X_t)$. The null hypothesis of homoskedasticity would be consistent with a lack of statistical significance in this test. White mentions the use of a Rao or Lagrange multiplier test

\[ LM = NR^2 \]

which is asymptotically Chi square with degrees of freedom equal to the number of slope coefficients, $\frac{(k+2)(k-1)}{2}$, in the “$e_t^2$ auxiliary” regression equation.

Note: The R² in the LM test is the R² from the previously described “$e_t^2$ regression” equation. The White test can be performed by retrieving the estimated errors and regressing their squares on the variables, their squares, and cross-products.

Alternatively, the Stata command reg y x’s, followed by whitetst on the next line will automatically perform the White Test.

d. The modified White test. For large k, the White test involves many regressors with large degrees of freedom. To circumvent this problem, White proposed an alternative test based on estimating the model:

\[ e_t^2 = \delta_0 + \delta_1 \hat y_t + \delta_2 \hat y_t^2 + \eta_t \]

where $\hat y_t$ denotes the predicted y’s from an initial OLS estimation of the original model. The corresponding LM test ($NR^2$) is asymptotically distributed as a $\chi^2(2)$.

e. Breusch-Pagan Test. This test is included in Stata. It is performed by regressing the squares of the estimated errors on the X’s or other variables and testing for the collective explanatory power using an LM test or an F test. The Stata commands are:

reg y x
estat hettest, iid (performs the regression $e_t^2 = \delta_0 + \delta_1 \hat y_t + \eta_t$ and reports the LM test statistic)
estat hettest, fstat (reports the F-statistic)

Alternatives or variations

estat hettest x’s, iid or normal or fstat
estat hettest, rhs
estat hettest x’s, x^2’s, cross-products, iid or fstat
estat hettest yhat yhat^2, fstat or iid

where the LM or F-tests can be used to test $H_0: \sigma_t^2 = \sigma^2$ (homoskedasticity).
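The LM form of these auxiliary-regression tests can be sketched in a few lines. The example below is an illustration only: it assumes a single regressor x, regresses the squared OLS residuals on x, and forms N times the R² of that auxiliary regression. The data and function names are hypothetical, and 3.841 is the 5% critical value of the chi-square distribution with one degree of freedom.

```python
def fit_line(x, y):
    """OLS of y on a constant and x; returns (residuals, R-squared)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in resid) / sst
    return resid, r2

def breusch_pagan_lm(x, y):
    """LM = N * R^2 from regressing squared OLS residuals on x."""
    resid, _ = fit_line(x, y)
    e2 = [e * e for e in resid]
    _, r2_aux = fit_line(x, e2)
    return len(x) * r2_aux

# Hypothetical data whose scatter widens as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 2.8, 4.4, 4.1, 6.9, 5.0, 9.6, 6.1]
lm = breusch_pagan_lm(x, y)
reject_at_5_percent = lm > 3.841  # one slope coefficient in the auxiliary regression
print("LM statistic:", lm, "reject homoskedasticity:", reject_at_5_percent)
```

Adding x² (and cross products when there are several regressors) to the auxiliary regression would give the White version, with the degrees of freedom increased accordingly.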

3. Estimation


a. Viewed as applying OLS to an appropriately transformed model (Stata)

For applications in which the random disturbances are characterized by heteroskedasticity, BLUE and MLE of β will be unbiased, consistent, and have smaller variances than least squares estimators. In section (IV.D.6) we demonstrated that if a matrix T can be found such that

\[ \mathrm{Var}(T\varepsilon) = \sigma^2 I \quad (\text{or } \Sigma^{-1} = \sigma^{-2}T'T), \]

the MLE (and BLUE) of β can be obtained by transforming the data (model) from

y = Xβ + ε

to

Ty = TXβ + Tε

and applying least squares to the transformed model. In terms of the individual observations, the original model is

\[ y_t = X_t\beta + \varepsilon_t = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t \]

where $\varepsilon_t \sim N(0, \sigma_t^2)$.

We will consider the transformation from a slightly different perspective. The original model can be transformed to a form characterized by homoskedasticity by premultiplying the original formulation by $\sigma/\sigma_t$ (where σ is an unknown constant), i.e.,

\[ \frac{\sigma}{\sigma_t}y_t = \beta_1\frac{\sigma}{\sigma_t} + \beta_2\frac{\sigma}{\sigma_t}x_{t2} + \cdots + \beta_k\frac{\sigma}{\sigma_t}x_{tk} + \frac{\sigma}{\sigma_t}\varepsilon_t . \]

Note that the variance of the transformed random disturbance is given by

\[ \mathrm{Var}\left(\frac{\sigma\varepsilon_t}{\sigma_t}\right) = \frac{\sigma^2}{\sigma_t^2}\mathrm{Var}(\varepsilon_t) = \frac{\sigma^2}{\sigma_t^2}\sigma_t^2 = \sigma^2 \]

and the errors in the transformed regression, $\sigma\varepsilon_t/\sigma_t$, satisfy the assumptions (A.1) - (A.4).

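The corresponding transformation matrix is given by

\[ T = \sigma\begin{pmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{pmatrix} \]

Note that:

\[ T\Sigma T' = \sigma^2 \begin{pmatrix} 1/\sigma_1 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma_n \end{pmatrix} \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_n^2 \end{pmatrix} \begin{pmatrix} 1/\sigma_1 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma_n \end{pmatrix} = \sigma^2 I \]

and the transformed data matrices are given by:

\[ y^* = Ty = \sigma\begin{pmatrix} y_1/\sigma_1 \\ \vdots \\ y_n/\sigma_n \end{pmatrix}, \]

\[ X^* = TX = \sigma\begin{pmatrix} 1/\sigma_1 & x_{12}/\sigma_1 & \cdots & x_{1k}/\sigma_1 \\ \vdots & \vdots & & \vdots \\ 1/\sigma_n & x_{n2}/\sigma_n & \cdots & x_{nk}/\sigma_n \end{pmatrix}. \]

An application of least squares to the transformed data will yield MLE and BLUE of β. It can be verified that $T'T = \sigma^2\Sigma^{-1}$.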

Note:

In the GLS estimator the multiplicative constant in the transformation matrix is arbitrary and will cancel out. In summary, if the original model is y = Xβ + ε, and we apply OLS to the transformed model, we obtain

\[ \hat\beta_T = (X'T'TX)^{-1}X'T'Ty \]
\[ = (X'\sigma^2\Sigma^{-1}X)^{-1}X'\sigma^2\Sigma^{-1}y \]
\[ = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \]
\[ = \overset{\Delta}{\beta} = \tilde\beta . \]

Thus when choosing a T matrix for data transformation, the unknown constant σ need not be specified.

b. Estimation using Stata:

The command

vwls y X’s, sd($\sigma_t$)

will perform the previously described estimation and yield MLE. The main problem is to determine what the $\sigma_t$ should be.

4. Nature of Heteroskedasticity (σt's) and estimation

The problem of estimating the $\sigma_t$ still remains and there is not a general solution which will work in all cases.

a. Sometimes σt can be deduced from the model

(1) $y_t = at + \eta_t$

t = number of tosses of a coin

$y_t$ = number of heads in t tosses

$E(y_t) = at$

$\mathrm{Var}(\eta_t) = npq = t(1/2)(1 - 1/2) = t/4 = \sigma_t^2$

Stata commands for MLE are:

gen sig =t^.5
vwls y t,sd(sig)

The least squares estimator of a is given by $\hat a = \sum t\,y_t / \sum t^2$ and the MLE of a is $\sum y_t / \sum t$ = total number of heads / total number of tosses.
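The two estimators in the coin-toss example can be compared directly. The sketch below uses hypothetical head counts; it divides y_t and t by a quantity proportional to σ_t (here √t) and applies least squares through the origin, which should reproduce total heads divided by total tosses.

```python
import math

# Hypothetical data: t tosses and y_t heads observed in each experiment.
t = [10, 20, 50, 100]
y = [4, 11, 27, 48]

# Ordinary least squares through the origin: sum(t*y) / sum(t^2).
a_ols = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)

# Weighted least squares: sigma_t is proportional to sqrt(t), so divide by sqrt(t).
t_star = [ti / math.sqrt(ti) for ti in t]
y_star = [yi / math.sqrt(ti) for ti, yi in zip(t, y)]
a_wls = sum(a * b for a, b in zip(t_star, y_star)) / sum(a * a for a in t_star)

# Closed form stated in the text: total heads / total tosses.
a_ratio = sum(y) / sum(t)
print(a_ols, a_wls, a_ratio)
```

The constant 1/4 in Var(η_t) = t/4 plays no role, since any multiplicative constant in the weights cancels.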

(2) Combination of time series and cross-sectional data

($y_t$, $X_t$): time series obtained by taking averages of cross-sectional samples of size $n_t$

Let $y_t = a + bx_t + \varepsilon_t$ be the model, then an assumption which might be "reasonable" is

\[ \mathrm{Var}(y_t) = \mathrm{Var}(\varepsilon_t) = \sigma^2/n_t \]

The corresponding Stata commands for MLE are

gen sig = 1/nt^.5
vwls y x, sd(sig)

b. Sometimes the researcher can analyze the behavior of the residuals and look

for trends

Try $\sigma_t^2 = \sigma^2 x_t$ or $\sigma_t^2 = \sigma^2 x_t^2$.

If $\sigma_t^2 = \sigma^2 x_t$ then use the Stata commands

gen sig=x^.5
vwls y x, sd(sig)

Similarly if $\sigma_t^2 = \sigma^2 x_t^2$, then use the Stata commands

gen sig=x
vwls y x, sd(sig)

c. An example of Feasible GLS with multiple regressors (Wooldridge).

Consider the model $y_t = X_t\beta + \varepsilon_t$ with $\sigma_t^2 = \mathrm{Var}(\varepsilon_t \mid X_t) = e^{X_t\delta}$.

Estimated or feasible GLS (BLUE) of the unknown coefficients in the original regression model can be obtained as follows:

(1) Regress y on the X’s to obtain the estimated residuals (e)

reg y X’s
predict e, resid

(2) Regress the natural logarithm of the squared OLS residuals on the X’s and save the predicted values ($X_t\hat\delta$).

gen Le2=ln(e*e)
reg Le2 X’s
predict xdelta,xb
gen sig=(exp(xdelta))^.5

(3) Use the calculated weights ($\hat\sigma_t = (e^{X_t\hat\delta})^{.5}$) to perform a weighted least squares

vwls y X’s,sd(sig)

Alternative assumptions about the nature of heteroskedasticity could be used in this procedure.
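The feasible GLS steps above can be mirrored outside Stata. The following is a minimal sketch for a single regressor, under the same exponential variance assumption; the data and the helper names `wls_line` and `feasible_gls` are hypothetical, and the weights w_t = 1/σ̂_t² are equivalent to dividing each observation by σ̂_t.

```python
import math

def wls_line(x, y, w):
    """Weighted least squares of y on a constant and x with weights w_t = 1/sigma_t^2.
    With all weights equal to one this is ordinary least squares."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det

def feasible_gls(x, y):
    ones = [1.0] * len(x)
    # Step 1: OLS and its residuals.
    b1, b2 = wls_line(x, y, ones)
    e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Step 2: regress ln(e^2) on x and form sigma_hat = sqrt(exp(fitted value)).
    log_e2 = [math.log(ei * ei) for ei in e]
    d1, d2 = wls_line(x, log_e2, ones)
    sig = [math.sqrt(math.exp(d1 + d2 * xi)) for xi in x]
    # Step 3: weighted least squares with the estimated standard deviations.
    w = [1.0 / (s * s) for s in sig]
    return wls_line(x, y, w), sig

# Hypothetical data whose scatter widens as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 2.8, 4.4, 4.1, 6.9, 5.0, 9.6, 6.1]
(b1_fgls, b2_fgls), sig_hat = feasible_gls(x, y)
print("FGLS intercept and slope:", b1_fgls, b2_fgls)
```

A residual that is exactly zero would make the logarithm undefined, so real applications may need a guard for that case.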

5. Predictions

The best linear unbiased predictors will be given by

\[ \hat Y_{n+h} = \hat Y_n(h) = X_{n+h}\overset{\Delta}{\beta} \]

(see notes (section D.5)).

F. Autocorrelation (Violation of A.4)

1. Introduction

One of the most common violations of (A.1) - (A.5) with time series data is the presence of autocorrelated random disturbances in regression models. Autocorrelated random disturbances refers to the problem in which the error terms are not statistically independent. When working with time series data, you should be aware of the possibility of what is known as the spurious regression problem. This problem can arise when the dependent variable (y) and one or more of the explanatory variables (say X) both exhibit a trending behavior. In this situation, regressing y on X may suggest a statistically significant relationship between y and X, when they are unrelated (a spurious regression) and only appear related because of a shared trending behavior. One approach to circumventing this situation is to include “t” in the set of regressors, e.g., $y_t = \beta_1 + \beta_2 X_t + \beta_3 t + \varepsilon_t$. If this is the correct model and the variable t is deleted from the equation, the resultant estimators of $\beta_1$ and $\beta_2$ will be biased. The OLS estimate for $\beta_2$ is the same as would arise from regressing the residuals from a regression of y on t on the residuals obtained from regressing x on t.

Time series regressions in Stata require the user to designate that the series is a time series by including a command of the form tsset t where t is a time-variable which indexes the data. This can be created with the command gen t=_n.

The case of positive autocorrelation might be depicted as follows:

[Figure: observations scattered around the line $\beta_1 + \beta_2 X_t$.]

Note that positive random disturbances tend to be followed by positive random disturbances and negative random disturbances tend to be followed by negative random disturbances. Thus, we are faced with a situation in which the non-diagonal elements of

\[ \Sigma = \begin{pmatrix} \mathrm{Var}(\varepsilon_1) & \mathrm{Cov}(\varepsilon_1, \varepsilon_2) & \cdots & \mathrm{Cov}(\varepsilon_1, \varepsilon_n) \\ \mathrm{Cov}(\varepsilon_2, \varepsilon_1) & \mathrm{Var}(\varepsilon_2) & & \vdots \\ \vdots & & \ddots & \\ \mathrm{Cov}(\varepsilon_n, \varepsilon_1) & \cdots & & \mathrm{Var}(\varepsilon_n) \end{pmatrix} \]

are nonzero; therefore Σ ≠ σ²I and the least squares estimators of β again will not equal the MLE or BLUE of β and are therefore not minimum variance estimators.

Possible causes of autocorrelated random disturbances might include deleting a relevant variable or selecting the incorrect functional form; alternatively, the model may be correctly specified, but the error terms are correlated.

The matrix Σ contains $\frac{n(n+1)}{2} = \frac{n(n-1)}{2} + n$ distinct elements. In the context of the generalized regression model, we lack sufficient data to obtain separate independent estimates for each of the $\mathrm{Cov}(\varepsilon_i, \varepsilon_j)$. In order to circumvent this problem we frequently assume that the $\varepsilon_t$'s are related in such a manner that fewer parameters describe the process. One such model which provides an accurate approximation in many cases is the first order autoregressive process

\[ \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t \]

where the $u_t$ are assumed to be independently and identically distributed as $N(0, \sigma_u^2)$. Note that the $u_t$ satisfy assumptions (A.1) - (A.4). Based upon this formulation it can be shown that, for $|\rho| < 1$,

• $E(\varepsilon_t) = 0$

• $\mathrm{Var}(\varepsilon_t) = \sigma_\varepsilon^2 = \dfrac{\sigma_u^2}{1 - \rho^2}$

• $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-s}) = \rho^s\sigma_\varepsilon^2$, which equals 0 if and only if ρ = 0

• $\mathrm{Corr}(\varepsilon_t, \varepsilon_{t-s}) = \rho^s$

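Note:

\[ \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t \]
\[ = \rho(\rho\,\varepsilon_{t-2} + u_{t-1}) + u_t \]
\[ = \rho^2\varepsilon_{t-2} + \rho u_{t-1} + u_t \]
\[ = u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \cdots \]
\[ = \sum_{r=0}^{\infty}\rho^r u_{t-r} \]

=> $E(\varepsilon_t) = 0$ since $E(u_{t-r}) = 0$ for all t and r

\[ E(\varepsilon_t^2) = E(u_t^2) + \rho^2 E(u_{t-1}^2) + \rho^4 E(u_{t-2}^2) + \cdots \]
\[ = \sigma_u^2(1 + \rho^2 + \rho^4 + \cdots) \]
\[ = \sigma_u^2/(1 - \rho^2) \]

\[ E(\varepsilon_t\varepsilon_{t-s}) = E[(u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \cdots)(u_{t-s} + \rho u_{t-s-1} + \rho^2 u_{t-s-2} + \cdots)] \]
\[ = E[(u_t + \rho u_{t-1} + \cdots + \rho^s(u_{t-s} + \rho u_{t-s-1} + \cdots))(u_{t-s} + \rho u_{t-s-1} + \cdots)] \]
\[ = \rho^s E[(u_{t-s} + \rho u_{t-s-1} + \cdots)^2] = \rho^s E(\varepsilon_{t-s}^2) \]
\[ = \rho^s\sigma_\varepsilon^2 = \rho^s\sigma_u^2/(1 - \rho^2). \]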

We observe that the random disturbances $\varepsilon_t$ are characterized by constant variance (homoskedasticity) but are uncorrelated if and only if ρ = 0, in which case $\varepsilon_t = u_t$ and assumptions (A.1) and (A.4) are satisfied. We also note that since

\[ \mathrm{Cov}(\varepsilon_t, \varepsilon_{t-1}) = E(\varepsilon_t\varepsilon_{t-1}) = \rho\sigma_\varepsilon^2, \]

we expect a general pattern of positive random disturbances to be followed by positive random disturbances and negative values to be followed by negative values if ρ > 0. However, if ρ < 0, we would generally expect the signs of the random disturbances to alternate.

Based upon the assumption that the process $\varepsilon_t$ is a first order process, we can write the associated variance covariance matrix as

\[ \Sigma = \frac{\sigma_u^2}{1 - \rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}. \]

Σ is now completely characterized by the two parameters ρ and $\sigma_\varepsilon^2 = \dfrac{\sigma_u^2}{1 - \rho^2}$, and the estimation problem is considerably simplified.
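The structure of this matrix is easy to generate and check. The sketch below builds Σ from ρ and σ_u² and compares the variance formula with a truncated version of the geometric series used in its derivation; the function names and parameter values are hypothetical.

```python
def ar1_covariance(n, rho, sigma_u2):
    """n x n covariance matrix of a stationary AR(1) disturbance.
    Element (i, j) equals rho^|i-j| * sigma_u^2 / (1 - rho^2)."""
    var_eps = sigma_u2 / (1.0 - rho ** 2)
    return [[var_eps * rho ** abs(i - j) for j in range(n)] for i in range(n)]

def truncated_variance(rho, sigma_u2, terms=200):
    """Partial sum of sigma_u^2 * (1 + rho^2 + rho^4 + ...)."""
    return sigma_u2 * sum(rho ** (2 * r) for r in range(terms))

# Hypothetical parameter values.
rho, sigma_u2 = 0.5, 3.0
sigma = ar1_covariance(4, rho, sigma_u2)
for row in sigma:
    print(row)
print("series approximation of Var(eps_t):", truncated_variance(rho, sigma_u2))
```

Dividing each row by the diagonal element gives the correlations ρ^s, which are the theoretical correlogram of the process.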

A plot of $\mathrm{corr}(\varepsilon_t, \varepsilon_{t-s})$ for different values of s is referred to as the correlogram of the process $\varepsilon_t$. If the sample correlogram (graph of estimated correlation coefficients) appears as

[Figure: correlogram with heights ρ, ρ², ... declining over lags s = 0, 1, 2, ...]

we would interpret this evidence as being consistent with the assumption of a first-order autoregressive process with a positive ρ. The sample correlogram can be generated with the Stata commands:

reg y x’s
predict e, res
ac e, lags(# of lags)

We have shown that within the context of a first-order autoregressive model Σ = σ²I, if and only if ρ = 0. It becomes important to test the hypothesis that ρ = 0.

A more general model for the disturbances is an autoregressive moving average (ARMA(p, q)) defined by

\[ \varepsilon_t - \phi_1\varepsilon_{t-1} - \cdots - \phi_p\varepsilon_{t-p} = u_t - \theta_1 u_{t-1} - \cdots - \theta_q u_{t-q}. \]

This model will be studied in more detail in another section. Note that this specification includes the first order autoregressive process as the following special case

ARMA(p = 1, q = 0): $\varepsilon_t - \phi_1\varepsilon_{t-1} = u_t$.

2. Tests for autocorrelation.

a. The right hand side variables are exogenous

There are numerous tests for the presence of autocorrelation where the right hand side variables are exogenous. Among these are (1) the Durbin Watson test, (2) tests structured in terms of an estimator of the correlation between $\varepsilon_t$ and $\varepsilon_{t-1}$, (3) the Theil-Nagar test, (4) the Von Neumann ratio, (5) the Breusch-Godfrey test, (6) the Ljung-Box test, and (7) a test for the number of sign changes in the estimated random disturbances (Runs test). Of these tests, the Durbin Watson test statistic is probably the most widely used.

(1) Durbin-Watson test

The Durbin-Watson test statistic is defined by

\[ D.W. = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2} \]

where $e_t$ denotes the least squares estimator of the random disturbance $\varepsilon_t$. This expression can be written in a useful alternative form by noting that

\[ \sum_{t=2}^{n}(e_t - e_{t-1})^2 = \sum_{t=2}^{n}e_t^2 - 2\sum_{t=2}^{n}e_t e_{t-1} + \sum_{t=2}^{n}e_{t-1}^2 \]

\[ = \sum_{t=1}^{n}e_t^2 + \sum_{t=1}^{n}e_t^2 - 2\sum_{t=2}^{n}e_t e_{t-1} - e_1^2 - e_n^2 \]

\[ = 2\sum_{t=1}^{n}e_t^2 - 2\sum_{t=2}^{n}e_t e_{t-1} - e_1^2 - e_n^2 \]

hence,

\[ D.W. = \frac{2\sum_{t=1}^{n}e_t^2 - 2\sum_{t=2}^{n}e_t e_{t-1} - e_1^2 - e_n^2}{\sum_{t=1}^{n}e_t^2} \]

\[ = 2(1 - \hat\rho) - \frac{e_1^2 + e_n^2}{\sum_{t=1}^{n}e_t^2} \quad \text{where} \quad \hat\rho = \frac{\sum_{t=2}^{n}e_t e_{t-1}}{\sum_{t=1}^{n}e_t^2} \]

so that $D.W. \approx 2(1 - \hat\rho)$ with $\hat\rho$ denoting an estimator of ρ, the correlation between $\varepsilon_{t-1}$ and $\varepsilon_t$.
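The exact relationship between D.W. and ρ̂ derived above can be confirmed numerically. The sketch below uses a hypothetical residual series (chosen to sum to zero, as OLS residuals from a model with an intercept would); the function names are illustrative only.

```python
def durbin_watson(e):
    """Sum of squared first differences of the residuals over their sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def rho_hat(e):
    """Estimator of rho: sum of e_t * e_{t-1} (t = 2..n) over sum of e_t^2 (t = 1..n)."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(v * v for v in e)

# Hypothetical residuals (not from the notes).
e = [0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.1, 0.4, -0.9]

dw = durbin_watson(e)
r = rho_hat(e)
end_term = (e[0] ** 2 + e[-1] ** 2) / sum(v * v for v in e)

print("D.W.:", dw)
print("2(1 - rho_hat) - end term:", 2 * (1 - r) - end_term)
print("approximation 2(1 - rho_hat):", 2 * (1 - r))
```

The end term involves only the first and last residuals, so it becomes negligible as n grows, which is the sense in which D.W. is approximately 2(1 - ρ̂).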

From this expression we note that if ρ = 0, we would expect to have $\hat\rho$ "close" to zero and the value of D.W. close to two. Since D.W. depends upon the data, associated confidence intervals would be data dependent. To circumvent this problem, Durbin and Watson derived the distribution of two statistics L and U which are independent of the data and bound D.W., L < D.W. < U. Some econometric programs instead use the data and calculate exact p-values. Tabulated critical values for the D.W. are based on L and U; hence, the reported confidence intervals for the hypothesis ρ = 0 for D.W. (derived from confidence intervals for the bounds) may appear somewhat peculiar as illustrated by the following figure.

The values of dL and dU define the critical region and are tabulated in many texts according to the critical level (α level), sample size (n), and number of nonintercept (slope) coefficients in the model (k'). The tables have been extended to cover additional sample sizes and number of explanatory variables by Savin and White [Econometrica, 1977].

The null hypothesis H0: ρ = 0 is rejected if

D.W. < dL or D.W. > 4 - dL.

We fail to reject the hypothesis if

dU < D.W. < 4 - dU,

and the test is inconclusive if

dL < D.W. < dU or 4 - dU < D.W. < 4 - dL.
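The decision rule can be written as a small function. This is a sketch only: the bounds passed in below are illustrative placeholders, not values from the Durbin-Watson tables, and actual values of dL and dU must be looked up for the relevant α, n, and k'.

```python
def dw_decision(dw, d_lower, d_upper):
    """Classify a Durbin-Watson statistic given tabulated bounds dL and dU."""
    if dw < d_lower or dw > 4 - d_lower:
        return "reject"
    if d_upper < dw < 4 - d_upper:
        return "fail to reject"
    return "inconclusive"

# Placeholder bounds for illustration (look up real values in a D.W. table).
d_lower, d_upper = 1.2, 1.4
for value in (0.9, 1.3, 2.0, 2.7, 3.1):
    print(value, dw_decision(value, d_lower, d_upper))
```

Values of D.W. near 0 point to positive autocorrelation and values near 4 to negative autocorrelation, which is why the rule is symmetric around 2.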

This test is not strictly appropriate for models with lagged dependent variables included (see Durbin, Econometrica, 1970). The D.W. test does not take account of the explanatory variables, which results in the existence of an “uncertain region.” The Stata commands to calculate the D.W. statistic are:

o reg lhs_var rhs_vars
o estat dwatson (performs a Durbin Watson test for serial correlation)
o estat bgodfrey or
o estat bgodfrey, lags(1/4)

An exact D.W. test which takes account of the X's and does not involve an “uncertain” region is available in some computer programs. The Shazam command to calculate the exact D.W. is OLS y x’s, DWPVALUE.

(2) Wooldridge's t-test

The Wooldridge test of H0: ρ = 0, no autocorrelation, is based on testing whether lagged OLS errors have statistically significant explanatory power for current errors. Thus, the regression commands could be

     reg y x's

     predict e, resid

     reg e l.e

and a t or F statistic is used to test for statistical significance, recognizing that their validity is based on asymptotic distributions. This approach would not be valid for the hypothesis H0: ρ = 1 because the corresponding t-statistic is not distributed as a t-statistic. A Dickey-Fuller test could be used for this hypothesis.
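As a rough illustration of the last regression (reg e l.e), the sketch below regresses a residual series on its own first lag and reports the slope and its t-statistic. The residuals are hypothetical and the helper names are not from the notes.

```python
import math


def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, slope t-statistic)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


def lagged_residual_test(e):
    """Regress e_t on e_{t-1} for t = 2..n; returns (slope, t-statistic)."""
    _, slope, t_stat = simple_ols(e[:-1], e[1:])
    return slope, t_stat


# Hypothetical OLS residuals (assumed to come from a first-stage "reg y x's").
e = [1.0, 0.8, 0.9, 0.4, -0.2, -0.7, -0.9, -0.5, 0.1, 0.6]
slope, t_stat = lagged_residual_test(e)
print("coefficient on lagged residual:", round(slope, 3), " t:", round(t_stat, 2))
```

A large t-statistic in absolute value is evidence against H0: ρ = 0, keeping in mind that the reference distribution is only asymptotically valid.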

b. Tests in the presence of lagged dependent variables

(1) Durbin's h-test, defined by

$$ h = \left(1 - \frac{DW}{2}\right)\sqrt{\frac{n}{1 - n\,s^2_{\text{lagged\_y coefficient}}}} \sim N[0, 1], $$

can be used to test for the presence of autocorrelation in an autoregressive model with one lagged dependent variable.
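The h statistic is simple enough to compute by hand from regression output. The sketch below assumes the D.W. statistic, the sample size, and the estimated variance of the lagged-y coefficient are already available; the numbers in the example call are invented.

```python
import math


def durbin_h(dw, n, var_lag_coef):
    """h = (1 - DW/2) * sqrt(n / (1 - n * s^2)).

    var_lag_coef is s^2, the estimated variance of the coefficient on the
    lagged dependent variable. Returns None when 1 - n*s^2 <= 0, in which
    case the square root is not defined and h cannot be computed.
    """
    denom = 1.0 - n * var_lag_coef
    if denom <= 0:
        return None
    return (1.0 - dw / 2.0) * math.sqrt(n / denom)


# Hypothetical inputs: D.W. = 1.6, n = 50, s^2 = 0.01.
h = durbin_h(dw=1.6, n=50, var_lag_coef=0.01)
print("h =", h)  # compare with standard normal critical values
```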

In Stata, Durbin's alternative test, which like the h-test remains valid with a lagged dependent variable, can be performed with the following command after the "reg" command

     . estat durbinalt

(2) The Breusch-Godfrey and Ljung-Box tests can be modified to apply to autoregressive models. For example, the Breusch-Godfrey test can be applied by regressing the OLS $e_t$'s on the lagged y's and the lagged $e_t$'s implied by the model (autoregressive order and number of autoregressive or moving average errors) and testing for the collective explanatory power of the coefficients of the lagged errors using an F-test.

3. Estimation

For applications in which the hypothesis of no autocorrelation is rejected, we may want to obtain maximum likelihood estimators of the vector β. These can be obtained by proceeding in the same manner as in the case of heteroskedasticity, i.e., we will attempt to transform the model so that the transformed random disturbances satisfy (A.1)-(A.4) and then apply least squares.

$$ y_t = X_t\beta + \varepsilon_t = \beta_1 + \beta_2 x_{t2} + \dots + \beta_k x_{tk} + \varepsilon_t $$

where

$$ \varepsilon_t = \rho\varepsilon_{t-1} + u_t, \qquad t = 1, 2, \dots, n. $$

Replacing the t in the expression for $y_t$ by t-1 and multiplying by ρ we obtain

$$ \rho y_{t-1} = \rho X_{t-1}\beta + \rho\varepsilon_{t-1} = \beta_1\rho + \beta_2\rho x_{t-1,2} + \dots + \beta_k\rho x_{t-1,k} + \rho\varepsilon_{t-1}. $$

Subtracting $\rho y_{t-1}$ from $y_t$ yields

$$ y_t - \rho y_{t-1} = \beta_1(1-\rho) + \beta_2(x_{t2} - \rho x_{t-1,2}) + \dots + \beta_k(x_{tk} - \rho x_{t-1,k}) + \varepsilon_t - \rho\varepsilon_{t-1} $$

or

$$ y_t^* = \beta_1(1-\rho) + \beta_2 x_{t2}^* + \dots + \beta_k x_{tk}^* + u_t, \qquad t = 2, \dots, n $$

where

$$ y_t^* = y_t - \rho y_{t-1}, \qquad x_{ti}^* = x_{ti} - \rho x_{t-1,i}, \qquad t = 2, \dots, n,\; i = 2, \dots, k. $$

Note that we have (n - 1) observations on $y_t^*$, $x_{ti}^*$. The random disturbance term associated with the transformed equation satisfies (A.1)-(A.4). The transformed data matrices are given by

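$$ y^* = \begin{pmatrix} y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix} = \begin{pmatrix} -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{pmatrix} = T_1 Y $$

where the three arrays are (n-1) x 1, (n-1) x n, and n x 1, respectively,

and

$$ X^* = \begin{pmatrix} 1-\rho & x_{2,2} - \rho x_{1,2} & \cdots & x_{2,k} - \rho x_{1,k} \\ 1-\rho & x_{3,2} - \rho x_{2,2} & \cdots & x_{3,k} - \rho x_{2,k} \\ \vdots & \vdots & & \vdots \\ 1-\rho & x_{n,2} - \rho x_{n-1,2} & \cdots & x_{n,k} - \rho x_{n-1,k} \end{pmatrix} = T_1 X $$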

A common technique of estimation is then based upon applying least squares to

$$ y^* = X^*\beta + u $$

or

$$ y_t - \rho y_{t-1} = \beta_1(1-\rho) + \beta_2 x_{t2}^* + \dots + \beta_k x_{tk}^* + u_t, \qquad t = 2, \dots, n. $$

Several comments need to be made about this approach. First, ρ is generally not known and estimates of ρ will need to be used. Also note that the intercept in the transformed equation is $\beta_1(1-\rho)$, rather than $\beta_1$; hence, the final estimate of the intercept must be divided by 1-ρ in order to recover an estimate of $\beta_1$. Finally, we need to mention that even if ρ is known this estimator of β will not be identically equal to the MLE of β because n-1 observations are used rather than n observations, i.e., we are not using all of the sample information in the estimation. This last problem can be corrected and MLE of β can be obtained by noting that

$$ \sqrt{1-\rho^2}\,y_1 = \sqrt{1-\rho^2}\,X_1\beta + \sqrt{1-\rho^2}\,\varepsilon_1 = \sqrt{1-\rho^2}\,\beta_1 + \sqrt{1-\rho^2}\,\beta_2 x_{12} + \dots + \sqrt{1-\rho^2}\,\beta_k x_{1k} + \sqrt{1-\rho^2}\,\varepsilon_1 $$

where

$$ \sqrt{1-\rho^2}\,\varepsilon_1 \sim N[0,\ (1-\rho^2)\sigma_\varepsilon^2 = \sigma_u^2] $$

and then applying least squares to the transformed equation

$$ y^{**} = X^{**}\beta + \varepsilon^* $$

where

$$ y^{**} = \begin{pmatrix} \sqrt{1-\rho^2}\,y_1 \\ y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix} = T_2\,y $$

$$ X^{**} = \begin{pmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\,x_{12} & \cdots & \sqrt{1-\rho^2}\,x_{1k} \\ 1-\rho & x_{22} - \rho x_{12} & \cdots & x_{2k} - \rho x_{1k} \\ 1-\rho & x_{32} - \rho x_{22} & \cdots & x_{3k} - \rho x_{2k} \\ \vdots & \vdots & & \vdots \\ 1-\rho & x_{n2} - \rho x_{n-1,2} & \cdots & x_{nk} - \rho x_{n-1,k} \end{pmatrix} = T_2 X = \begin{pmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{pmatrix} X. $$

The transformation matrices $T_1$ and $T_2$ are related by

$$ T_2 = \begin{pmatrix} \sqrt{1-\rho^2} \;\; 0 \;\; \cdots \;\; 0 \\ T_1 \end{pmatrix} $$

Note: (1) $T_2$ is n x n whereas $T_1$ is (n-1) x n; hence, y** is n x 1 and y* is (n-1) x 1.

(2) If all n observations are used, then a program must be used which suppresses estimation of an intercept. This is because the first column of X** contains different elements.

(3) If only the last n-1 observations are used, then a regression program which estimates an intercept can be used and the estimate of $\beta_1$ can be recovered by dividing the estimated intercept by 1-ρ.

(4) In cases in which ρ is known the above procedures are relatively straightforward. When ρ is not known alternative techniques have been developed. A common technique can be outlined as follows:

(a) Estimate y = Xβ + ε using OLS to obtain $y = X\hat{\beta} + e$. Obtain an estimate of ρ using the e vector:

$$ \hat{\rho} = \frac{\sum_{t=2}^{n} e_t\,e_{t-1}}{\sum_{t=1}^{n} e_t^2} $$

(b) Transform the data using $\hat{\rho}$ instead of ρ. $T_1$ or $T_2$ can be used. Stata allows the use of $T_1$ or $T_2$.

(c) Apply least squares to the transformed data. The associated estimators are referred to as two stage estimators. (Don't confuse these estimators with two stage least squares, which will be discussed later.)

(d) Maximum likelihood estimators can be obtained by using the estimate of β determined in the last step, β*; calculate the associated error terms e* = y - Xβ*; calculate a new estimate of ρ in terms of e*; transform the data (y, X); reestimate β; repeat this process until convergence is achieved.

This process, while conceptually simple, would be tedious to perform by hand. The Stata, TSP, SAS and SHAZAM programs have been written to automatically perform this iterative estimation procedure.
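Steps (a) through (d) can be sketched for a model with an intercept and one regressor, using the $T_1$ transformation (the first observation is dropped). This is a minimal illustration on simulated data, not a description of how any of the packages above implement the procedure; the function names, parameter values, and convergence tolerance are assumptions made for the example.

```python
import random


def ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def estimate_rho(e):
    """rho_hat = sum_{t=2}^n e_t e_{t-1} / sum_{t=1}^n e_t^2."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(v * v for v in e)


def iterated_t1(x, y, tol=1e-8, max_iter=200):
    """Iterate steps (a)-(d) with the T1 transformation; returns (b1, b2, rho)."""
    b1, b2 = ols(x, y)                                   # step (a): OLS on original data
    rho = None
    for _ in range(max_iter):
        e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]  # residuals in original units
        new_rho = estimate_rho(e)
        # step (b): y*_t = y_t - rho y_{t-1}, x*_t = x_t - rho x_{t-1}, t = 2..n
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        a_star, b2 = ols(xs, ys)                         # step (c): OLS on transformed data
        b1 = a_star / (1.0 - new_rho)                    # intercept estimates beta_1 (1 - rho)
        converged = rho is not None and abs(new_rho - rho) < tol
        rho = new_rho
        if converged:                                    # step (d): stop once rho settles
            break
    return b1, b2, rho


# Simulated data with AR(1) errors (all parameter values are illustrative).
random.seed(0)
n, true_b1, true_b2, true_rho = 400, 2.0, 0.5, 0.7
x = [random.gauss(0.0, 2.0) for _ in range(n)]
eps, y = 0.0, []
for xi in x:
    eps = true_rho * eps + random.gauss(0.0, 1.0)
    y.append(true_b1 + true_b2 * xi + eps)

b1_hat, b2_hat, rho_est = iterated_t1(x, y)
print(round(b1_hat, 3), round(b2_hat, 3), round(rho_est, 3))
```

Stopping after the first pass through the loop would give the two stage estimator of step (c); continuing to convergence corresponds to step (d).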

The Stata "MLE" estimation can be performed as follows:

     • tsset time_variable (type in the name of a "time" variable)

     • prais depvar rhs_vars (performs iterative MLE using $T_2$ assuming an AR(1) model)

     • prais depvar rhs_vars, corc (performs iterative "MLE" using $T_1$ assuming an AR(1) model)

     • prais depvar rhs_vars, twostep (stops the prais estimation after the first step)

4. Unit roots and the Dickey-Fuller test

In our discussion of estimating regression models with autocorrelated disturbances we noted that the transformed regression model with an AR(1) error,

$$ y_t - \rho y_{t-1} = (X_t - \rho X_{t-1})\beta + u_t, $$

was characterized by uncorrelated errors. Note that this model simplifies to the regular regression model where ρ = 0 with OLS yielding efficient estimators. In the previous section we discussed several tests of the hypothesis H0: ρ = 0 and how MLE can be obtained when the null hypothesis is rejected.

Another hypothesis of interest is H0: ρ = 1 to check for what are referred to as unit roots. Note in this case the transformed equation becomes

$$ y_t - y_{t-1} = (X_t - X_{t-1})\beta + u_t, $$

with the corresponding estimation involving regressing changes in y on changes in x. Regular t-tests can't be used to test for unit roots. The Dickey-Fuller test is designed for this case. Simple Dickey-Fuller tests can be performed by estimating the following equations and testing for statistical significance of the estimated θ:

$$ y_t - y_{t-1} = \alpha + (\rho - 1)\,y_{t-1} + u_t = \alpha + \theta\,y_{t-1} + u_t $$

or

$$ y_t - y_{t-1} = \alpha + \delta t + \theta\,y_{t-1} + u_t. $$

The null hypothesis H0: ρ = 1 is rejected if θ's t-statistic is less than the critical values reported in the following tables, respectively.

First equation (constant only):

     Significance level     1%       2.5%     5%       10%
     Critical value        -3.43    -3.12    -2.86    -2.57

Second equation (constant and trend):

     Significance level     1%       2.5%     5%       10%
     Critical value        -3.96    -3.66    -3.41    -3.12
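The first Dickey-Fuller regression can be sketched as follows. The code computes the t-statistic on θ for two simulated series, a random walk (ρ = 1) and a stationary AR(1) with ρ = 0.5, and compares each with the 5% critical value of -2.86 from the first table. The simulated series and the function name are assumptions made for illustration.

```python
import math
import random


def df_tstat(y):
    """t-statistic on theta in  y_t - y_{t-1} = alpha + theta * y_{t-1} + u_t."""
    lag = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    xbar, ybar = sum(lag) / n, sum(dy) / n
    sxx = sum((a - xbar) ** 2 for a in lag)
    theta = sum((a - xbar) * (b - ybar) for a, b in zip(lag, dy)) / sxx
    alpha = ybar - theta * xbar
    ssr = sum((b - alpha - theta * a) ** 2 for a, b in zip(lag, dy))
    return theta / math.sqrt(ssr / (n - 2) / sxx)


random.seed(1)
walk, ar = [0.0], [0.0]
for _ in range(500):
    u = random.gauss(0.0, 1.0)
    walk.append(walk[-1] + u)      # rho = 1: unit root
    ar.append(0.5 * ar[-1] + u)    # rho = 0.5: stationary

CRITICAL_5PCT = -2.86              # constant-only case, from the table above
for name, series in (("random walk", walk), ("AR(1), rho = 0.5", ar)):
    t = df_tstat(series)
    verdict = "reject" if t < CRITICAL_5PCT else "fail to reject"
    print(name, round(t, 2), verdict, "H0: rho = 1")
```

The t-statistic is computed exactly as in an ordinary regression; only the critical values differ from the usual t table.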

5. Predictions

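The expression obtained by Goldberger for the best linear unbiased predictors in the case of AR(1) error terms is

$$ \hat{y}_{n+h} = X_{n+h}\tilde{\beta} + W'\Sigma^{-1}e $$

where

$$ W = E\left[\varepsilon_{n+h}\begin{pmatrix}\varepsilon_1 \\ \vdots \\ \varepsilon_n\end{pmatrix}\right] = \frac{\sigma_u^2}{1-\rho^2}\begin{pmatrix}\rho^{\,n+h-1} \\ \rho^{\,n+h-2} \\ \vdots \\ \rho^{\,h}\end{pmatrix} $$

$$ \Sigma^{-1} = \frac{1}{\sigma_u^2}\begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & -\rho & 1+\rho^2 & -\rho \\ 0 & 0 & \cdots & 0 & -\rho & 1 \end{pmatrix} $$

Therefore,

$$ \hat{y}_{n+h} = X_{n+h}\tilde{\beta} + \rho^h e_n $$

This might graphically be depicted as:

[Figure: the fitted line $X_t\tilde{\beta}$ plotted against $X_t$, marking the residual $e_n$ at $X_n$ and the predicted value at $X_{n+1}$, which is adjusted by $\hat{e}_{n+1} = \hat{\rho}\,\hat{e}_n$.]

Note that as we attempt to forecast further into the future, the adjustment factor, $\rho^h e_n$, approaches zero and $\hat{y}_{n+h}$ approaches $X_{n+h}\tilde{\beta}$ as h → ∞.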
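The forecasting rule $\hat{y}_{n+h} = X_{n+h}\tilde{\beta} + \rho^h e_n$ is easy to trace numerically. In the sketch below the coefficient vector, ρ, the last residual, and the future regressor values are all invented for illustration.

```python
def ar1_forecast(x_future, beta, rho, e_n, h):
    """Predictor with AR(1) errors: X_{n+h} * beta + rho**h * e_n."""
    systematic = sum(b * x for b, x in zip(beta, x_future))
    return systematic + (rho ** h) * e_n


# Hypothetical estimates: intercept 2.0, slope 0.5, rho = 0.5, last residual e_n = 1.0.
beta = [2.0, 0.5]
x_future = [1.0, 10.0]          # constant term and regressor value, held fixed over h
forecasts = [ar1_forecast(x_future, beta, rho=0.5, e_n=1.0, h=h) for h in (1, 2, 3, 10)]
print(forecasts)
```

With |ρ| < 1 the adjustment shrinks geometrically, so the forecasts move toward the systematic part $X_{n+h}\tilde{\beta}$ (7.0 in this example) as h increases.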

IV. G. Panel Data: an introduction

Panel data refers to observational data on individuals (i, i = 1, 2, ..., m) over time (t = 1, 2, ..., $T_i$) (two dimensions) and might be denoted as ($Y_{it}$). The panel data set is referred to as balanced if every individual is observed for every point of time, $T_1 = T_2 = \dots = T_m = T$. Otherwise, the panel data set is referred to as unbalanced. Observations for a given individual over time are time series, whereas cross sectional data are observations for different individuals at a given point in time. In many applications, the data are for short periods of time, but include many individuals.

1. OLS and GLS (generalized least squares)

Models for panel data take a number of different forms. Perhaps the simplest representation is given by

$$ Y_{it} = X_{it}\beta + \varepsilon_{it} \qquad (1) $$

where $X_{it}$ denotes a 1 x k vector of observations on k exogenous variables for the i-th individual at the t-th time period and where the marginal impact of the X's on Y is assumed constant over individuals and time (including the intercept). This specification is sometimes called the pooled model. Let the model be rewritten in matrix form as

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}\beta + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix} $$

or

$$ Y = X\beta + \varepsilon $$

OLS estimates of β, $\hat{\beta} = (X'X)^{-1}X'Y$, can be obtained with the command

     reg y x's or

     reg y x's, vce(robust, bootstrap, or jackknife)

Recall that in the presence of heteroskedasticity and/or autocorrelation GLS (generalized least squares) estimators can provide more efficient estimators than OLS. The formulas for the GLS estimators and corresponding variance-covariance matrix are given by

$$ \tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y $$

$$ Var(\tilde{\beta}) = (X'\Omega^{-1}X)^{-1} $$

where $Var(\varepsilon) = \Omega$, $\Omega = \Sigma_{m \times m} \otimes I_{T_i \times T_i}$, $T_i \ge m$.

In order to obtain GLS (generalized least squares) estimators, simplifying assumptions about the variance of ε, Ω, need to be made and the nature of the longitudinal/panel data must be provided to Stata with the "xtset" command as follows:

     xtset panel_var or

     xtset panel_var time_var

to indicate that panel data are being used, where panel_var denotes the individual identification code or group variable and time_var is an index which represents the time variable which defines the panels being used. This is similar to using "tsset time_variable" to alert Stata that time series are being used.

The "xtgls" command can be used to obtain various generalized least squares estimators of β, depending on the form of the variance-covariance of the error term.

If there is heteroskedasticity across panels,

$$ \Omega = \begin{pmatrix} \sigma_1^2 I & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_m^2 I \end{pmatrix}, $$

corresponding GLS estimators can be obtained using the command

     xtgls y x's, panels(hetero)

If there is correlation across panels (cross-sectional correlation) of the form

$$ \Omega = \begin{pmatrix} \sigma_1^2 I & \sigma_{1,2} I & \cdots & \sigma_{1,m} I \\ \sigma_{2,1} I & \sigma_2^2 I & \cdots & \sigma_{2,m} I \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m,1} I & \sigma_{m,2} I & \cdots & \sigma_m^2 I \end{pmatrix}, $$

the GLS estimator is obtained with the command (this can only be applied to balanced panels)

     xtgls y x's, panels(correlated)

The command

     xtgls y x's, igls

iterates the generalized least squares procedure until convergence is obtained.

Stata allows for autocorrelation within the panels. The Stata manual (Longitudinal/Panel Data, version 10, p. 150) states that three options are allowed: "corr(independent) or no autocorrelation, corr(ar1) (serial correlation where the correlation parameter is common for all panels), or corr(psar1) (serial correlation where the correlation parameter is unique for each panel)." A couple of observations are in order: (1) xtgls y x's, panels(iid) corr(independent) is equivalent to regress y x's; (2) when corr(ar1) or corr(psar1) are specified the iterated GLS estimator does not converge to the MLE.

Some examples and variations include:

     xtgls y x's, panels(hetero)

     xtgls y x's, panels(correlated)

     xtgls y x's, panels(correlated) igls

     xtgls y x's, panels(hetero) corr(ar1)

     xtgls y x's, panels(iid) corr(psar1)

Testing for heteroskedasticity.

A likelihood ratio test for heteroskedasticity across panels can be performed by comparing the log-likelihood values of MLE of the regression model with and without heteroskedasticity as follows:

     xtgls y x's, igls panels(hetero)

     estimates store hetero

     xtgls y x's

     local df = e(N_g) - 1 (the number of panels or groups minus 1)

     lrtest hetero . , df(`df')

Testing for autocorrelation.

Wooldridge (Econometric Analysis of Cross Section and Panel Data, 2002, 282-283) outlines a test for autocorrelation in panel-data models. David Drukker has written a downloadable program to perform this test.

findit xtserial

net sj 3-2 st0039 (or click on st0039)

net install st0039 (or click on click here to install)

xtserial y x’s

The underlying null hypothesis is no autocorrelation, so a significant value of the

test statistic provides evidence of autocorrelation.

2. Fixed and random effects specifications

The fixed and random effects representations are a little different than the form just considered in that they allow panels to have different intercepts. In particular, they can be represented as:

$$ Y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it} \qquad (2) $$

where the marginal impact of changes in the X's is still assumed to be constant across individuals, i.e. the β's are the same for each individual. The only difference in the relationship across individuals is in the intercept term. In fixed effects (fe) models the $\alpha_i$ are unknown constants and in random effects (re) models the $\alpha_i$ are random. OLS can be used to estimate the unknown parameters in the fixed effects form with binary variables being added to the set of exogenous variables to denote the individual.

Stata uses a slight variation on this formulation in estimation

$$ \alpha_i = \alpha + \nu_i $$

where the $\nu_i$ are estimated such that $\sum_i \nu_i = 0$; hence,

$$ Y_{it} = \alpha + X_{it}\beta + \nu_i + \varepsilon_{it}. \qquad (3) $$

Consider taking the following averages of (3):

$$ \bar{y}_i = \alpha + \bar{x}_i\beta + \nu_i + \bar{\varepsilon}_i \qquad (4) \quad \text{(average over t for individual i)} $$

$$ \bar{\bar{y}} = \alpha + \bar{\bar{x}}\beta + \bar{\nu} + \bar{\bar{\varepsilon}} \qquad (5) \quad \text{(average over i and t)} $$

Combining equations (3) and (4), and then (3), (4) and (5), respectively, enables us to write

$$ Y_{it} - \bar{y}_i = (X_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \qquad (6) $$

$$ Y_{it} - \bar{y}_i + \bar{\bar{y}} = \alpha + (X_{it} - \bar{x}_i + \bar{\bar{x}})\beta + \bar{\nu} + (\varepsilon_{it} - \bar{\varepsilon}_i + \bar{\bar{\varepsilon}}) \qquad (7) $$

STATA's fixed effects (within) estimation procedure, xtreg y x's, fe, corresponds to estimating β in equation (6) or equation (7), as adding in the overall mean of y has no impact on the estimates of β.
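The equivalence of equations (6) and (7) for estimating β can be checked on a small, made-up balanced panel with one regressor. In the sketch below the individual intercepts are chosen to be negatively related to the individual means of x, so the pooled slope is far from the true value of 2 while both within regressions recover it. The data and function names are purely illustrative.

```python
def within_deviation(groups, add_grand_mean=False):
    """Stack each individual's deviations from its own mean (equation 6);
    optionally add the grand mean back in (equation 7)."""
    flat = [v for g in groups for v in g]
    grand = sum(flat) / len(flat) if add_grand_mean else 0.0
    return [v - sum(g) / len(g) + grand for g in groups for v in g]


def slope_through_origin(x, y):
    """Slope of a regression of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slope_with_intercept(x, y):
    """Slope of a regression of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / sum((a - xbar) ** 2 for a in x)


# Hypothetical panel: m = 3 individuals, T = 4 periods, y_it = alpha_i + 2 x_it.
x_groups = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0]]
alphas = [10.0, 5.0, 0.0]
y_groups = [[a + 2.0 * v for v in g] for a, g in zip(alphas, x_groups)]

beta_eq6 = slope_through_origin(within_deviation(x_groups), within_deviation(y_groups))
beta_eq7 = slope_with_intercept(within_deviation(x_groups, True),
                                within_deviation(y_groups, True))
beta_pooled = slope_with_intercept([v for g in x_groups for v in g],
                                   [v for g in y_groups for v in g])
print(beta_eq6, beta_eq7, round(beta_pooled, 4))
```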

Three $R^2$'s are reported:

Within: $R^2$ from the mean-deviation regression, equation (6)

Between: $R^2_{Between} = corr^2(\bar{x}_i\hat{\beta}, \bar{y}_i)$, $R^2$ from regressing $\bar{y}_i$ on $\bar{x}_i$

Overall: $R^2_{Overall} = corr^2(x_{it}\hat{\beta}, y_{it})$, $R^2$ from regressing $y_{it}$ on $X_{it}$, pooled regression

Least squares estimation with a dummy variable (LSDV) for the different intercepts is equivalent to running a fixed effects regression. The hypothesis that there is no heterogeneity in the fixed effects or that the grouped effects are all the same ($\nu_i = 0$ for all i) can be tested using a Chow Test by comparing the pooled and LSDV regressions as follows:

$$ F(m-1,\ mT-m-K) = \frac{(R^2_{LSDV} - R^2_{Pooled})/(m-1)}{(1 - R^2_{LSDV})/(mT-m-K)}, $$

where m = number of groups and T = length of time series.
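The F statistic above only needs the two $R^2$ values and the dimensions of the panel. The sketch below assumes a balanced panel and takes K to be the number of slope coefficients, which the text does not state explicitly; the inputs in the example call are invented.

```python
def fixed_effects_f(r2_lsdv, r2_pooled, m, T, K):
    """Chow-type F statistic comparing the LSDV and pooled regressions.

    Returns (F, df1, df2) with df1 = m - 1 and df2 = m*T - m - K.
    K is assumed here to count slope coefficients only.
    """
    df1 = m - 1
    df2 = m * T - m - K
    f_stat = ((r2_lsdv - r2_pooled) / df1) / ((1.0 - r2_lsdv) / df2)
    return f_stat, df1, df2


# Hypothetical values: 6 groups observed for 10 periods, 3 slope coefficients.
f_stat, df1, df2 = fixed_effects_f(r2_lsdv=0.9, r2_pooled=0.8, m=6, T=10, K=3)
print(round(f_stat, 2), df1, df2)  # compare with an F(df1, df2) critical value
```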

Stata's between effects estimators can be obtained by estimating equation (4) using the Stata command, xtreg y x's, be. The same three $R^2$'s reported with fixed effects estimation are reported for the between effects, with the $R^2_{Between}$ corresponding to the fitted model with this estimation procedure.

In the random effects model the $\nu_i$ in the regression model

$$ y_{it} = \alpha + X_{it}\beta + \nu_i + \varepsilon_{it} $$

are assumed to be distributed identically and independently with mean zero and constant variance. The term ($\nu_i + \varepsilon_{it}$) can be thought of as a composite error term with

$$ Var(\nu_i i_T + \varepsilon_i) = \sigma_\varepsilon^2 I_T + \sigma_u^2\, i_T i_T' = \Sigma \quad \text{and} \quad Var(\nu + \varepsilon) = \Omega = I_m \otimes \Sigma, $$

where $i_T$ is a T x 1 vector of ones and $\sigma_u^2$ denotes the variance of the $\nu_i$. GLS is then applied to obtain the desired estimators using the command, xtreg y x's, re.

If the $\nu_i$ are uncorrelated with the explanatory variables, then random effects estimators will be efficient; otherwise they will be inconsistent.

The fixed effects estimator is appropriate whether the data are generated by a fixed effects model or a random effects model; it is merely less efficient than the random effects estimator if the data generating process is a random effects model. However, if the data generating process is a fixed effects model, random effects estimators will yield inconsistent estimators. A Hausman test can be used to test the null hypothesis that the individual effects are uncorrelated with the explanatory variables, in which case the random effects estimator is consistent and efficient, against the alternative that only the fixed effects estimator is consistent.

In summary, the Stata commands for estimating fixed (within), between, and random effects models, respectively, are given by

xtset panel_var or xtset panel_var time_var

xtreg y x’s, fe

xtreg y x’s, be

xtreg y x’s, re

A Hausman test of the null hypothesis of fixed vs. random effects can be

performed using the commands:

xtreg y x’s, fe

est store fixed

xtreg y x’s, re

est store random

hausman fixed random

Some comments:

(1) The command "xtregar y x's, re" or "xtregar y x's, fe" can be used to estimate random or fixed effects models when the error term is characterized by a first order autoregressive process.

(2) Numerous variations are possible, e.g., consider

$$ Y_{it} = \alpha + X_{it}\beta + \nu_i + \gamma_t + \varepsilon_{it} $$

which allows for cross-sectional effects and time contrasts.

(3) xtsum [varlist] [if] [, i(varname_i)]. xtsum, a generalization of summarize, reports means and standard deviations for cross-sectional time-series (xt) data; it differs from summarize in that it decomposes the standard deviation into between and within components.

(4) A special edition of the Journal of Econometrics (edited by Baltagi, Kelejian, and Prucha (140, 2007)) focuses on an analysis of spatially dependent data and discusses related issues of identification, estimation, and testing.

IV. H. Stochastic Independent Variables

1. Introductory Remarks:

While this assumption is listed last, it may be the most important of the underlying assumptions because OLS estimators will be both biased and inconsistent if the explanatory variables are correlated with the error terms. Furthermore, this assumption will generally be violated if the specified model includes a right hand side dependent variable (endogenous regressor), which is quite common in economic modeling. In this section we will consider a simple macro model which includes an endogenous regressor, illustrate how consistent estimators can be obtained, and finally formally outline why a correlation between the X's and the errors leads to biased and inconsistent estimators.

2. A simple example

The case of endogenous regressors is a common example of stochastic regressors in economic models. For example, consider the simple macroeconomic structural model consisting of a consumption function and an accounting identity:

$$ C_t = \alpha + \beta Y_t + \varepsilon_t $$

$$ Y_t = C_t + Z_t $$

In this model, the two dependent variables are C and Y; thus Y is an endogenous regressor in the consumption function. The OLS estimators of the unknown parameters in the consumption function are given as follows:

$$ \hat{\alpha} = \bar{C} - \hat{\beta}\,\bar{Y} $$

$$ \hat{\beta} = \frac{\sum (Y_t - \bar{Y})(C_t - \bar{C})}{\sum (Y_t - \bar{Y})^2} = \frac{Cov(Y, C)}{Var(Y)} $$

Solving the structural model for the reduced form gives

$$ C_t = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta} Z_t + \frac{\varepsilon_t}{1-\beta} $$

$$ Y_t = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta} Z_t + \frac{\varepsilon_t}{1-\beta} $$

Note: $Y_t$ and $\varepsilon_t$ are not independent since $cov(Y_t, \varepsilon_t) = \sigma^2/(1-\beta)$, as can be seen by noting

$$ E\big[(Y_t - E(Y_t))(\varepsilon_t - E(\varepsilon_t))\big] = E\left[\frac{\varepsilon_t}{1-\beta}\,\varepsilon_t\right] = \frac{E(\varepsilon_t^2)}{1-\beta} = \frac{\sigma^2}{1-\beta} \ne 0. $$

Furthermore, we can show that

$$ \text{plim}\ \hat{\beta}_{OLS} = \beta + \frac{(1-\beta)\,\sigma^2}{\sigma_Z^2 + \sigma^2} $$

This is an example of the simultaneous equation problem where least squares

are biased and inconsistent.
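A small simulation makes the size of the inconsistency concrete. The sketch below generates data from the structural model using its reduced form, computes the OLS slope, and compares it with the probability limit β + (1-β)σ²/(σ_Z² + σ²); it also computes a simple instrumental variables slope using Z as the instrument for Y. All parameter values are invented, and the simple covariance-ratio IV estimator is used here only because there is one regressor and one instrument.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((u - abar) * (v - bbar) for u, v in zip(a, b)) / n


random.seed(42)
alpha, beta = 10.0, 0.6            # illustrative structural parameters
sigma_eps, sigma_z, n = 1.0, 2.0, 20000

Z = [random.gauss(50.0, sigma_z) for _ in range(n)]
eps = [random.gauss(0.0, sigma_eps) for _ in range(n)]
# Reduced form for Y; C then follows from the identity Y = C + Z.
Y = [(alpha + z + e) / (1.0 - beta) for z, e in zip(Z, eps)]
C = [y - z for y, z in zip(Y, Z)]

beta_ols = cov(Y, C) / cov(Y, Y)
beta_iv = cov(Z, C) / cov(Z, Y)    # Z serves as the instrument for Y
plim_ols = beta + (1.0 - beta) * sigma_eps ** 2 / (sigma_z ** 2 + sigma_eps ** 2)

print("OLS:", round(beta_ols, 3), " plim of OLS:", round(plim_ols, 3),
      " IV:", round(beta_iv, 3), " true beta:", beta)
```

With these values the OLS slope settles near 0.68 rather than 0.6, and increasing n does not remove the gap, which is what inconsistency means in practice.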

3. Estimation, tests, and statistical inference

Several estimation approaches to circumventing this problem are available and will be discussed in more detail in another section. Two common estimators which yield consistent estimators are two-stage least squares and instrumental variables. The Stata format for the two stage least squares estimator is

     ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars

where lhs_dep_var denotes the left hand side dependent variable, rhs_dep_vars the right hand side dependent variables or endogenous regressors, and rhs_ind_vars represents the right hand side independent variables. The instrumental variables, or instruments, are variables which are assumed to be (1) correlated with the endogenous regressor(s) and (2) independent of the error term. There needs to be at least as many instruments as endogenous regressors.

An F or t-test can be applied to a regression of the endogenous regressor(s) on the independent variables and instruments to test whether the instrumental variables are significantly correlated with the endogenous regressor. This can be performed with Stata's reg command as

     reg rhs_dep_var instruments rhs_ind_vars

or by adding the option first to the ivregress as

     ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars, first

A comparison of the instrumental variables (2SLS) and OLS estimates obtained from the command reg lhs_dep_var rhs_vars provides the basis for testing whether the right hand side endogenous variable is correlated with the error term. These tests can be implemented using either a Hausman or Wooldridge test as follows:

Hausman test: Estimate the equation using OLS and 2SLS (alternatives can be used). Then check for statistical differences between the two estimators using a Hausman test. Note that a stored estimate name cannot begin with a digit, so the 2SLS results are stored as tsls.

     reg lhs_var rhs_vars

     est store ols

     ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars

     est store tsls

     hausman tsls ols

Wooldridge test: Regress the right hand side endogenous variables in a regression model on all of the exogenous variables (those in the regression model and the instrumental variables) and save the corresponding residuals. Estimate the original regression model with the estimated residuals included as regressors. Test the statistical significance of the coefficients of the residuals. The estimated coefficients of the original variables should be identical to the 2SLS estimates.

Applying these methods to the simple consumption function can be accomplished with the Stata commands

     reg c y                      OLS estimates of the consumption function

     est store ols

     ivregress 2sls c (y=z)       2SLS estimates of the consumption function

     est store tsls

     hausman tsls ols             Performs a Hausman test

     reg y z                      Regression of the endogenous regressor on the instrument

     predict e, resid

     reg c y e                    Performs a Wooldridge test

The statistical significance of the coefficients of the residuals would be tested using a chi square, F or t-test. Note these distributions are asymptotic and would not be expected to be exact for finite samples.

4. Formal analysis

Assumption A.5 in the standard model states:

(a) $X_t$ is nonstochastic.

(b) Values of X are fixed in repeated samples.

(c) $\lim_{n\to\infty} \frac{1}{n}(X'X) = \Sigma_{XX}$ is finite and nonsingular.

Assumptions (a-b) are primarily of theoretical interest since, at least with economic data, we can rarely "draw" the same set of X's or select a predetermined value for X. These assumptions, (A.5 a-c), provide a relatively simple basis to begin our analysis of regression theory. Assumption (c) is useful in proving consistency of L.S. estimators.

a. Case 1 of relaxing (A.5)

(A.5)' (a) $X_t$ is stochastic.

(b) $X_t$ and $\varepsilon_t$ are stochastically independent.

(c) $\lim_{n\to\infty} \frac{1}{n}(X'X) = \Sigma_{XX}$

$$ \hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon $$

$$ E(\hat{\beta}) = \beta + E\big[(X'X)^{-1}X'\big]E(\varepsilon) = \beta, \text{ therefore } \hat{\beta} \text{ is unbiased.} $$

$$ Var(\hat{\beta}) = E(\hat{\beta} - \beta)(\hat{\beta} - \beta)' = E\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\big] = \sigma^2 E(X'X)^{-1} $$

Relaxing the assumption that X is nonstochastic and replacing it with the assumption that X is stochastic and independent of ε does not alter the desirable unbiasedness and consistency properties of OLS.

b. Case 2 of relaxing (A.5)

(A.5)'' (a) $X_t$ is stochastic.

(b) $X_t$ and $\varepsilon_t$ are stochastically dependent and $cov(X_t, \varepsilon_t) \ne 0$.

(c) $\lim_{n\to\infty} \frac{1}{n}(X'X) = \Sigma_{XX}$

$$ E(\hat{\beta}) = \beta + E\big[(X'X)^{-1}X'\varepsilon\big] \ne \beta; $$

Therefore, the least squares estimator is biased.

$$ \text{plim}(\hat{\beta}) = \beta + \text{plim}\,(X'X)^{-1}X'\varepsilon = \beta + \Sigma_{XX}^{-1}\,Cov(X, \varepsilon) \ne \beta, \text{ therefore inconsistent.} $$

Thus, it is the correlation between the regressors and errors which leads to estimator bias and inconsistency.


IV. I. Errors of Measurement

An assumption which has been made in the development to this point is that the independent and dependent variables contained in our hypothesized formulations are measured without error. In many cases this is extremely unrealistic. If the independent and dependent variables are measured with error, then the least squares estimators need not possess the desirable statistical properties discussed earlier.

1. Theoretical Development

Assume that the relationship

(1) y = Xβ + ε

where ε ~ N(0, Σ = σ²I)

is hypothesized to hold where y and X represent "true" values. Also assume that y and X are measured with error as y* and X*, respectively, where

(2.a) y* = y + u, u ~ N(0, $\Sigma_u$)

(2.b) X* = X + V, V ~ N(0, $\Sigma_v$)

and the measurement errors u and V are independent.

Making use of (2) we can rewrite (1) in terms of observed variables, y*, X*.

y* - u = (X* - V)β + ε

(3) y* = X*β + ε + u - Vβ

(3)' y* = X*β + η

where η = ε + u - Vβ.

Applying least squares techniques to (3)' yields

(4) $\hat{\beta} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$

or

(4)' $\hat{\beta} = [X'X + V'X + X'V + V'V]^{-1}[X'y + X'u + V'y + V'u]$

Is this estimator unbiased and consistent?

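From (4)' we can write

\[ \hat{\beta} = \left[\frac{X'X}{n} + \frac{V'X}{n} + \frac{X'V}{n} + \frac{V'V}{n}\right]^{-1}\left[\frac{X'y}{n} + \frac{X'u}{n} + \frac{V'y}{n} + \frac{V'u}{n}\right] \]

and since $X'y = X'X\beta + X'\varepsilon$ we can use Slutsky's theorem to obtain

\[ \operatorname*{plim}_{n\to\infty}\hat{\beta} = \left(\Sigma_{XX} + \Sigma_{VX} + \Sigma_{XV} + \Sigma_{VV}\right)^{-1}\left(\Sigma_{XX}\beta + \Sigma_{X\varepsilon} + \Sigma_{Xu} + \Sigma_{Vy} + \Sigma_{Vu}\right) = \left(\Sigma_{XX} + \Sigma_{VV}\right)^{-1}\Sigma_{XX}\beta \]

since $\Sigma_{XV} = \Sigma_{VX} = 0$, $\Sigma_{Vy} = \Sigma_{Vu} = 0$, $\Sigma_{X\varepsilon} = 0$, and $\Sigma_{Xu} = 0$.

Note: (1) As long as the independent variables are measured with error ($\Sigma_{VV} \neq 0$), the least squares estimator of β is inconsistent:

\[ \operatorname*{plim}_{n\to\infty}\hat{\beta} = \left(\Sigma_{XX} + \Sigma_{VV}\right)^{-1}\Sigma_{XX}\beta = \left(I + \Sigma_{XX}^{-1}\Sigma_{VV}\right)^{-1}\beta = f(\beta_i) \]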

(2) If the dependent variable is measured with error, but the independent variables are "error free" ($\Sigma_{VV} = 0$), (3) can be rewritten as

$y^* = X^*\beta + \eta$, with $\eta = \varepsilon + u$,

where η will satisfy (A.1)-(A.4) as long as ε and u do.

Note: In this case $\operatorname{plim}_{n\to\infty}\hat{\beta} = \beta$ and, given the X's, the associated $\hat{\beta}$ will be unbiased, minimum variance, efficient, asymptotically unbiased, consistent, and asymptotically efficient. It should be noted that the variance of $\eta_t$, $\sigma^2_\eta = \sigma^2_\varepsilon + \sigma^2_u$, will be larger than if the dependent variable were measured without error.

2. An Example. M. Friedman suggested that consumption and income can be partitioned

into "permanent" and "transitory" components as follows:

$c = c_p + c_T$

$y = y_p + y_T$

He also suggests that the "permanent" consumption function is of the form

$c_p = k y_p + \varepsilon_T$

If the "permanent" marginal propensity to consume, k, is estimated using least squares applied to c and y data, we have an example of an errors in variables model and our resultant estimate of k will, in the limit as n→∞, provide an underestimate of the "true" k.

\[ \hat{k} = \frac{\sum cy}{\sum y^2} = \frac{\sum (c_p + c_T)(y_p + y_T)}{\sum (y_p + y_T)^2} = \frac{\sum (k y_p + \varepsilon_T + c_T)(y_p + y_T)}{\sum (y_p + y_T)^2} \]

\[ = \frac{k\sum y_p^2 + k\sum y_p y_T + \sum \varepsilon_T y_p + \sum \varepsilon_T y_T + \sum c_T y_p + \sum c_T y_T}{\sum y_p^2 + 2\sum y_p y_T + \sum y_T^2} \]

\[ \operatorname*{plim}_{n\to\infty}\hat{k} = k\,\frac{\sigma^2_{y_p}}{\sigma^2_{y_p} + \sigma^2_{y_T}} \]

where $\sigma^2_{y_p}$ and $\sigma^2_{y_T}$, respectively, denote Var($y_p$) and Var($y_T$).
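The attenuation result above can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: k = 0.9, Var(y_p) = 4, Var(y_T) = 1, mean-zero normal components that are mutually independent, and the function name `simulate_attenuation` are all hypothetical choices, not part of the notes.

```python
import random

def simulate_attenuation(k=0.9, var_yp=4.0, var_yt=1.0, n=20000, seed=0):
    """Regress c on y (through the origin) when y = y_p + y_T and c_p = k*y_p + eps.

    All components are drawn as independent mean-zero normals (an assumption
    made only for this illustration).
    """
    rng = random.Random(seed)
    sum_cy = 0.0
    sum_yy = 0.0
    for _ in range(n):
        y_p = rng.gauss(0.0, var_yp ** 0.5)  # permanent income
        y_t = rng.gauss(0.0, var_yt ** 0.5)  # transitory income
        c_t = rng.gauss(0.0, 1.0)            # transitory consumption
        eps = rng.gauss(0.0, 1.0)            # disturbance in c_p = k*y_p + eps
        c = k * y_p + eps + c_t              # observed consumption
        y = y_p + y_t                        # observed income
        sum_cy += c * y
        sum_yy += y * y
    k_hat = sum_cy / sum_yy                           # least squares estimate
    k_limit = k * var_yp / (var_yp + var_yt)          # probability limit from the text
    return k_hat, k_limit

k_hat, k_limit = simulate_attenuation()
print(round(k_hat, 3), round(k_limit, 3))
```

With these assumed values the probability limit is 0.9(4/5) = 0.72, so the simulated estimate should fall below the "true" k of 0.9.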

3. Estimation. ($\Sigma_{VV} \neq 0$)

a. Method of instrumental variables.

Select $z_{ti}$'s which are uncorrelated with the measurement errors and are correlated with the $x_{ti}$'s.

$y = X\beta + \varepsilon$

\[ \hat{\beta} = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y \]

will be a consistent estimate of β.
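As a hedged illustration of the instrumental variables idea, the sketch below uses one regressor and one instrument, in which case the matrix formula reduces to Σz y / Σz x*. The data-generating values (β = 2, unit variances, an instrument built as the true x plus independent noise) and the function name `ols_and_iv` are assumptions made only for this example.

```python
import random

def ols_and_iv(beta=2.0, n=20000, seed=1):
    """Compare least squares and IV when the regressor is observed with error.

    Scalar case without an intercept: OLS = sum(x* y)/sum(x* x*),
    IV = sum(z y)/sum(z x*).
    """
    rng = random.Random(seed)
    s_xy = s_xx = s_zy = s_zx = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # true regressor (unobserved)
        y = beta * x + rng.gauss(0.0, 1.0)   # true relationship
        x_obs = x + rng.gauss(0.0, 1.0)      # observed regressor, X* = X + V
        z = x + rng.gauss(0.0, 1.0)          # instrument: related to x, independent of V
        s_xy += x_obs * y
        s_xx += x_obs * x_obs
        s_zy += z * y
        s_zx += z * x_obs
    return s_xy / s_xx, s_zy / s_zx

b_ols, b_iv = ols_and_iv()
print(round(b_ols, 2), round(b_iv, 2))
```

Under these assumptions least squares converges to β·Var(x)/(Var(x) + Var(V)) = 1, half the true value, while the IV estimate should be close to 2.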


IV. J. Specification Error

A specification error is said to have occurred whenever a regression equation or underlying assumption is incorrect. Specification errors can take many forms:

(1) deleting a "relevant" variable,

(2) including an "irrelevant" variable,

(3) using an incorrect functional form, or

(4) specifying an incorrect description of the population from which the disturbance was drawn.

For someone to claim that a specification error has been made carries with it some suggestion that the individual knows what the "true" model is like. Specification errors involving questions about functional form or the error distribution have already been discussed. We now consider the consequence of (1) deleting a relevant variable and (2) including an irrelevant variable.

1. Example. Deletion of "relevant" variables

True Model: $y_t = \beta_1 + \beta_2 x_{t2} + \dots + \beta_{k_1}x_{tk_1} + \dots + \beta_k x_{tk} + \varepsilon_t$

(1) $y = X_I\beta_I + X_{II}\beta_{II} + \varepsilon$

Hypothesized Model:

(2) $y = X_I\beta_I + \eta$ [Note: $\eta = \varepsilon + X_{II}\beta_{II}$]

An application of least squares to (2) yields

(3) $\hat{\beta}_I = [X_I'X_I]^{-1}X_I'y$

Replacing y in (3) by (1) results in the following expression for $\hat{\beta}_I$.

(4) $\hat{\beta}_I = [X_I'X_I]^{-1}X_I'[X_I\beta_I + X_{II}\beta_{II} + \varepsilon]$

$= [X_I'X_I]^{-1}X_I'X_I\beta_I + [X_I'X_I]^{-1}X_I'X_{II}\beta_{II} + [X_I'X_I]^{-1}X_I'\varepsilon$

$= \beta_I + [X_I'X_I]^{-1}X_I'X_{II}\beta_{II} + [X_I'X_I]^{-1}X_I'\varepsilon$

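Analysis of the properties of the least squares estimator $\hat{\beta}_I$ in the misspecified model:

a. $E(\hat{\beta}_I) = \beta_I + E\{[X_I'X_I]^{-1}X_I'X_{II}\}\beta_{II} + E\{[X_I'X_I]^{-1}X_I'\varepsilon\}$

If $X_I$ and ε are independent, then

$E(\hat{\beta}_I) = \beta_I + E\{[X_I'X_I]^{-1}X_I'X_{II}\}\beta_{II}$,

i.e., $\hat{\beta}_I$ is a biased estimator of $\beta_I$ if $X_I'X_{II}\beta_{II} \neq 0$.

b. \[ \operatorname{plim}\hat{\beta}_I = \beta_I + \operatorname{plim}\left(\frac{X_I'X_I}{n}\right)^{-1}\operatorname{plim}\left(\frac{X_I'X_{II}}{n}\right)\beta_{II} + \operatorname{plim}\left(\frac{X_I'X_I}{n}\right)^{-1}\operatorname{plim}\left(\frac{X_I'\varepsilon}{n}\right) \]

(E.5) \[ = \beta_I + \Sigma_{X_IX_I}^{-1}\Sigma_{X_IX_{II}}\beta_{II} + \Sigma_{X_IX_I}^{-1}\Sigma_{X_I\varepsilon} \]

To summarize, deleting a relevant variable results in an inconsistent estimator of $\beta_I$ unless

a) $\Sigma_{X_I\varepsilon} = 0$ (ε and $X_I$ are independent)

and

b) $\Sigma_{X_IX_{II}} = 0$ ($X_I$ and $X_{II}$ are orthogonal).

2. Example. Including an irrelevant variable.

True Model: $y = X_I\beta_I + \eta$

Hypothesized Model: $y = X_I\beta_I + X_{II}\beta_{II} + \varepsilon = X\beta + \varepsilon$

The least squares estimator of $\beta = \begin{pmatrix}\beta_I\\ \beta_{II}\end{pmatrix}$ is then given by

\[ \hat{\beta} = \begin{pmatrix}\hat{\beta}_I\\ \hat{\beta}_{II}\end{pmatrix} = (X'X)^{-1}X'y = \begin{pmatrix}X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix}X_I'y\\ X_{II}'y\end{pmatrix} \]

Taking expected values gives

\[ E\begin{pmatrix}\hat{\beta}_I\\ \hat{\beta}_{II}\end{pmatrix} = \begin{pmatrix}X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix}X_I'\\ X_{II}'\end{pmatrix}E(y) \]

\[ = \begin{pmatrix}X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix}X_I'\\ X_{II}'\end{pmatrix}\begin{pmatrix}X_I & X_{II}\end{pmatrix}\begin{pmatrix}\beta_I\\ 0\end{pmatrix} \]

\[ = \begin{pmatrix}X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}^{-1}\begin{pmatrix}X_I'X_I & X_I'X_{II}\\ X_{II}'X_I & X_{II}'X_{II}\end{pmatrix}\begin{pmatrix}\beta_I\\ 0\end{pmatrix} = \begin{pmatrix}\beta_I\\ 0\end{pmatrix}. \]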

Therefore, including irrelevant variables in a linear regression affects neither the unbiasedness nor the consistency of the least squares estimators.

The reason for the asymmetry of the results for the two cases of specification error just considered is that the hypothesized model includes the "true" model as a special case in the second example, but does not in the first example. It would then appear that it would be better to err in the direction of including too many variables than deleting a relevant variable.

It should be mentioned that while the least squares estimator of $\beta_I$ in the second example is unbiased and consistent, the corresponding variance may be larger than is associated with estimating the "true" model using least squares.
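The expression in (4) for the deleted-variable case also holds as an exact in-sample identity when β_II is replaced by its least squares estimate from the full regression. The sketch below checks this numerically for one included and one deleted regressor, without an intercept. The data values and helper names (`dot`, `solve_2x2`) are made up for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def solve_2x2(a11, a12, a21, a22, c1, c2):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a21 * c1) / det

# Hypothetical data: x1 plays the role of X_I, x2 the role of X_II.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]

# "Long" regression of y on x1 and x2 (normal equations X'X b = X'y).
b1_long, b2_long = solve_2x2(dot(x1, x1), dot(x1, x2),
                             dot(x2, x1), dot(x2, x2),
                             dot(x1, y), dot(x2, y))

# "Short" regression of y on x1 alone (x2 deleted).
b1_short = dot(x1, y) / dot(x1, x1)

# Short coefficient = long coefficient + (x1'x1)^{-1} x1'x2 * b2_long.
implied = b1_long + dot(x1, x2) / dot(x1, x1) * b2_long
print(round(b1_short, 6), round(implied, 6))
```

The two printed numbers agree, and the gap between `b1_short` and `b1_long` is the sample counterpart of the bias term.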

IV. K. PROBLEM SET 5

Violations of the Basic Assumptions

Theory

1. Distributional assumptions

a. Assume that the probability density function of the random disturbances $\varepsilon_t$ in a regression equation $Y_t = X_t\beta + \varepsilon_t$ is given by the generalized error distribution (GED):

\[ GED(\varepsilon_t; \sigma, p) = \frac{e^{-(|\varepsilon_t|/\sigma)^p}}{2\sigma\,\Gamma(1 + 1/p)} \]

where Γ( ) is the gamma function.

(1) Obtain an expression for the likelihood function and also for the log likelihood function corresponding to the regression model with a GED error distribution.

(2) What would the MLE of β be if p in the GED is

(a) p=1

(b) p=2

Hint: You don't have to derive an equation for $\hat{\beta}$; however, in maximizing the log-likelihood function over β for a given value of p you should get $\hat{\beta}$'s you have seen before. What are they?

(3) Bonus: How could the parameter "p" be estimated?

b. For the data, HBJ.dat, estimate the model $Y_t = \alpha + \beta X_t + \varepsilon_t$, using OLS and LAE, and test the distributional assumption of normality, in particular:

(1) report the estimated intercept and slope using OLS and LAE;

(2) test the normality assumption using the estimated skewness and kurtosis with a "Z-statistic;" and

(3) test the normality assumption using the JB test. (Hint: You can use the Stata command sktest for (2) and (3).)

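2. It was shown in (IV.C) that

\[ E(\hat{\beta}) = \beta + (X'X)^{-1}X'\mu \]

where μ = E(ε). It was also mentioned that if $E(\varepsilon_t) = \mu$ for all t then $E(\hat{\beta}_1) = \mu + \beta_1$ and $E(\hat{\beta}_i) = \beta_i$ for i = 2, 3, ..., K. Verify that this is true for the case K = 2. Hint:

\[ \text{Bias}(\hat{\beta}) = (X'X)^{-1}X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix}\mu \]

and

\[ (X'X)^{-1} = \frac{1}{N\sum X_t^2 - N^2\bar{X}^2}\begin{pmatrix}\sum X_t^2 & -N\bar{X}\\ -N\bar{X} & N\end{pmatrix}, \qquad X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} = \begin{pmatrix}N\\ \sum X_t\end{pmatrix} = \begin{pmatrix}N\\ N\bar{X}\end{pmatrix}. \]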

3. Consider the special case of the generalized regression model where $\Sigma = \sigma^2 I$. For this case, demonstrate that

a. $\tilde{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$ simplifies to $\hat{\beta} = (X'X)^{-1}X'Y$,

b. $\text{Var}(\hat{\beta}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} = \sigma^2(X'X)^{-1}$, and

c. $\text{Var}(\tilde{\beta}) = (X'\Sigma^{-1}X)^{-1} = \sigma^2(X'X)^{-1}$.

4. Heteroskedasticity

a. Using the HBJ data and the market model $Y_t = \alpha + \beta X_t + \varepsilon_t$

(1) Test for heteroskedasticity using the following Stata commands:

. whitetst
. estat hettest x, iid     or     estat hettest, rhs iid
. estat hettest x, fstat

(2) With the weights discussed in class, use variance weighted least squares (vwls) to estimate α and β. Turn in your computer commands and output along with your discussion of the results.

b. For the heteroskedastic case verify that $T'T = \Sigma^{-1}$.

5. For the case of first order autocorrelation it can be shown that

\[ \Sigma^{-1} = \frac{1}{\sigma_u^2}\begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0\\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0\\ 0 & -\rho & 1+\rho^2 & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & \ddots & -\rho & 0\\ 0 & 0 & \cdots & -\rho & 1+\rho^2 & -\rho\\ 0 & 0 & \cdots & 0 & -\rho & 1 \end{pmatrix} \]

Evaluate $T_1'T_1$ and $T_2'T_2$ and compare each result with $\Sigma^{-1}$, commenting on the relationship and explaining any differences. Refer to the class notes for the definitions of the transformation matrices $T_1$ and $T_2$. The Cochrane-Orcutt estimator corresponds to deleting the first observation whereas the Prais-Winsten (PW) estimator uses all observations.

Applied

6. Use the data in PHILLIPS.RAW to answer these questions.

a. Using the entire data set, estimate the static Phillips curve equation $inf_t = \beta_0 + \beta_1 unem_t + \varepsilon_t$ by OLS and report the results in the usual form.

b. Obtain the OLS residuals from part (a) and obtain $\hat{\rho}$ from regressing $e_t$ on $e_{t-1}$. Is there strong evidence of autocorrelation? Also test for the presence of autocorrelation using the DW test statistic.

c. Now estimate the static Phillips curve model by iterative Prais-Winsten. Compare the estimate of $\beta_1$ with that obtained in Table 12.2.

d. Rather than using Prais-Winsten, use iterative Cochrane-Orcutt. How similar are the final estimates of ρ? How similar are the PW and CO estimates of $\beta_1$? (Wooldridge, C.12.10)

7. Costs of Production

The following data correspond to output (Q) and total costs (C) of production.

Output    Total Costs ($)
  1          193
  2          226
  3          240
  4          244
  5          257
  6          260
  7          274
  8          297
  9          350
 10          420

a. Use OLS to estimate the parameters in the relationship $C_t = \beta_1 + \beta_2 Q_t + \varepsilon_t$.

b. Perform a test to see if the error terms are "correlated."

c. Indicate how you can obtain more appropriate estimators than OLS estimators of the linear equation in (a). Show your work and provide motivation for your approach. (Be careful!)

8. Panel data exercise

Consider the following data:

 t   code    x    y   d1  d2  d3  d4
 1     1     0   -5    1   0   0   0
 2     1     8   23    1   0   0   0
 3     1    14   44    1   0   0   0
 4     2    10   29    0   1   0   0
 5     2    16   26    0   1   0   0
 6     3     4   17    0   0   1   0
 7     3    11   17    0   0   1   0
 8     3     5   31    0   0   1   0
 9     4    18   50    0   0   0   1
10     4     5   26    0   0   0   1
11     4     2   17    0   0   0   1

Perform the following Stata commands and briefly explain the corresponding outputs.

xtset code
reg y x
reg y x d1 d2 d3
xtreg y x, fe
xtreg y x, be
xtreg y x, re

9. Consider the following model:

$\ln(wage_i) = \beta_1 + \beta_2 educ_i + \varepsilon_i$

where wage and educ, respectively, denote the wage and education level (years) for the ith individual.

a. Under what conditions would you expect the OLS estimators of the $\beta_i$'s to be unbiased and consistent? Defend your answer.

b. If you think that the wage rate has an impact on education as well as education impacting wages, will the OLS estimators be unbiased and consistent? Defend your answer.

c. If there is an endogeneity problem in the model, explain how you could obtain consistent coefficient estimators.

d. Using the mroz data (mroz.dta) estimate the given model using OLS and instrumental variables estimators (with mother's education as an instrument). Which estimate would you recommend? Use a Hausman test to support your answer.


James B. McDonald Brigham Young University 2/8/2010

VI. SIMULTANEOUS EQUATION MODELS

INTRODUCTION

There are several problems encountered with simultaneous equations models that are

not generally associated with single equation models. These include (1) the identification

problem, (2) inconsistency of ordinary least squares (OLS) estimators, (3) questions about the

interpretation of structural parameters, and (4) the validity of the OLS "t statistics" associated

with structural coefficients.

To introduce these problems, we review two important papers. The paper on identification

by E. J. Working [1927, QJE] is considered in the first section. The work of Haavelmo [1947,

JASA] dealing with alternative methods of estimating the marginal propensity to consume is

described in the second section. The third section contains a brief summary.

1. STRUCTURAL AND REDUCED FORM REPRESENTATIONS,

IDENTIFICATION, AND INTERPRETATIONS OF COEFFICIENTS

Consider the problem of estimating the impact of an increase in the price of crude oil upon

the equilibrium price and quantity of gasoline. The corresponding increase in the price of

gasoline will depend upon several factors including the slope of the demand curve.

This is illustrated by the following figure:

Figure 1

Assume that $(Q_0, P_0)$ denotes the original equilibrium. Assume that the increase in the price of crude oil results in the supply curve shifting from $S_1$ to $S_2$. The associated change in P depends

upon the relevant demand schedule, with the more inelastic schedule being associated with the

larger price increases. This example clearly indicates the importance of estimating the slope of

the demand schedule to make predictions about the impact of changes in factor price upon the

equilibrium price.

Estimation of the slope of the demand curve might begin by collecting observations on (P,

Q), which might appear as in Figure 2.

[Figure 2: scatter of observed (P, Q) points; P on the vertical axis, Q on the horizontal axis]

The reader would probably be tempted to draw a line through the points or perform a least squares estimation on $P = \beta_1 - \beta_2 Q$ in order to estimate the demand schedule. But how would we

estimate the demand curve if a plot of P and Q appeared as in Figure 3 rather than as in Figure 2?

[Figure 3: scatter of observed (P, Q) points; P on the vertical axis, Q on the horizontal axis]

The data in Figure 3 appears to define a supply curve rather than a demand curve.

Alternatively, how could we estimate a demand curve if the data appeared as in Figure 4?

[Figure 4: scatter of observed (P, Q) points; P on the vertical axis, Q on the horizontal axis]

To answer this question, we need to recall that equilibrium price and quantity are

determined by supply and demand factors and not supply or demand alone. The observations

depicted in Figure 2 could have been generated by either of the following scenarios:

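[Figure 5: two panels, each with P on the vertical axis and Q on the horizontal axis, showing alternative supply and demand scenarios]

If the demand curve is stable and the supply curve shifts, then the demand curve is "traced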

out." If both curves shift, fitting a relationship to the observed (P,Q) would not correspond to the

underlying demand curve(s). Similarly, Figure 3 could correspond to a relatively stable supply

curve and a shifting demand curve or both curves shifting. Figure 4 would appear to correspond

to both curves shifting.

Consider the following model:

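(1.1) Demand: $Q = \gamma_{11} - \beta_{12}P + \gamma_{12}Y + \varepsilon_{1t}$

(1.2) Supply: $Q = \gamma_{21} + \beta_{22}P - \gamma_{23}FC + \varepsilon_{2t}$

or equivalently,

\[ \begin{pmatrix}-1 & -\beta_{12}\\ -1 & \beta_{22}\end{pmatrix}\begin{pmatrix}Q_t\\ P_t\end{pmatrix} + \begin{pmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix} = 0. \]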

Equations (1.1) and (1.2) will be referred to as the structural model with Q and P as endogenous

(dependent) variables and income (Y) and factor costs (crude oil, FC) as exogenous

(independent) variables. In order to draw a demand curve or supply curve using (Q, P) as

coordinates, Y and FC must be fixed at some arbitrary level.

[Figure 6: supply curve S (FC = 125) and demand curve D (Y = 100); P on the vertical axis, Q on the horizontal axis]

A change in factor costs (income fixed) will shift the supply curve and “trace” the depicted

demand curve and a change in income (factor costs fixed) will shift the demand curve and “trace”

the depicted supply curve, ceteris paribus. It is interesting to observe that by including factor

costs (FC) in the supply equation and not the demand equation we are able to "identify" the

demand equation. Similarly, by including income (Y) in the demand equation and not in the

supply equation we are able to "identify" the supply equation. Hence, one way of "identifying" a

structural equation is by excluding variables from the equation we want to estimate that are

included in other structural equations. This is the general approach to the identification problem

developed by E. J. Working [1927]. A more formal development will be considered later.

We note from Figure 6 that for each level of factor costs and income there is a

corresponding equilibrium price and quantity determined by the intersection of the supply and

demand curves. If we solve the structural model for the explicit relationship between (P, Q) and

FC and Y we obtain

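(1.3a-c)

\[ \begin{pmatrix}Q_t\\ P_t\end{pmatrix} = -\begin{pmatrix}-1 & -\beta_{12}\\ -1 & \beta_{22}\end{pmatrix}^{-1}\left[\begin{pmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}\right] \]

\[ = \frac{1}{\beta_{12}+\beta_{22}}\begin{pmatrix}\beta_{22} & \beta_{12}\\ 1 & -1\end{pmatrix}\left[\begin{pmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}\right] \]

\[ = \frac{1}{\beta_{12}+\beta_{22}}\begin{pmatrix}\beta_{22}\gamma_{11} + \beta_{12}\gamma_{21} & \beta_{22}\gamma_{12} & -\beta_{12}\gamma_{23}\\ \gamma_{11} - \gamma_{21} & \gamma_{12} & \gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\dfrac{\beta_{22}\varepsilon_{1t} + \beta_{12}\varepsilon_{2t}}{\beta_{12}+\beta_{22}}\\[2ex] \dfrac{\varepsilon_{1t} - \varepsilon_{2t}}{\beta_{12}+\beta_{22}}\end{pmatrix} \]

\[ = \begin{pmatrix}\pi_{11} & \pi_{12} & \pi_{13}\\ \pi_{21} & \pi_{22} & \pi_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\eta_{1t}\\ \eta_{2t}\end{pmatrix} \]

Note:

\[ \frac{\partial Q}{\partial Y} = \frac{\beta_{22}\gamma_{12}}{\beta_{12}+\beta_{22}} = \pi_{12} > 0, \qquad \frac{\partial Q}{\partial FC} = \frac{-\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{13} < 0 \]

\[ \frac{\partial P}{\partial Y} = \frac{\gamma_{12}}{\beta_{12}+\beta_{22}} = \pi_{22} > 0, \qquad \frac{\partial P}{\partial FC} = \frac{\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{23} > 0 \]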

Equations (1.3a-c) are referred to as the reduced form equations for Q and P corresponding to the

structural model defined by (1.1) and (1.2). Note that each reduced form equation expresses the

equilibrium value (P or Q) as a function of the exogenous variables FC and Y.
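A short numerical check can make the link between the structural and reduced forms concrete. The sketch below computes the reduced form coefficients from structural parameters with the disturbances set to zero and confirms that the implied (Q, P) satisfies both (1.1) and (1.2). The parameter values and the function names `reduced_form` and `equilibrium` are hypothetical, chosen only for illustration.

```python
def reduced_form(g11, b12, g12, g21, b22, g23):
    """Reduced form coefficients implied by (1.1) and (1.2), disturbances omitted.

    Returns (pi_q, pi_p), each ordered as (constant, coefficient on Y, coefficient on FC).
    """
    d = b12 + b22
    pi_q = ((b22 * g11 + b12 * g21) / d, b22 * g12 / d, -b12 * g23 / d)
    pi_p = ((g11 - g21) / d, g12 / d, g23 / d)
    return pi_q, pi_p

def equilibrium(params, y, fc):
    """Equilibrium (Q, P) for given income y and factor cost fc."""
    pi_q, pi_p = reduced_form(*params)
    q = pi_q[0] + pi_q[1] * y + pi_q[2] * fc
    p = pi_p[0] + pi_p[1] * y + pi_p[2] * fc
    return q, p

# Hypothetical structural parameters: g11, b12, g12, g21, b22, g23.
params = (100.0, 2.0, 0.5, 20.0, 3.0, 0.4)
g11, b12, g12, g21, b22, g23 = params
y, fc = 100.0, 125.0

q, p = equilibrium(params, y, fc)
demand_q = g11 - b12 * p + g12 * y    # quantity demanded at the equilibrium price
supply_q = g21 + b22 * p - g23 * fc   # quantity supplied at the equilibrium price
print(q, p, demand_q, supply_q)

# Multiplier of FC on equilibrium Q versus the horizontal supply shift -g23.
pi_q, pi_p = reduced_form(*params)
print(pi_q[2], -g23)
```

With these assumed values the multiplier of FC on equilibrium quantity is smaller in absolute value than the structural coefficient γ23, because the equilibrium price rises as the supply curve shifts.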

To determine the impact of an increase in the price of crude oil upon the price of gasoline,

we employ the reduced form representation, i.e.,

\[ \frac{\partial P}{\partial FC} = \frac{\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{23} > 0 \]

which takes into account the slopes of the supply and demand curves as well as how far the supply curve would shift in response to an increase in the price of crude oil. The equilibrium quantity would also change according to

\[ \frac{\partial Q}{\partial FC} = \frac{-\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{13} < 0. \]

The reader might wonder why

\[ \frac{\partial Q^s}{\partial FC} = -\gamma_{23} < 0 \]

doesn't characterize the change in equilibrium quantity.

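The following figure will illustrate why the reduced form provides the necessary information.

[Figure: shift of the supply curve with P and Q as axes; the horizontal shift of the supply curve is $-\gamma_{23}\Delta FC$]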

Taking the partial derivative of the supply equation with respect to FC assumes that P is

fixed and hence merely represents the horizontal shift of the supply curve and not the change in

equilibrium quantity. The reduced form equation for Q expresses the equilibrium quantity as a

function of FC and Y and takes account of the increase in equilibrium price associated with an

increase in factor costs.

To summarize, the reduced form coefficients represent the change in equilibrium values

corresponding to changes in the predetermined or exogenous variables, i.e., the reduced form

coefficients are the multipliers. The structural coefficients represent slopes or shifts of structural

schedules in response to changes in predetermined or exogenous variables.

(In the figure, the change in equilibrium quantity is $-\dfrac{\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}}\,\Delta FC$.)

OPTIONAL EXERCISES:

1. The asymptotic bias of the OLS estimator of the slope for the demand curve is given by

\[ \frac{(\beta_{12}+\beta_{22})\,\sigma^2_{\varepsilon_1}}{\sigma^2_{\varepsilon_1} + \sigma^2_{\varepsilon_2} + \gamma_{23}^2\left(1 - \text{COR}^2(Y, FC)\right)} \]

where COR(Y, FC) = correlation between Y and FC.

(a) Mathematically analyze the impact of increases in $\sigma^2_{\varepsilon_2}$, $\gamma^2_{23}$, and COR(Y, FC) upon the asymptotic bias of $\hat{\beta}_{12}$.

(b) Graphically analyze the impact of increases in $\sigma^2_{\varepsilon_2}$, $\gamma^2_{23}$, and COR(Y, FC) upon the "identifiability of $\beta_{12}$."

2. INCONSISTENCY OF STRUCTURAL ORDINARY LEAST SQUARES

ESTIMATORS, ALTERNATIVE ESTIMATORS, AND STATISTICAL

INFERENCE

Haavelmo [1947] considered the following simple macro model:

(2.1) $C_t = \alpha + \beta Y_t + \varepsilon_t$

(2.2) $Y_t = C_t + Z_t$

where $Y_t$, $C_t$, and $Z_t$ (Z ≡ Y - C) respectively denote income, consumption and nonconsumption expenditure.

The reduced form representation corresponding to (2.1) and (2.2) is given by

(2.3) $C_t = \pi_{11} + \pi_{12}Z_t + \eta_t$

(2.4) $Y_t = \pi_{21} + \pi_{22}Z_t + \eta_t$

where

(2.5a-e) $\eta_t = \varepsilon_t/(1-\beta)$

$\pi_{11} = \alpha/(1-\beta)$

$\pi_{12} = \beta/(1-\beta)$

$\pi_{21} = \alpha/(1-\beta)$

$\pi_{22} = 1/(1-\beta)$

Note that $\pi_{12}$ and $\pi_{22}$ correspond to the multipliers discussed in simple macroeconomics

models. Haavelmo's analysis of the simple model defined by (2.1) and (2.2) pointed out many

problems which are also associated with larger econometric models. For this reason we will

consider this model in detail.


Estimation. Past experience might suggest that the OLS estimator of β would have

desirable statistical properties if εt in (2.1) is not characterized by autocorrelation or

heteroskedasticity. The OLS estimator of β in (2.1) is defined by

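(2.6) \[ \hat{\beta} = \frac{\sum (Y - \bar{Y})(C - \bar{C})}{\sum (Y - \bar{Y})^2} = \frac{\text{Cov}(Y, C)}{\text{Var}(Y)} \]

but from (2.3) and (2.4), we see that

(2.7) \[ C - \bar{C} = \pi_{12}(Z - \bar{Z}) + \frac{\varepsilon - \bar{\varepsilon}}{1-\beta} = \frac{\beta}{1-\beta}(Z - \bar{Z}) + \frac{\varepsilon - \bar{\varepsilon}}{1-\beta} \]

and

(2.8) \[ Y - \bar{Y} = \pi_{22}(Z - \bar{Z}) + \frac{\varepsilon - \bar{\varepsilon}}{1-\beta} = \frac{1}{1-\beta}(Z - \bar{Z}) + \frac{\varepsilon - \bar{\varepsilon}}{1-\beta}; \]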

hence, after substituting (2.7) and (2.8) into (2.6), we can write

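(2.9) \[ \hat{\beta} = \frac{\sum\left[\dfrac{Z - \bar{Z}}{1-\beta} + \dfrac{\varepsilon - \bar{\varepsilon}}{1-\beta}\right]\left[\dfrac{\beta(Z - \bar{Z})}{1-\beta} + \dfrac{\varepsilon - \bar{\varepsilon}}{1-\beta}\right]}{\sum\left[\dfrac{Z - \bar{Z}}{1-\beta} + \dfrac{\varepsilon - \bar{\varepsilon}}{1-\beta}\right]^2} \]

\[ = \frac{\sum\left[\dfrac{\beta(Z - \bar{Z})^2}{(1-\beta)^2} + \dfrac{(1+\beta)(Z - \bar{Z})(\varepsilon - \bar{\varepsilon})}{(1-\beta)^2} + \dfrac{(\varepsilon - \bar{\varepsilon})^2}{(1-\beta)^2}\right]}{\sum\left[\dfrac{(Z - \bar{Z})^2}{(1-\beta)^2} + \dfrac{2(\varepsilon - \bar{\varepsilon})(Z - \bar{Z})}{(1-\beta)^2} + \dfrac{(\varepsilon - \bar{\varepsilon})^2}{(1-\beta)^2}\right]} \]

\[ = \frac{\beta\sum (Z - \bar{Z})^2/N + (1+\beta)\sum (Z - \bar{Z})(\varepsilon - \bar{\varepsilon})/N + \sum (\varepsilon - \bar{\varepsilon})^2/N}{\sum (Z - \bar{Z})^2/N + 2\sum (\varepsilon - \bar{\varepsilon})(Z - \bar{Z})/N + \sum (\varepsilon - \bar{\varepsilon})^2/N}. \]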

Assuming that:

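$\sum_{t=1}^{N}(Z - \bar{Z})^2/N \to \sigma^2_Z$ as N → ∞,

$\sum_{t=1}^{N}(Z - \bar{Z})(\varepsilon - \bar{\varepsilon})/N \to 0$ as N → ∞, and

$\sum_{t=1}^{N}(\varepsilon - \bar{\varepsilon})^2/N \to \sigma^2$ as N → ∞,

gives us:

(2.10) as N → ∞, \[ \hat{\beta} \to \frac{\beta\sigma^2_Z + \sigma^2}{\sigma^2_Z + \sigma^2} = \beta + \frac{(1-\beta)\sigma^2}{\sigma^2_Z + \sigma^2}. \]

Hence, we see from (2.10) that $\hat{\beta}$ is an inconsistent estimator of β with asymptotic bias equal to the second term in (2.10),

\[ \frac{(1-\beta)\sigma^2}{\sigma^2_Z + \sigma^2}. \]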

This may seem like a surprising result in light of the apparent simplicity of the consumption

function. It may not be obvious which of the assumptions

(A.1) εt distributed normally

(A.2) E(εt) = 0 for all t

(A.3) Var(εt) = σ2 for all t

(A.4) E(εtεs) = 0 for t ≠ s

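(A.5) $Y_t$ and $\varepsilon_t$ are independent

are violated. But upon closer inspection (hint: see (2.4)) we note that

\[ E(Y_t\varepsilon_t) = E\left[\left(\pi_{21} + \pi_{22}Z_t + \frac{\varepsilon_t}{1-\beta}\right)\varepsilon_t\right] = \frac{E(\varepsilon_t^2)}{1-\beta} = \frac{\sigma^2}{1-\beta} \neq 0; \]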

hence, (A.5) is violated and OLS estimators of the structural parameters α and β are biased and

inconsistent. In fact, this is typically the case when OLS is used to estimate structural

relationships which include endogenous variables on the right hand side of the structural

equation. Right hand side endogenous variables are commonly referred to as endogenous

regressors.
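The inconsistency in (2.10) can be seen in a simulation of the Haavelmo model. The sketch below is an illustration under assumed values only: α = 10, β = 0.7, Var(Z) = Var(ε) = 1, normal draws, and the function name `simulate_haavelmo_ols` are hypothetical choices. Income is generated from the reduced form (2.4) and consumption from the identity (2.2).

```python
import random

def simulate_haavelmo_ols(alpha=10.0, beta=0.7, var_z=1.0, var_eps=1.0,
                          n=20000, seed=2):
    """OLS slope of C on Y when Y is generated jointly with C."""
    rng = random.Random(seed)
    ys, cs = [], []
    for _ in range(n):
        z = rng.gauss(5.0, var_z ** 0.5)        # exogenous nonconsumption expenditure
        eps = rng.gauss(0.0, var_eps ** 0.5)    # structural disturbance
        y = (alpha + z + eps) / (1.0 - beta)    # reduced form for income, (2.4)
        c = y - z                               # identity (2.2): C = Y - Z
        ys.append(y)
        cs.append(c)
    y_bar = sum(ys) / n
    c_bar = sum(cs) / n
    s_yc = sum((y - y_bar) * (c - c_bar) for y, c in zip(ys, cs))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b_ols = s_yc / s_yy                                        # (2.6)
    limit = beta + (1.0 - beta) * var_eps / (var_z + var_eps)  # (2.10)
    return b_ols, limit

b_ols, limit = simulate_haavelmo_ols()
print(round(b_ols, 3), round(limit, 3))
```

Under these assumptions the limit in (2.10) is 0.7 + 0.3(0.5) = 0.85, so OLS overstates the marginal propensity to consume even in a large sample.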

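As another example, the asymptotic bias of the OLS estimator of $\beta_{12}$ in (1.1) is given by

(2.11) \[ \frac{(\beta_{12}+\beta_{22})\,\sigma^2_{\varepsilon_1}}{\sigma^2_{\varepsilon_1} + \sigma^2_{\varepsilon_2} + \gamma_{23}^2\left(1 - \text{Corr}^2(Y, FC)\right)}. \]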

How can we obtain consistent estimators of the unknown structural

parameters?

Two stage least squares or an appropriate application of instrumental variables estimation

provides a solution. It is instructive to consider an alternative estimator first. Recall that the

ordinary least squares estimators of the reduced form equations (referred to as least squares no

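restrictions, LSNR) will yield unbiased and consistent estimators of the $\pi_{ij}$'s, which will be denoted by $\hat{\pi}_{ij}$. This observation provides the basis for obtaining consistent estimators of α and β in the Haavelmo model. From (2.5 c,e) we note that

$\beta = \pi_{12}/\pi_{22}$;

hence, a consistent estimator of β can be obtained from

(2.12) $\beta^* = \hat{\pi}_{12}/\hat{\pi}_{22}$

where

\[ \hat{\pi}_{12} = \frac{\sum (C - \bar{C})(Z - \bar{Z})}{\sum (Z - \bar{Z})^2}, \qquad \hat{\pi}_{22} = \frac{\sum (Y - \bar{Y})(Z - \bar{Z})}{\sum (Z - \bar{Z})^2} \]

or

(2.13) \[ \beta^* = \frac{\sum (C - \bar{C})(Z - \bar{Z})}{\sum (Y - \bar{Y})(Z - \bar{Z})} \]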

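In order to verify the consistency of β* in (2.13) we replace $(C - \bar{C})$ and $(Y - \bar{Y})$ in (2.12) by (2.7) and (2.8) to obtain

(2.14) \[ \beta^* = \frac{\sum\left[\dfrac{\beta}{1-\beta}(Z - \bar{Z}) + \dfrac{\varepsilon - \bar{\varepsilon}}{1-\beta}\right](Z - \bar{Z})}{\sum\left[\dfrac{1}{1-\beta}(Z - \bar{Z}) + \dfrac{\varepsilon - \bar{\varepsilon}}{1-\beta}\right](Z - \bar{Z})} = \frac{\beta\sum (Z - \bar{Z})^2/N + \sum (\varepsilon - \bar{\varepsilon})(Z - \bar{Z})/N}{\sum (Z - \bar{Z})^2/N + \sum (\varepsilon - \bar{\varepsilon})(Z - \bar{Z})/N} \]

Now as N → ∞

β* → β;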

hence, β* is a consistent estimator and is obtained by obtaining consistent estimators of the

reduced form (LSNR) and then deducing corresponding estimates of structural coefficients. This

general method is referred to as indirect least squares (ILS), but it is not applicable for all

structural models.

The consistent estimator β* can also be obtained by replacing the dependent variable on the

right hand side of (2.1) by its predicted value (from the reduced form)

$\hat{Y} = \hat{\pi}_{21} + \hat{\pi}_{22}Z$

or $\hat{Y} - \bar{Y} = \hat{\pi}_{22}(Z - \bar{Z})$

and then applying least squares to the resultant expression. More explicitly,

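(2.15 a-e)

\[ \beta^* = \frac{\sum (\hat{Y} - \bar{Y})(C - \bar{C})}{\sum (\hat{Y} - \bar{Y})^2} \]

\[ = \frac{\hat{\pi}_{22}\sum (Z - \bar{Z})(C - \bar{C})}{\hat{\pi}_{22}^2\sum (Z - \bar{Z})^2} \]

\[ = \frac{1}{\hat{\pi}_{22}}\,\frac{\sum (Z - \bar{Z})(C - \bar{C})}{\sum (Z - \bar{Z})^2} \]

\[ = \frac{\sum (Z - \bar{Z})^2}{\sum (Y - \bar{Y})(Z - \bar{Z})}\cdot\frac{\sum (Z - \bar{Z})(C - \bar{C})}{\sum (Z - \bar{Z})^2} \]

\[ = \frac{\sum (Z - \bar{Z})(C - \bar{C})}{\sum (Y - \bar{Y})(Z - \bar{Z})} \]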

which corresponds to (2.13). Compare (2.15 a) with (2.6) and note that the only difference is that

$\hat{Y}$ (predicted value) replaces Y in (2.6). The structural estimator, obtained by applying least

squares to the structural equation which has been modified by replacing the right hand dependent

variables by their reduced form predictions is referred to as two stage least squares (2SLS).

2SLS yields consistent estimators, and is applicable even when indirect least squares is not.

Another way of looking at the alternative estimator is obtained by comparing (2.6) and (2.15e).

Here we see that the difference is that the right hand side dependent variable Y in (2.6) is

replaced by Z (an instrumental variable) which is correlated with Y, but not with ε; hence, these

estimators are sometimes referred to as instrumental variables estimators.

A numerical example: the Haavelmo data set (Haavelmo.dat).

Using the data provided by Haavelmo, the regular OLS estimates of the consumption function are given by

OLS: $\hat{C}$ = 84.01 + .732Y

s($\hat{\beta}$): (14.55) (.030)

R² = .971

s² = 58.21.

The corresponding 2SLS estimates of the consumption function are given by

2SLS: $\hat{C}$ = 113.1 + .672Y

(17.8) (.037)

s² = 71.29.

The LSNR estimates of the reduced form equations are given by

$\hat{C}$ = 344.70 + 2.048Z

(16.48) (.341)

R² = .668

$\hat{Y}$ = 344.70 + 3.048Z

(16.48) (.341)

R² = .668

The reader should verify that the indirect least squares estimators are equal to the 2SLS.

However, except for pedagogical examples, the reader will apply 2SLS or instrumental variables

estimation directly and not use the two step procedure. Also, the two step procedure yields

incorrect standard errors.
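The point estimates in this numerical example can be reproduced directly from the Haavelmo data listed at the end of this section. The sketch below is one way to do so in plain Python; the helper names (`mean`, `s`) and the variable names are illustrative choices, and only the coefficient estimates (not the standard errors) are computed.

```python
# Haavelmo data (income Y, consumption C, nonconsumption expenditure Z).
Y = [433, 483, 479, 486, 494, 498, 511, 534, 478, 440,
     372, 381, 419, 449, 511, 520, 477, 517, 548, 629]
C = [394, 423, 437, 434, 447, 447, 466, 474, 439, 399,
     350, 364, 392, 416, 463, 469, 444, 471, 494, 529]
Z = [39, 60, 42, 52, 47, 51, 45, 60, 39, 41,
     22, 17, 27, 33, 48, 51, 33, 46, 54, 100]

def mean(v):
    return sum(v) / len(v)

def s(u, v):
    """Sum of cross products of deviations from means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))

# OLS on the structural equation, as in (2.6).
b_ols = s(Y, C) / s(Y, Y)
a_ols = mean(C) - b_ols * mean(Y)

# LSNR: least squares on the reduced form equations.
pi12 = s(Z, C) / s(Z, Z)
pi22 = s(Z, Y) / s(Z, Z)
pi21 = mean(Y) - pi22 * mean(Z)

# Indirect least squares, (2.12), and the instrumental variables form, (2.13).
b_ils = pi12 / pi22
b_iv = s(Z, C) / s(Z, Y)

# Explicit two-step version, (2.15a): regress C on the reduced form prediction of Y.
Y_hat = [pi21 + pi22 * z for z in Z]
b_2sls = s(Y_hat, C) / s(Y_hat, Y_hat)
a_2sls = mean(C) - b_2sls * mean(Y)

print(round(b_ols, 3), round(b_2sls, 3))  # 0.732 0.672
```

The three consistent estimates `b_ils`, `b_iv`, and `b_2sls` coincide up to rounding error, which is the equality the text asks the reader to verify.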

CONFIDENCE INTERVALS. In determining confidence intervals for structural

parameters, the reader might be inclined to use the results associated with the OLS or 2SLS

estimates of the structural equation under consideration. As an example of this we compute

"95% confidence intervals for β (the MPC)."

(a) Based upon OLS: (t = 2.101)

$\hat{\beta}_{OLS} \pm t\,s_{\hat{\beta}}$ = (.732 ± 2.101(.0299))

= (.669, .795)

(b) Based upon 2SLS

$\hat{\beta}_{2SLS} \pm t\,s_{\hat{\beta}}$ = (.672 ± 2.101(.0368))

= (.594, .748)

These confidence intervals are very different and one might ask which if either is appropriate. As

it turns out, neither is completely satisfactory since

\[ \frac{\hat{\beta} - \beta}{s_{\hat{\beta}}} \]

is not exactly distributed as a t-statistic where β is obtained from the technique of OLS or 2SLS.

One way in which we can determine which (if either) of the previous confidence intervals is

closest is to note that

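\[ \frac{\hat{\pi}_{ij} - \pi_{ij}}{s_{\hat{\pi}_{ij}}} \sim t(n-2); \]

hence,

\[ 1 - \alpha = \Pr\left[-t_{\alpha/2} \le \frac{\hat{\pi}_{22} - \pi_{22}}{s_{\hat{\pi}_{22}}} \le t_{\alpha/2}\right] \]

\[ = \Pr\left[\hat{\pi}_{22} - t_{\alpha/2}s_{\hat{\pi}_{22}} \le \pi_{22} \le \hat{\pi}_{22} + t_{\alpha/2}s_{\hat{\pi}_{22}}\right] \]

\[ = \Pr\left[\hat{\pi}_{22} - t_{\alpha/2}s_{\hat{\pi}_{22}} \le \frac{1}{1-\beta} \le \hat{\pi}_{22} + t_{\alpha/2}s_{\hat{\pi}_{22}}\right] \]

\[ = \Pr\left[1 - \frac{1}{\hat{\pi}_{22} - t_{\alpha/2}s_{\hat{\pi}_{22}}} \le \beta \le 1 - \frac{1}{\hat{\pi}_{22} + t_{\alpha/2}s_{\hat{\pi}_{22}}}\right]. \]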

Making the appropriate substitutions we obtain

(.57, .73)

which is much closer to the results obtained using two stage least squares than from OLS. One might be inclined to conjecture that a reason for the poor performance of OLS confidence intervals is the asymptotic bias of the OLS estimator,

\[ \frac{(1-\beta)\sigma^2}{\sigma^2_Z + \sigma^2}. \]

It might be instructive to estimate the asymptotic bias. Doing so we obtain, for OLS estimates of σ² (s² = 58.2), β ($\hat{\beta}$ = .732), and $\sigma^2_Z$ (285.55), asymptotic bias ($\hat{\beta}_{OLS}$) = .0454; for 2SLS estimates of σ² (s² = 71.29), β ($\hat{\beta}$ = .672), and $\sigma^2_Z$ (285.55), asymptotic bias ($\hat{\beta}_{OLS}$) = .0655. Note that the difference between the OLS and 2SLS estimates is (.732 - .672 = .06).
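The interval for β based on the reduced form and the two asymptotic bias estimates above can be recomputed from the reported numbers. The sketch below does this arithmetic; the function names `beta_interval_from_pi22` and `asymptotic_bias` are illustrative, and the inputs are the rounded values quoted in the text.

```python
def beta_interval_from_pi22(pi22_hat, se_pi22, t_crit):
    """Map an interval for pi22 = 1/(1 - beta) into an interval for beta.

    Assumes both endpoints of the pi22 interval are positive, so the
    transformation beta = 1 - 1/pi22 preserves the ordering.
    """
    lower_pi = pi22_hat - t_crit * se_pi22
    upper_pi = pi22_hat + t_crit * se_pi22
    return 1.0 - 1.0 / lower_pi, 1.0 - 1.0 / upper_pi

def asymptotic_bias(beta, var_eps, var_z):
    """Second term of (2.10): (1 - beta) * sigma^2 / (sigma_Z^2 + sigma^2)."""
    return (1.0 - beta) * var_eps / (var_z + var_eps)

low, high = beta_interval_from_pi22(3.048, 0.341, 2.101)
bias_with_ols_inputs = asymptotic_bias(0.732, 58.2, 285.55)
bias_with_2sls_inputs = asymptotic_bias(0.672, 71.29, 285.55)

print(round(low, 2), round(high, 2))
print(round(bias_with_ols_inputs, 4), round(bias_with_2sls_inputs, 4))
```

Because the mapping from π22 to β is nonlinear, the resulting interval is not symmetric about the 2SLS point estimate.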

PREDICTIONS. In order to make predictions, one should use the reduced form

representation.

3. A BRIEF OVERVIEW

The mathematical formulation of an economic model is generally referred to as the structural representation. The structural equations in the structural representation will often include endogenous regressors (endogenous variables on the right hand side) as well as exogenous variables.

The reduced form representation corresponding to the structural representation is characterized by separate equations expressing each dependent variable as a function of the exogenous variables. The reduced form provides explicit expressions for the equilibrium for the model, conditional on an arbitrary, but given, set of values for the exogenous variables. The reduced form coefficients can be interpreted as "multipliers" and yield comparative static results. The reduced form representation is usually the form used for obtaining forecasts from econometric models.

After the econometrician is satisfied that a given econometric model is consistent with relevant economic theory, it is important that each structural equation be identified. Identification should be checked even before attempting to estimate the model. A necessary condition (order condition) for a structural equation to be identified is that the number of exogenous (predetermined) variables excluded (K2) from a structural equation is at least as large as the number of endogenous regressors (one less than the number of endogenous variables in the equation being checked (G∆)),

K2 ≥ G∆ - 1.

If K2 is thought of as referring to instrumental variables, then the necessary condition for identification is that there must be at least as many instrumental variables as endogenous regressors. This condition must be satisfied for each structural equation. The values for K2 and G∆ may vary from one equation to another. Identities do not contain unknown parameters and need not be checked for identification.

OLS estimates of parameters in structural models are typically biased and inconsistent with unreliable t-statistics. This is due to the correlation between the error and endogenous regressor on the right hand side of the equation. Two stage least squares estimators (2SLS) provide biased, but consistent estimators. They can also be viewed as instrumental variables estimators.

The Stata command for 2SLS is

ivregress 2sls y1 X1 (Y2 Y3=X1 X2)

where Y = endogenous variables (y1 on lhs, y2 and y3 on the rhs),

X1 = exogenous variables in structural equation being estimated,

X2=Z = exogenous variables in the model, but excluded from the equation being estimated. The variables in X2 are often called instruments. An alternative form for the two stage estimators is given by

ivregress 2sls y1 X1 (Y2 Y3=X2)

Example 1: See the problem set for some sample data

Demand: $Q = \gamma_{11} - \beta_{12}P + \gamma_{12}Y + \varepsilon_{1t}$

Supply: $Q = \gamma_{21} + \beta_{22}P - \gamma_{23}FC + \varepsilon_{2t}$

ENDOGENOUS VARIABLES: Q, P

EXOGENOUS VARIABLES: Y, FC

(a) Identification

(1) Demand

K2 = 1: FC is in the supply model, but not in the demand equation

G∆ - 1 = 2 - 1 = 1: One endogenous regressor (P) in the demand equation

(2) Supply
$K_2 = 1$: Y is in the demand equation, but not in the supply equation.
$G_\Delta - 1 = 2 - 1 = 1$: one endogenous regressor (P) in the supply equation.

Therefore $K_2 \ge G_\Delta - 1$ is satisfied for both the supply and the demand equation.
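The counting in part (a) can be mechanized. The following is a minimal Python sketch, not part of the original notes; the function name `order_condition` and the dictionary layout are hypothetical choices made for illustration. It only checks the order (necessary) condition and says nothing about the rank condition.

```python
def order_condition(excluded_exogenous, endogenous_in_equation):
    """Check the order condition K2 >= G_delta - 1 for one structural equation.

    excluded_exogenous: exogenous variables in the model but not in this equation.
    endogenous_in_equation: all endogenous variables appearing in this equation.
    Returns (K2, number of endogenous regressors, condition satisfied).
    """
    k2 = len(excluded_exogenous)
    endogenous_regressors = len(endogenous_in_equation) - 1
    return k2, endogenous_regressors, k2 >= endogenous_regressors


# Example 1: supply and demand model (hypothetical encoding of the two equations).
model_exogenous = {"Y", "FC"}
equations = {
    "demand": {"endogenous": {"Q", "P"}, "exogenous": {"Y"}},
    "supply": {"endogenous": {"Q", "P"}, "exogenous": {"FC"}},
}

results = {}
for name, eq in equations.items():
    excluded = model_exogenous - eq["exogenous"]
    results[name] = order_condition(excluded, eq["endogenous"])
    k2, g1, ok = results[name]
    print(f"{name}: K2 = {k2}, G_delta - 1 = {g1}, order condition satisfied: {ok}")
```

Both equations have one excluded exogenous variable and one endogenous regressor, so each is exactly identified by the order condition.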

(b) Estimation of the structural parameters (Stata commands)

(1) Demand

ivregress 2sls Q Y (P = FC) or ivregress 2sls Q Y (P=Y FC)

(2) Supply

ivregress 2sls Q FC (P = Y) or ivregress 2sls Q FC (P=Y FC)

(c) Estimation of the reduced form (Stata commands)

(1) Q Equation

reg Q Y FC

(2) P Equation

reg P Y FC
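To show what the commands in (b) and (c) compute, here is a rough Python sketch that carries out the two stages by hand on the sample data from Problem Set 6. It is an illustration under stated assumptions, not the Stata implementation: the helper names `cross` and `ols2` are hypothetical, and only point estimates are produced. Standard errors taken from a hand-run second stage would not be the correct 2SLS standard errors.

```python
# Sample data from Problem Set 6: price, quantity, income, factor costs.
P = [185, 215, 275, 279, 310, 330, 400, 360, 450, 515]
Q = [320, 360, 460, 460, 480, 540, 600, 570, 680, 780]
Y = [100, 120, 160, 164, 180, 200, 240, 220, 280, 320]
FC = [10, 12, 14, 15, 20, 16, 24, 20, 28, 30]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def ols2(y, x1, x2):
    """OLS of y on a constant, x1 and x2; returns (intercept, b1, b2)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


# Reduced form: each endogenous variable on all exogenous variables.
q0, q_y, q_fc = ols2(Q, Y, FC)   # counterpart of: reg Q Y FC
p0, p_y, p_fc = ols2(P, Y, FC)   # counterpart of: reg P Y FC

# Stage 1: fitted prices from the reduced form equation for P.
P_hat = [p0 + p_y * y + p_fc * fc for y, fc in zip(Y, FC)]

# Stage 2: replace P by its fitted value in each structural equation.
demand = ols2(Q, P_hat, Y)    # counterpart of: ivregress 2sls Q Y (P = FC)
supply = ols2(Q, P_hat, FC)   # counterpart of: ivregress 2sls Q FC (P = Y)

print("demand (const, coef on P, coef on Y): ", demand)
print("supply (const, coef on P, coef on FC):", supply)
```

The printed price coefficient in the demand equation is an estimate of $-\beta_{12}$ in the notation of Example 1, and the FC coefficient in the supply equation is an estimate of $-\gamma_{23}$. Because both equations are exactly identified, the 2SLS price coefficients coincide with ratios of reduced form coefficients (`q_fc / p_fc` for demand, `q_y / p_y` for supply).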

Example 2. Consider the Haavelmo model and data:

$C_t = \alpha + \beta Y_t + \varepsilon_t$

$Y_t = C_t + Z_t$

(a) Identification

The exogenous variable Z is not included in the consumption function, but it is in the

identity.

(b) Estimation of the structural parameters (Stata commands)

ivregress 2sls C (Y = Z)

(c) Estimation of the reduced form parameters (Stata commands)

reg C Z

reg Y Z
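The two reduced form regressions correspond to the following equations, obtained by substituting the consumption function into the identity. This short derivation is a sketch that assumes $\beta \neq 1$; the labels $\pi_C$ and $\pi_Y$ for the reduced form slopes are introduced here for illustration only.

```latex
\begin{aligned}
Y_t &= \alpha + \beta Y_t + \varepsilon_t + Z_t
  \;\Longrightarrow\;
  Y_t = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta}\, Z_t + \frac{\varepsilon_t}{1-\beta}, \\
C_t &= Y_t - Z_t
  = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta}\, Z_t + \frac{\varepsilon_t}{1-\beta}, \\
\pi_Y &= \frac{1}{1-\beta}, \qquad
\pi_C = \frac{\beta}{1-\beta}, \qquad
\beta = \frac{\pi_C}{\pi_Y}.
\end{aligned}
```

The slope $\pi_Y = 1/(1-\beta)$ is the multiplier of Z on income, and the ratio $\pi_C/\pi_Y$ recovers the marginal propensity to consume from the reduced form.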

The data used by Haavelmo are given below:

Y C Z

433 394 39

483 423 60

479 437 42

486 434 52

494 447 47

498 447 51

511 466 45

534 474 60

478 439 39

440 399 41

372 350 22

381 364 17

419 392 27

449 416 33

511 463 48

520 469 51

477 444 33

517 471 46

548 494 54

629 529 100
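With one endogenous regressor and one instrument, the estimators in Example 2 reduce to ratios of sample cross products. The Python sketch below is illustrative only (the helper names `cross` and `slope` are hypothetical). It compares the OLS estimate of $\beta$ with the instrumental variables estimate using Z as the instrument, and with the indirect least squares estimate built from the two reduced form slopes.

```python
# Haavelmo's data as listed above: columns Y, C, Z.
Y = [433, 483, 479, 486, 494, 498, 511, 534, 478, 440,
     372, 381, 419, 449, 511, 520, 477, 517, 548, 629]
C = [394, 423, 437, 434, 447, 447, 466, 474, 439, 399,
     350, 364, 392, 416, 463, 469, 444, 471, 494, 529]
Z = [39, 60, 42, 52, 47, 51, 45, 60, 39, 41,
     22, 17, 27, 33, 48, 51, 33, 46, 54, 100]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope(y, x):
    """OLS slope from regressing y on a constant and x."""
    return cross(x, y) / cross(x, x)


beta_ols = slope(C, Y)                 # counterpart of: reg C Y
beta_iv = cross(Z, C) / cross(Z, Y)    # counterpart of: ivregress 2sls C (Y = Z)
alpha_iv = mean(C) - beta_iv * mean(Y)

pi_c = slope(C, Z)                     # counterpart of: reg C Z
pi_y = slope(Y, Z)                     # counterpart of: reg Y Z
beta_ils = pi_c / pi_y                 # indirect least squares

print("OLS estimate of beta:", beta_ols)
print("IV (2SLS) estimate of beta:", beta_iv, " alpha:", alpha_iv)
print("Indirect least squares estimate of beta:", beta_ils)
```

Because the consumption function is exactly identified, the IV and indirect least squares estimates agree up to rounding. Since the data satisfy the identity Y = C + Z exactly, the OLS estimate is necessarily larger than the IV estimate in this sample.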

References

Haavelmo, T. "Methods of Measuring the Marginal Propensity to Consume," Journal of the American Statistical Association, 42(1947):105-122.

Working, E. "What Do Statistical Demand Curves Show?," Quarterly Journal of Economics, 41(1926):212-235.


4. PROBLEM SET 6: Simultaneous Equations

Consider the following Supply and Demand Model:

Demand: $Q_t = \gamma_{11} + \beta_{12} P_t + \gamma_{12} Y_t + e_{t1}$

Supply: $Q_t = \gamma_{21} + \beta_{22} P_t + \gamma_{23} FC_t + e_{t2}$

where $Q_t$, $P_t$, $Y_t$ and $FC_t$ denote quantity, price, income and factor costs.

Observations on these variables are given by:

Pt 185 215 275 279 310 330 400 360 450 515

Qt 320 360 460 460 480 540 600 570 680 780

Yt 100 120 160 164 180 200 240 220 280 320

FCt 10 12 14 15 20 16 24 20 28 30

1. Express the reduced form representation in terms of the structural coefficients.

2. Determine which of the structural coefficients can be expressed in terms of the reduced

form coefficients and make this relationship explicit where possible.

3. Determine whether the supply and demand equations are identified. Check the order

(necessary) condition in your analysis.

4. Estimate the reduced form equations for P and Q using the technique of Least Squares

(LSNR). (Hint: In Stata, type reg q Y FC and reg p Y FC)

a) Test for the presence of autocorrelation.

b) Test for heteroskedasticity using the results from the “whitetst” or “hettest” commands
in Stata.

5. Estimate the supply and demand equations using OLS.

6. Estimate the supply and demand equations using 2SLS (“ivregress” in Stata).

7. Comment on the properties of the estimators associated with questions (5) and (6).

8. Indicate how you could test the following hypotheses and discuss any related problems.

a) $\beta_{12} = -2$

b) $\gamma_{12} = 0$

c) $\pi_{12} = 2.5$

d) $\pi_{12} = 0$

9. What implication does $\pi_{22} = 0$, where $\pi_{22}$ is the coefficient of FC in the reduced form equation for P, have
with respect to the identification of any of the structural equations?
