
Multiple Regression Analysis

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$

2013/11/27 Chia-Hsin Chen


Assumptions of OLS Estimator


Regression Residual

A regression residual, $\hat{\varepsilon}$, is defined as the difference between an observed $y$ value and its corresponding predicted value:

$\hat{\varepsilon} = y - \hat{y} = y - (\hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2 + \cdots + \hat{b}_k x_k)$


Assumptions of OLS Estimator

1) E(εᵢ) = 0 (unbiasedness)
2) Var(εᵢ) is constant (homoscedasticity)
3) Cov(εᵢ, εⱼ) = 0 (independent error terms)
4) Cov(εᵢ, Xᵢ) = 0 (error terms unrelated to the X's)

Together: εᵢ ~ iid(0, σ²).

Gauss-Markov Theorem: If these conditions hold, OLS is the best linear unbiased estimator (BLUE).

Additional assumption: the εᵢ's are normally distributed.


3 Illnesses in Regression

1) Multicollinearity: strong relationships among the explanatory variables.

2) Heteroscedasticity: changing variance of the error terms.

3) Autocorrelated error terms: a symptom of specification error.


Checking the Regression Assumptions


Assumptions of the model:

1) linear conditional mean
2) constant variance (homoscedasticity), normal errors
3) independent error terms

So we should see:
 – a pattern of constant variation around a line
 – very few points more than 2 standard deviations away from the central linear relationship.

How can we be sure of this when using real data? We must perform some basic diagnostic procedures to ensure that the model holds.


If the model assumptions are violated:
 – Predictions can be systematically biased.
 – Standard errors and t-tests are wrong.
 – Someone may be able to beat you with a different and better model.

All of the assumptions of the model are really statements about the regression error terms (ε).

How can we detect violations of the model?


Example: Data Set 1


Example: Data Set 2


Example: Data Set 3


Example: Data Set 4


Residual Analysis


Properties of Regression Residual

The mean of the residuals is equal to 0.

• This property follows from the fact that the sum of the differences between the observed y values and their least squares predicted values ŷ is equal to 0:

$\sum \text{Residuals} = \sum (y - \hat{y}) = 0$


Properties of Regression Residual

The standard deviation of the residuals is equal to the standard deviation of the fitted regression model.

• This property follows from the fact that the sum of the squared residuals is equal to SSE, which when divided by the error degrees of freedom is equal to the variance of the fitted regression model, s².


Properties of Regression Residual

The square root of the variance is both the standard deviation of the residuals and the standard deviation of the regression model:

$\sum \text{Residuals}^2 = \sum (y - \hat{y})^2 = \text{SSE}$

$s^2 = \frac{\sum \text{Residuals}^2}{n - (k+1)} = \frac{\text{SSE}}{n - (k+1)}$


Regression Outlier

A regression outlier is a residual that is larger than 3s (in absolute value).


Heteroskedasticity


• Heteroskedasticity: the error terms do not all have the same variance.


Consequences of Using OLS in the Presence of Heteroscedasticity

OLS estimation still gives unbiased coefficient estimates, but they are no longer BLUE.

• This implies that if we still use OLS in the presence of heteroscedasticity, our standard errors could be inappropriate and hence any inferences we make could be misleading.

• Whether the standard errors calculated using the usual formulae are too big or too small will depend upon the form of the heteroscedasticity.


Detection of Heteroscedasticity

Graphical methods

Formal tests: there are many of them
• Goldfeld-Quandt (GQ) test
• White's test
• ……


Graphical analysis of residuals

To check for homoscedasticity (constant variance):
 — Produce a scatterplot of the standardized residuals against the fitted values.
 — Produce a scatterplot of the standardized residuals against each of the independent variables.

• If the assumptions are satisfied, the residuals should vary randomly around zero and the spread of the residuals should be about the same throughout the plot (no systematic patterns).
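A minimal sketch of this graphical check in Python (statsmodels and matplotlib assumed available; the data and variable names are illustrative, not from the lecture):

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Illustrative data: the error spread grows with x (heteroscedastic)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x)

X = sm.add_constant(x)                     # add an intercept column
fit = sm.OLS(y, X).fit()

# Internally studentized residuals serve as standardized residuals
resid_std = fit.get_influence().resid_studentized_internal

plt.scatter(fit.fittedvalues, resid_std)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Standardized residuals")
plt.show()

A fan or funnel shape in this plot is the visual signature of heteroscedasticity.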


1. Plot of Residuals vs. Fitted Values

Useful for:
• detection of non-linear relationships
• detection of non-constant variances

What should this look like?
• 1. Residuals should be evenly distributed around the mean.
• 2. No relationship between the mean of the residuals and the level of the fitted values.


A key assumption is that the regression model is a linear function. This is not always true. This will show up even more prominently in the residuals vs. fitted plot…

[Figure: residuals vs. fitted values] There should be no relationship between the average value of the residuals and the fitted values (X).


• Heteroskedasticity (different variances)

The key is a systematic pattern of variation.


Heteroskedasticity: Examples

[Figure: residual plots showing non-constant spread]


Homoscedasticity is probably violated if…

The residuals seem to increase or decrease in average magnitude with the fitted values; this is an indication that the variance of the residuals is not constant.

The points in the plot lie on a curve around zero, rather than fluctuating randomly.

A few points in the plot lie a long way from the rest of the points.


Residual Plot for Functional Form

[Figure: two panels of residuals ê against x: "Add x² Term" (curved residual pattern) and "Correct Specification" (random scatter around 0)]


Residual Plot for Independence

[Figure: two panels of residuals ê against x: "Not Independent" (systematic pattern) and "Correct Specification" (random scatter)]

Plots reflect the sequence in which the data were collected.


Testing the Normality Assumption


Normality of residuals

Normality is not required in order to obtain unbiased estimates of the regression coefficients.

• But normality of the residuals is necessary for some of the other tests – for example the Breusch-Pagan test of heteroscedasticity, the Durbin-Watson test of autocorrelation, etc.


Normality of residuals

Non-normal residuals cause the following:
• t-tests and other associated statistics may no longer be t distributed.
• Least squares estimates are extremely sensitive to large εᵢ, and it may be possible to improve on least squares.
• The linear functional form may be incorrect, and various transformations of the dependent variable may be necessary.


Normality

The random errors are regarded as a random sample from a N(0, σ²) distribution, so we can check this assumption by checking whether the residuals might have come from a normal distribution.

We should look at the standardized residuals.

Options for looking at the distribution:
• Histogram
• Normal plot of residuals


How can we detect departures from normality?

• Characteristics of the normal distribution: thin tails and symmetry.
• The most basic analysis would be to graph the histogram of the standardized residuals.

[Histogram figures] Neither of these plots looks particularly symmetric.


Example: histogram with a normal curve for 61 recent observations of the monthly stock rate of return of Exxon

The histogram uses only 61 observations, whereas the normal curve superimposed depicts the histogram using infinitely many observations. Therefore, sampling errors should show up as gaps between the two curves.

This is not effective for revealing a subtle but systematic departure of the histogram from normality.


Normal Q-Q Plot of Residuals

A normal probability plot is found by plotting the residuals of the observed sample against the corresponding quantiles of a standard normal distribution N(0,1).

1) The first step is to sort the data from the lowest to the highest. Let n be the number of observations. Then the lowest observation, denoted x(1), is the (1/n)th quantile of the data.

2) The next step is to determine, for each observation, the corresponding quantile of the normal distribution that has the same mean and standard deviation as the data. The following Excel function is a convenient way to determine the normal (i/(n+1))th quantile, denoted x′(i):

x′(i) = NORMINV(i/(n+1), sample mean, sample standard deviation)
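The same construction in Python (a sketch; scipy's norm.ppf plays the role of NORMINV, and the residuals are simulated for illustration):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

resid = np.random.default_rng(1).standard_t(df=3, size=61)   # illustrative data

r = np.sort(resid)                      # step 1: order the observations
n = len(r)
p = np.arange(1, n + 1) / (n + 1)       # plotting positions i/(n+1)
q = stats.norm.ppf(p, loc=r.mean(), scale=r.std(ddof=1))     # NORMINV equivalent

plt.scatter(q, r)                       # points near a straight line => normal
plt.plot(q, q)                          # reference line
plt.xlabel("Normal quantiles x'(i)")
plt.ylabel("Ordered residuals")
plt.show()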


• If the plot shows a straight line, it is reasonable to assume that the observed sample comes from a normal distribution.

• If the points deviate a lot from a straight line, there is evidence against the assumption that the random errors are an independent sample from a normal distribution.

[Figure: normal Q-Q plot of x′(i) against the data quantiles at plotting positions i/(n+1)]


Normal Q-Q Plot

• QQ-plots plot the quantiles of a variable against the quantiles of a normal distribution.

• The QQ plot is sensitive to non-normality near the tails.


PP-plot graphs

1) The data are arranged from smallest to largest.
2) The percentile of each data value is determined.
3) The z-score of each data value is calculated.
4) The z-scores are plotted against the percentiles of the data values.

Departures from a straight line indicate departures from normality.

The PP-plot is sensitive to non-normality in the middle range of the data.



Example

Suppose we simulate some data from a t distribution with only 3 degrees of freedom.


Jarque-Bera test of normality

The test is based on the sample skewness S and kurtosis K of the residuals:

$S = \frac{n}{(n-1)(n-2)} \sum_i \frac{(x_i - \bar{x})^3}{s^3}, \qquad K = \frac{\sum_i (x_i - \bar{x})^4 / n}{s^4}$

The JB statistic combines the two, $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, and is compared against a $\chi^2(2)$ distribution.
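Running the test in Python takes one call (a sketch; scipy.stats.jarque_bera is assumed available, and the residuals are simulated):

import numpy as np
from scipy import stats

resid = np.random.default_rng(2).standard_t(df=3, size=500)  # heavy-tailed, illustrative

jb_stat, p_value = stats.jarque_bera(resid)
print(f"JB = {jb_stat:.2f}, p = {p_value:.4f}")  # small p => reject normality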


Skewness

[Figure: three density curves f(x): skewed to the right (mode < median < mean); skewed to the left (mean < median < mode); symmetric (median = mean = mode)]

Skewed to the right: SK > 0. Skewed to the left: SK < 0.

$S = \frac{n}{(n-1)(n-2)} \sum_i \frac{(x_i - \bar{x})^3}{s^3}$


Kurtosis

[Figure: density curves f(x): leptokurtic (tall, narrow peak), platykurtic (low, wide peak), and the normal (mesokurtic) peak]

$K = \frac{\sum_i (x_i - \bar{x})^4 / n}{s^4}$


What can we do if we find evidence of non-normality?

What is the pattern in the plot of residuals? Check alternative (non-linear) specifications that are appropriate.

Deviations from normality could be due to outliers.
 – Find the reasons for the outliers.
 – Data error? Correct the entry.
 – If it is not a data error, and there is a valid reason for that observation, then one could use a dummy variable for that observation.


What should you do about outliers?

Investigate – data errors, changes in measurement, structural changes in the environment.

Consider deleting only if you have a good reason.

What is an outlier?
 – An unusual observation which is not likely to reoccur.
 – If it is likely to reoccur, you will fool yourself by deletion: you will think that the model fits and predicts better than it really does.


Steps in a Residual Analysis

1. Check for a misspecified model by plotting the residuals against each of the quantitative independent variables.

• Analyze each plot, looking for a curvilinear trend. This shape signals the need for a quadratic term in the model. Try a second-order term in the variable against which the residuals are plotted.


Steps in a Residual Analysis

2. Examine the residual plots for outliers. Draw lines on the residual plots at 2- and 3-standard-deviation distances below and above the 0 line.

• Examine residuals outside the 3-standard-deviation lines as potential outliers, and check to see that no more than 5% of the residuals exceed the 2-standard-deviation lines.


Steps in a Residual Analysis

3. Check for nonnormal errors by plotting a frequency distribution of the residuals, using a stem-and-leaf display or a histogram.

• Check to see if obvious departures from normality exist. Extreme skewness of the frequency distribution may be due to outliers or could indicate the need for a transformation of the dependent variable.


Steps in a Residual Analysis

4. Check for unequal error variances by plotting the residuals against the predicted values ŷ. If you detect a cone-shaped pattern or some other pattern that indicates that the variance of ε is not constant, refit the model using an appropriate variance-stabilizing transformation on y, such as ln(y). (Consult the references for other useful variance-stabilizing transformations.)


Goldfeld-Quandt (GQ) test

The Goldfeld-Quandt (GQ) test is carried out as follows.

1) Split the total sample of length T into two sub-samples of length T1 and T2. The regression model is estimated on each sub-sample and the two residual variances are calculated.

2) The null hypothesis is that the variances of the disturbances are equal, H0: σ₁² = σ₂². The test statistic is the ratio of the two residual variances, which under the null follows an F distribution.
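A sketch of the test with statsmodels (het_goldfeldquandt orders the observations and compares the residual variances of the two sub-samples; the data are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(1, 10, 120))
y = 1.0 + 0.8 * x + rng.normal(0, 0.2 * x)     # variance grows with x

X = sm.add_constant(x)
f_stat, p_value, _ = het_goldfeldquandt(y, X)  # H0: equal variances in the two sub-samples
print(f"GQ F = {f_stat:.2f}, p = {p_value:.4f}")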


White's Test: Detection of Heteroscedasticity

• White's general test for heteroscedasticity is one of the best approaches because it makes few assumptions about the form of the heteroscedasticity.

• The test is carried out as follows:

1) Assume that the regression we carried out is

yₜ = β₁ + β₂x₂ₜ + β₃x₃ₜ + uₜ

and we want to test H0: Var(uₜ) = σ². We estimate the model, obtaining the residuals ûₜ.


2) Then run the auxiliary regression

$\hat{u}_t^2 = \alpha_1 + \alpha_2 x_{2t} + \alpha_3 x_{3t} + \alpha_4 x_{2t}^2 + \alpha_5 x_{3t}^2 + \alpha_6 x_{2t} x_{3t} + v_t$

3) Obtain R² from the auxiliary regression and multiply it by the number T of observations. It can be shown that T·R² ~ χ²(m), where m is the number of regressors in the auxiliary regression excluding the constant term.

4) If the χ² test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic.
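The same test via statsmodels (a sketch; het_white builds the auxiliary regression with squares and cross-products automatically, and the data are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(4)
x2, x3 = rng.uniform(0, 5, (2, 200))
u = rng.normal(0, 0.5 + 0.4 * x2)            # error variance depends on x2
y = 1 + 2 * x2 - 1.5 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
resid = sm.OLS(y, X).fit().resid

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(f"T*R^2 = {lm_stat:.2f}, p = {lm_pvalue:.4f}")   # compare against chi2(m)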


How Do We Deal with Heteroscedasticity?


Generalised Least Squares (GLS)

If the form (i.e. the cause) of the heteroscedasticity is known, then we can use an estimation method which takes this into account (called generalised least squares, GLS).


Generalised Least Squares (GLS)

A simple illustration of GLS is as follows. Suppose that the error variance is related to another variable zₜ by

$\mathrm{Var}(u_t) = \sigma^2 z_t^2$

• To remove the heteroscedasticity, divide the regression equation by zₜ. Then

$\frac{y_t}{z_t} = \beta_1 \frac{1}{z_t} + \beta_2 \frac{x_{2t}}{z_t} + \beta_3 \frac{x_{3t}}{z_t} + v_t$

where $v_t = u_t / z_t$ is the error term. Since

$\mathrm{Var}(v_t) = \mathrm{Var}\!\left(\frac{u_t}{z_t}\right) = \frac{\mathrm{Var}(u_t)}{z_t^2} = \frac{\sigma^2 z_t^2}{z_t^2} = \sigma^2$

the disturbances from the new regression equation will be homoscedastic.
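This weighting is exactly what weighted least squares performs; a sketch with statsmodels, under the assumed variance form Var(uₜ) = σ²zₜ² (so the weights are 1/zₜ²; all names are illustrative):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
z = rng.uniform(1, 4, n)
x2, x3 = rng.uniform(0, 5, (2, n))
u = rng.normal(0, z)                       # Var(u_t) = z_t^2 (sigma = 1)
y = 1 + 2 * x2 - 1.5 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
wls_fit = sm.WLS(y, X, weights=1.0 / z**2).fit()   # GLS with the known variance form
print(wls_fit.params)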


Other Corrections for Heteroskedasticity

Use White's heteroscedasticity-consistent standard error estimates.

• The effect of using White's correction is that in general the standard errors for the slope coefficients are increased relative to the usual OLS standard errors.

• This makes us more "conservative" in hypothesis testing, so that we would need more evidence against the null hypothesis before we would reject it.
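In statsmodels the robust standard errors are one argument away (a sketch; "HC0" is White's original estimator, and the data are illustrative):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = 1 + 0.5 * x + rng.normal(0, 0.2 * x)       # heteroscedastic errors
X = sm.add_constant(x)

ols_fit = sm.OLS(y, X).fit()                   # conventional standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC0")  # White's heteroscedasticity-consistent SEs

print(ols_fit.bse)      # usual SEs
print(robust_fit.bse)   # robust SEs, typically larger here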


Reasons for the Log Transform


Multicollinearity


Multicollinearity

This problem occurs when the explanatory variables are very highly correlated with each other.

Perfect multicollinearity
• Cannot estimate all the coefficients
  e.g. suppose x₃ₜ = 2x₂ₜ and the model is yₜ = β₁ + β₂x₂ₜ + β₃x₃ₜ + β₄x₄ₜ + uₜ

Problems if near multicollinearity is present but ignored
• R² will be high but the individual coefficients will have high standard errors.
• The regression becomes very sensitive to small changes in the specification.
• Thus confidence intervals for the parameters will be very wide, and significance tests might therefore give inappropriate conclusions.


$b = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'e$

$\begin{aligned}
\mathrm{Cov}(b) &= E[(b - E(b))(b - E(b))'] = E[(b - \beta)(b - \beta)'] \\
&= E[(X'X)^{-1}X'ee'X(X'X)^{-1}] \\
&= (X'X)^{-1}X'E(ee')X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}(X'X)(X'X)^{-1} = \sigma^2 (X'X)^{-1}
\end{aligned}$

$\mathrm{Var}(\hat{b}_j) = \sigma^2 c_{jj}, \qquad SE(\hat{b}_j) = \hat{\sigma}\sqrt{c_{jj}}$

where $c_{jj}$ is the (j, j) component of $(X'X)^{-1}$.


$\begin{aligned}
\mathrm{Var}(\hat{Y}) &= E[(\hat{Y} - E(\hat{Y}))(\hat{Y} - E(\hat{Y}))'], \qquad \hat{Y} = X\hat{\beta} \\
&= E[(X\hat{\beta} - XE(\hat{\beta}))(X\hat{\beta} - XE(\hat{\beta}))'] \\
&= X\,\mathrm{Cov}(\hat{\beta})\,X' = \sigma^2 X(X'X)^{-1}X'
\end{aligned}$

$SE(\hat{Y}) = \hat{\sigma}\sqrt{X(X'X)^{-1}X'}, \qquad \mathrm{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1}, \qquad \hat{\sigma}^2 = \frac{1}{n-k-1}\sum_{i=1}^{n} e_i^2$


Multicollinearity

• In multiple regression analysis, one is often concerned with the nature and significance of the relations between the explanatory variables and the response variable.

• Questions that are frequently asked are:
 – What is the relative importance of the effects of the different independent variables?
 – What is the magnitude of the effect of a given independent variable on the dependent variable?


Multicollinearity

(A) Can any independent variable be dropped from the model because it has little or no effect on the dependent variable?

(B) Should any independent variables not yet included in the model be considered for possible inclusion?

Simple answers can be given to these questions if
(A) the independent variables in the model are uncorrelated among themselves, and
(B) they are uncorrelated with any other independent variables that are related to the dependent variable but omitted from the model.


Multicollinearity

• Some key problems that typically arise when the explanatory variables being considered for the regression model are highly correlated among themselves are:

1. Adding or deleting an explanatory variable changes the regression coefficients.
2. The estimated standard deviations of the regression coefficients become large when the explanatory variables in the regression model are highly correlated with each other.
3. The estimated regression coefficients individually may not be statistically significant even though a definite statistical relation exists between the response variable and the set of explanatory variables.

Why?


Problems with Multicollinearity

• Collinear variables can have coefficients with large standard errors.
• Collinear variables can have insignificant t's, but very significant F's.
• Getting a larger sample doesn't necessarily help much.
• Multicollinearity is a "disease", a violation of the model assumptions.
• Least squares and the least squares standard errors are not OK.

Multicollinearity (strong relationship among explanatory variables themselves)

Variances of regression coefficients are inflated (the estimates are less precise).

Regression coefficients may be different from their true values, even in sign.

Adding or removing variables produces large changes in coefficients.

Removing a data point may cause large changes in coefficient estimates or signs.

In some cases, the F ratio may be significant and R² may be very high even though all the t ratios are insignificant (suggesting no significant relationship).


Solutions to the Multicollinearity Problem

Drop a collinear variable from the regression.

Combine collinear variables (e.g. use their sum as one variable).


Measuring Multicollinearity

• The easiest way to measure the extent of multicollinearity is simply to look at the matrix of correlations between the individual variables, e.g.

Corr   x2    x3    x4
x2     -     0.2   0.8
x3     0.2   -     0.3
x4     0.8   0.3   -

• But another problem: if 3 or more variables are linearly related – e.g. x2t + x3t = x4t – the pairwise correlations will not reveal it.

• Note that high correlation between y and one of the x's is not multicollinearity.


Multicollinearity Diagnostics

• A formal method of detecting the presence of multicollinearity that is widely used is the Variance Inflation Factor (VIF).
 – It measures how much the variances of the estimated regression coefficients are inflated as compared to when the independent variables are not linearly related.
 – R²ⱼ is the coefficient of determination from the regression of the jth independent variable on the remaining k − 1 independent variables.

$VIF_j = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \ldots, k$
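A sketch of the VIF computation with statsmodels (variance_inflation_factor regresses each column on the others; the design matrix must include the constant, and the data are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j in range(1, X.shape[1]):              # skip the constant column
    print(f"VIF(x{j}) = {variance_inflation_factor(X, j):.1f}")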


Multicollinearity Diagnostics

• A VIF near 1 suggests that multicollinearity is not a problem for the independent variables.
 – Its estimated coefficient and associated t value will not change much as the other independent variables are added to or deleted from the regression equation.

• A VIF much greater than 1 indicates the presence of multicollinearity. A maximum VIF value in excess of 10 is often taken as an indication that multicollinearity may be unduly influencing the least squares estimates.
 – The estimated coefficient attached to the variable is unstable, and its associated t statistic may change considerably as the other independent variables are added or deleted.


Multicollinearity Diagnostics

• The simple correlation coefficients between all pairs of explanatory variables (i.e., X1, X2, …, Xk) are helpful in selecting appropriate explanatory variables for a regression model and are also critical for examining multicollinearity.

• While it is true that a correlation very close to +1 or −1 does suggest multicollinearity, it is not valid (unless there are only two explanatory variables) to infer that multicollinearity does not exist when there are no high correlations between any pair of explanatory variables.


Example: Sales Forecasting

Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

           SUBSCRIB   ADRATE     SIGNAL    APIPOP    COMPETE
SUBSCRIB    1.00000   -0.02848   0.44762   0.90447   0.79832
                       0.9051    0.0478    <.0001    <.0001
ADRATE     -0.02848    1.00000  -0.01021   0.32512   0.34147
            0.9051                0.9659    0.1619    0.1406
SIGNAL      0.44762   -0.01021   1.00000   0.45303   0.46895
            0.0478      0.9659              0.0449    0.0370
APIPOP      0.90447    0.32512   0.45303   1.00000   0.87592
            <.0001      0.1619    0.0449              <.0001
COMPETE     0.79832    0.34147   0.46895   0.87592   1.00000
            <.0001      0.1406    0.0370    <.0001


Example: Sales Forecasting

Fitted models from the example:

SUBSCRIBE = 96.28 + 0.25 ADRATE + 0.495 APIPOP
SUBSCRIBE = 51.42 + 0.27 ADRATE − 0.02 SIGNAL + 0.44 APIPOP + 23.16 COMPETE
SUBSCRIBE = 51.32 + 0.26 ADRATE + 0.43 APIPOP + 92.13 COMPETE


Example: Sales Forecasting

• VIF calculation:
 – Fit the model APIPOP = β₀ + β₁COMPETE + β₂ADRATE + β₃SIGNAL

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.878054
R Square            0.770978
Adjusted R Square   0.728036
Standard Error      264.3027
Observations        20

ANOVA
             df   SS        MS         F         Significance F
Regression    3   3762601   1254200    17.9541   2.25472E-05
Residual     16   1117695   69855.92
Total        19   4880295

            Coefficients  Standard Error  t Stat     P-value    Lower 95%      Upper 95%
Intercept   -472.685      139.7492        -3.38238   0.003799   -768.9402258   -176.43
Compete      159.8413      28.29157        5.649786  3.62E-05    99.86587622    219.8168
ADRATE       0.048173      0.149395        0.322455  0.751283   -0.268529713    0.364876
Signal       0.037937      0.083011        0.457012  0.653806   -0.138038952    0.213913


Example: Sales Forecasting

• Fit the model COMPETE = β₀ + β₁ADRATE + β₂SIGNAL + β₃APIPOP

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.882936
R Square            0.779575
Adjusted R Square   0.738246
Standard Error      1.34954
Observations        20

ANOVA
             df   SS         MS         F          Significance F
Regression    3   103.0599   34.35329   18.86239   1.66815E-05
Residual     16   29.14013   1.821258
Total        19   132.2

            Coefficients  Standard Error  t Stat    P-value   Lower 95%      Upper 95%
Intercept    3.10416      0.520589        5.96278   1.99E-05   2.000559786   4.20776
ADRATE       0.000491     0.000755        0.649331  0.525337  -0.001110874   0.002092
Signal       0.000334     0.000418        0.799258  0.435846  -0.000552489   0.001221
APIPOP       0.004167     0.000738        5.649786  3.62E-05   0.002603667   0.005731


Example: Sales Forecasting

• Fit the model SIGNAL = β₀ + β₁APIPOP + β₂COMPETE + β₃ADRATE

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.512244
R Square            0.262394
Adjusted R Square   0.124092
Standard Error      790.8387
Observations        20

ANOVA
             df   SS         MS         F          Significance F
Regression    3   3559789    1186596    1.897261   0.170774675
Residual     16   10006813   625425.8
Total        19   13566602

            Coefficients  Standard Error  t Stat    P-value   Lower 95%       Upper 95%
Intercept    5.171093     547.6089        0.009443  0.992582  -1155.707711    1166.05
APIPOP       0.339655     0.743207        0.457012  0.653806  -1.235874129    1.915184
Compete      114.8227     143.6617        0.799258  0.435846  -189.7263711    419.3718
ADRATE      -0.38091      0.438238       -0.86919   0.397593  -1.309935875    0.548109


Example: Sales Forecasting

• Fit the model ADRATE = β₀ + β₁SIGNAL + β₂APIPOP + β₃COMPETE

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.399084
R Square            0.159268
Adjusted R Square   0.001631
Standard Error      440.8588
Observations        20

ANOVA
             df   SS         MS         F          Significance F
Regression    3   589101.7   196367.2   1.010346   0.413876018
Residual     16   3109703    194356.5
Total        19   3698805

            Coefficients  Standard Error  t Stat    P-value   Lower 95%      Upper 95%
Intercept    253.7304     298.6063        0.849716  0.408018  -379.2865355   886.7474
Signal      -0.11837      0.136186       -0.86919   0.397593  -0.407073832   0.170329
APIPOP       0.134029     0.415653        0.322455  0.751283  -0.747116077   1.015175
Compete      52.3446      80.61309        0.649331  0.525337  -118.5474784   223.2367


Example: Sales Forecasting

• VIF calculation results:

Variable   R-Squared   VIF
ADRATE     0.159268    1.19
COMPETE    0.779575    4.54
SIGNAL     0.262394    1.36
APIPOP     0.770978    4.36

• There is no significant multicollinearity (no VIF comes close to 10).

Example: Multicollinearity


Outliers


Outliers = unusual observations


How can we find unusual observations?

Causes for outliers


 Out of sample predictions


Out of sample – Extrapolation


Non-linear functional forms


• The standard regression model is a linear conditional mean model.

• In many situations in practice, it is desirable to have some flexibility to specify non-linear regression functions.

• The standard linear regression model can be "tricked" into displaying non-linearity by two techniques, such as transforming the variables (e.g. taking logs) and adding polynomial terms (e.g. an x² term).


For the log-log specification, log(Y) = β₀ + β₁log(X), the slope is an elasticity:

$\beta_1 = \frac{\Delta \log(Y)}{\Delta \log(X)} \approx \frac{\Delta Y / Y}{\Delta X / X}$

so β₁ measures the proportional change in Y for a proportional change in X.

Example: Brain Weight Data


[Figure annotations: Ln(64) = 4.1589; chinchilla (龍貓)]


[Figure annotation: 6.4 g <= 64 g]


Dummy Variables

Categorical Explanatory Variables in Regression Models

• Categorical independent variables can be incorporated into a regression model by converting them into 0/1 ("dummy") variables.

• For binary variables, code dummies "0" for "no" and "1" for "yes".

Dummy Variables, More than Two Levels

For categorical variables with k categories, use k − 1 dummy variables.

• SMOKE2 has three levels, initially coded 0 = non-smoker, 1 = former smoker, 2 = current smoker.

• Use k − 1 = 3 − 1 = 2 dummy variables to code this information, like this:
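A sketch of the k − 1 coding in Python (pandas' get_dummies with drop_first=True keeps the first level, non-smoker, as the reference group; the data are illustrative):

import pandas as pd

smoke2 = pd.Series(
    pd.Categorical(
        ["non-smoker", "former", "current", "non-smoker", "current"],
        categories=["non-smoker", "former", "current"],
    )
)

# k = 3 levels -> k - 1 = 2 dummy columns (smoke_former, smoke_current)
dummies = pd.get_dummies(smoke2, prefix="smoke", drop_first=True)
print(dummies)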


Example, cont.

• Regress FEV on SMOKE; the least squares regression line is ŷ = 2.566 + 0.711X.

• Intercept (2.566) = the mean FEV of group 0.

• Slope = the mean difference in FEV = 3.277 − 2.566 = 0.711.

• t stat = 6.464 with 652 df, P ≈ 0.000 (same as the equal variance t test).

• The 95% CI for the slope β is 0.495 to 0.927 (same as the 95% CI for μ1 − μ0).

Dummy Variable SMOKE


[Figure: the regression line passes through the group means]

b = 3.277 − 2.566 = 0.711


Multiple Regression Coefficients

Rely on software to calculate multiple regression statistics.


Multiple Regression Coefficients, cont. 


• The slope coefficient for SMOKE is −0.206, suggesting that smokers have 0.206 less FEV on average compared to non-smokers (after adjusting for age).

• The slope coefficient for AGE is 0.231, suggesting that each year of age is associated with an increase of 0.231 FEV units on average (after adjusting for SMOKE).

Inference About the Coefficients


Coefficients (Dependent Variable: fev)

Model 1       Unstandardized Coefficients   Standardized Coefficients
              B        Std. Error           Beta          t         Sig.
(Constant)     .367    .081                                4.511    .000
smoke         -.209    .081                 -.072         -2.588    .010
age            .231    .008                  .786         28.176    .000

Inferential statistics are calculated for each regression coefficient. For example, in testing

H0: β1 = 0 (SMOKE coefficient controlling for AGE)

t stat = −2.588 and P = 0.010
df = n − k − 1 = 654 − 2 − 1 = 651

Inference About the Coefficients


The 95% confidence interval for this slope of SMOKE controlling for AGE is −0.368 to −0.050.

95% Confidence Interval for B (Dependent Variable: fev)

Model 1       Lower Bound   Upper Bound
(Constant)     .207          .527
smoke         -.368         -.050
age            .215          .247

Comparing the Slopes of Two or More Regression Lines

Suppose we have a quantitative explanatory variable, x1, and two possible regression lines: one for situation 1 (location A), the other for situation 2 (location B). This is the use of dummy (or classification) variables in regression.

y = β₀ + β₁x₁  (loc A)
y = β₂ + β₃x₁  (loc B)

H0: β₀ = β₂  (equal intercepts)
H0: β₁ = β₃  (equal slopes)

Reformulate the model. Define a new variable, x2, such that

x2 = 0 for situation 1 (Location A)
x2 = 1 for situation 2 (Location B)

Then use multiple regression:

ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

When x2 = 0 we have: ŷ = β₀ + β₁x₁
When x2 = 1 we have: ŷ = (β₀ + β₂) + (β₁ + β₃)x₁

A test of β₂ = 0 is equivalent to no intercept difference; a test of β₃ = 0 is equivalent to no slope difference. Tests are based on reduction (drop) sums of squares, as previously defined.
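A sketch of this reformulation with statsmodels (the dummy and interaction columns are built by hand; the data are illustrative):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x1 = rng.uniform(0, 10, n)
x2 = (rng.uniform(size=n) > 0.5).astype(float)   # 0 = location A, 1 = location B
y = 1 + 0.5 * x1 + 2.0 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.OLS(y, X).fit()
print(fit.summary())   # t-tests on x2 (intercept shift) and x1*x2 (slope shift)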

Dummy variable

[Figure: four panels of y against x1 covering the cases β₂ = 0 / β₂ ≠ 0 and β₃ = 0 / β₃ ≠ 0: a single line y = β₀ + β₁x; parallel lines y = β₀ + β₁x and y = (β₀ + β₂) + β₁x; a common intercept with different slopes y = β₀ + β₁x and y = β₀ + (β₁ + β₃)x; and different intercepts and slopes y = β₀ + β₁x and y = (β₀ + β₂) + (β₁ + β₃)x]


Example


Autocorrelation in the Errors


We assumed the errors are independent, that is, Cov(uᵢ, uⱼ) = 0 for i ≠ j.

• This is essentially the same as saying there is no pattern in the errors.

• Obviously we never have the actual u's, so we use their sample counterpart, the residuals.

• If there are patterns in the residuals from a model, we say that they are autocorrelated.

• Some stereotypical patterns we may find in the residuals follow.

Positive Autocorrelation 


Positive autocorrelation is indicated by a cyclical residual plot over time.

[Figure: ûₜ plotted against time and against ûₜ₋₁, showing a cyclical pattern]

Negative Autocorrelation 


Negative autocorrelation is indicated by an alternating pattern where the residuals cross the time axis more frequently than if they were distributed randomly.

[Figure: ûₜ plotted against time and against ûₜ₋₁, showing an alternating pattern]

No pattern in residuals – no autocorrelation


No pattern in the residuals at all: this is what we would like to see.

[Figure: ûₜ plotted against ûₜ₋₁ with random scatter]

Detecting Autocorrelation: The Durbin-Watson Test


The Durbin-Watson (DW) test is a test for first order autocorrelation, i.e. it assumes that the relationship is between an error and the previous one:

uₜ = ρuₜ₋₁ + vₜ,  where vₜ ~ N(0, σᵥ²)

• The DW test statistic actually tests H0: ρ = 0 against H1: ρ ≠ 0.

• The test statistic is calculated as

$DW = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T} \hat{u}_t^2}$
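Computing the statistic on a model's residuals with statsmodels (a sketch; durbin_watson simply evaluates the formula above, and the AR(1) errors are simulated for illustration):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(9)
T = 200
x = rng.uniform(0, 10, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()   # AR(1) errors with rho = 0.7
y = 1 + 0.5 * x + u

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(f"DW = {durbin_watson(resid):.2f}")  # well below 2 => positive autocorrelation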

The Durbin-Watson Test: Critical Values

We can also write DW ≈ 2(1 − ρ̂), where ρ̂ is the estimated correlation coefficient.


Since ρ̂ is a correlation, it implies that 0 ≤ DW ≤ 4.

• If ρ̂ = 0, then DW = 2. So roughly speaking, do not reject the null hypothesis if DW is near 2, i.e. there is little evidence of autocorrelation.

• DW has 2 critical values, an upper critical value (dᵤ) and a lower critical value (d_L), and there is also an intermediate region where we can neither reject nor not reject H0.

The Durbin-Watson Test: Interpreting the Results


Conditions which Must be Fulfilled for DW to be a Valid Test

1. Constant term in regression

2. Regressors are non-stochastic

3. No lags of dependent variable

 

Another Test for Autocorrelation:

The Breusch-Godfrey Test


• It is a more general test for rth-order autocorrelation:

  u_t = ρ₁u_{t-1} + ρ₂u_{t-2} + ρ₃u_{t-3} + ... + ρ_r u_{t-r} + v_t,  v_t ~ N(0, σ_v²)

• The null and alternative hypotheses are:

  H₀: ρ₁ = 0 and ρ₂ = 0 and ... and ρ_r = 0
  H₁: ρ₁ ≠ 0 or ρ₂ ≠ 0 or ... or ρ_r ≠ 0

• The test is carried out as follows:
  1) Estimate the linear regression using OLS and obtain the residuals, û_t.
  2) Regress û_t on all of the regressors from stage 1 (the x's) plus û_{t-1}, û_{t-2}, ..., û_{t-r}; obtain R² from this regression.
  3) It can be shown that (T - r)R² ~ χ²(r).
• If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation.
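A minimal sketch of this three-stage procedure follows; the function name and the zero-padding of the first lagged residuals are our own choices, and X is assumed to already contain a constant column.

```python
import numpy as np

def breusch_godfrey(y, X, r):
    """Breusch-Godfrey LM test for autocorrelation up to order r.
    Assumes X includes a column of ones. Returns (T - r) * R^2, to be
    compared against a chi-squared(r) critical value."""
    T = len(y)
    # Stage 1: OLS of y on X; keep the residuals u_hat
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    # Stage 2: regress u_hat on X plus r lags of u_hat (lags padded with zeros)
    lags = np.column_stack([np.concatenate([np.zeros(k), u[:-k]])
                            for k in range(1, r + 1)])
    Z = np.hstack([X, lags])
    gamma, *_ = np.linalg.lstsq(Z, u, rcond=None)
    e = u - Z @ gamma
    r2 = 1 - (e @ e) / (u @ u)   # R^2 of the auxiliary regression
    return (T - r) * r2          # ~ chi-squared(r) under H0
```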

 

Consequences of Ignoring Autocorrelation

if it is Present


• The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even in large sample sizes.
• Thus, if the standard error estimates are inappropriate, there exists the possibility that we could make the wrong inferences.
• R² is likely to be inflated relative to its "correct" value for positively correlated residuals.

 

“Remedies” for Autocorrelation 


• If the form of the autocorrelation is known, we could use a GLS procedure, i.e. an approach that allows for autocorrelated residuals, e.g. Cochrane-Orcutt.
• But such procedures that "correct" for autocorrelation require assumptions about the form of the autocorrelation.
• If these assumptions are invalid, the cure would be more dangerous than the disease! See Hendry and Mizon (1978).
• However, it is unlikely to be the case that the form of the autocorrelation is known, and a more "modern" view is that residual autocorrelation presents an opportunity to modify the regression.
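For concreteness, here is a sketch of the Cochrane-Orcutt iteration mentioned above, under an assumed AR(1) error form; it is illustrative only, not a production implementation.

```python
import numpy as np

def cochrane_orcutt(y, X, n_iter=10):
    """Iterative Cochrane-Orcutt estimation under an assumed AR(1) error.
    X should include a constant column; that column is implicitly scaled
    by (1 - rho) in the quasi-differenced regression, so the returned
    coefficients remain on the original scale."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rho = 0.0
    for _ in range(n_iter):
        u = y - X @ beta
        rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])  # AR(1) coefficient of residuals
        y_star = y[1:] - rho * y[:-1]               # quasi-differenced data
        X_star = X[1:] - rho * X[:-1]
        beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta, rho
```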

 

Dynamic Models 


• All of the models we have considered so far have been static, e.g.

  y_t = β₁ + β₂x_{2t} + ... + β_k x_{kt} + u_t

• But we can easily extend this analysis to the case where the current value of y_t depends on previous values of y or of one of the x's, e.g.

  y_t = β₁ + β₂x_{2t} + ... + β_k x_{kt} + γ₁y_{t-1} + γ₂x_{2,t-1} + ... + γ_k x_{k,t-1} + u_t

• We could extend the model even further by adding extra lags, e.g. x_{2,t-2}, y_{t-3}.
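Constructing such lagged regressors is mostly bookkeeping; the following is a hypothetical NumPy helper whose names and layout are our own.

```python
import numpy as np

def add_lags(X, y, p):
    """Build a design matrix with a constant, the current x's, and p lags
    of y and of each x. The first p rows, which lack complete lags, are
    dropped."""
    T = len(y)
    cols = [np.ones(T - p)]
    cols += [X[p:, j] for j in range(X.shape[1])]
    for lag in range(1, p + 1):
        cols.append(y[p - lag:T - lag])                        # y_{t-lag}
        cols += [X[p - lag:T - lag, j] for j in range(X.shape[1])]
    return np.column_stack(cols), y[p:]
```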

 

Why Might we Want/Need To Include Lags

in a Regression?


• Inertia of the dependent variable
• Over-reactions
• Measuring time series as overlapping moving averages
• However, other problems with the regression could cause the null hypothesis of no autocorrelation to be rejected:
  – Omission of relevant variables, which are themselves autocorrelated.
  – If we have committed a "misspecification" error by using an inappropriate functional form.
  – Autocorrelation resulting from unparameterised seasonality.


The Long Run Static Equilibrium Solution:

An Example 


If our model is

  Δy_t = β₁ + β₂Δx_{2t} + β₃x_{2,t-1} + β₄y_{t-1} + u_t

then, since all of the difference terms are zero in the long run, the static solution would be given by

  0 = β₁ + β₃x_{2,t-1} + β₄y_{t-1}
  β₄y_{t-1} = -β₁ - β₃x_{2,t-1}

  y = -β₁/β₄ - (β₃/β₄)x₂
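Given estimated coefficients, the long-run solution is simple arithmetic; the values below are purely illustrative.

```python
# Hypothetical coefficient estimates for the model above
b1, b3, b4 = 0.8, 0.3, -0.5
long_run_intercept = -b1 / b4   # -beta1 / beta4 = 1.6
long_run_slope = -b3 / b4       # long-run effect of x2 on y: 0.6
```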

 

Problems with Adding Lagged Regressors

to “Cure” Autocorrelation 


• Inclusion of lagged values of the dependent variable violates the assumption that the RHS variables are non-stochastic.
• What does an equation with a large number of lags actually mean?
• Note that if there is still autocorrelation in the residuals of a model including lags, then the OLS estimators will not even be consistent.


 

Parameter Stability Tests


• So far, we have estimated regressions such as

  y_t = β₁ + β₂x_{2t} + β₃x_{3t} + u_t

• We have implicitly assumed that the parameters (β₁, β₂ and β₃) are constant for the entire sample period.
• We can test this implicit assumption using parameter stability tests. The idea is essentially to split the data into sub-periods, then to estimate up to three models (one for each of the sub-parts and one for all the data), and then to "compare" the RSS of the models.
• There are two types of test we can look at:
  – Chow test (analysis of variance test)
  – Predictive failure tests

The Chow Test 


• The steps involved are:
1. Split the data into two sub-periods. Estimate the regression over the whole period and then for the two sub-periods separately (3 regressions). Obtain the RSS for each regression.
2. The restricted regression is now the regression for the whole period, while the "unrestricted regression" comes in two parts: one for each of the sub-samples. We can thus form an F-test based on the difference between the RSS's. The statistic is

  Test statistic = [RSS - (RSS₁ + RSS₂)] / (RSS₁ + RSS₂) × (T - 2k) / k

The Chow Test (cont’d) 


where:
  RSS  = RSS for the whole sample
  RSS₁ = RSS for sub-sample 1
  RSS₂ = RSS for sub-sample 2
  T  = number of observations
  2k = number of regressors in the "unrestricted" regression (since it comes in two parts)
  k  = number of regressors in (each part of) the "unrestricted" regression
3. Perform the test. If the value of the test statistic is greater than the critical value from the F-distribution, which is an F(k, T - 2k), then reject the null hypothesis that the parameters are stable over time.
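As a sketch (names ours), the statistic is a one-liner given the three RSS values from step 1:

```python
def chow_statistic(rss, rss1, rss2, T, k):
    """Chow test statistic; compare against the F(k, T - 2k) critical value.
    rss: whole-sample RSS; rss1, rss2: sub-sample RSSs;
    T: total observations; k: regressors in each sub-sample regression."""
    return (rss - (rss1 + rss2)) / (rss1 + rss2) * (T - 2 * k) / k
```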

 

A Chow Test Example


• Consider the following regression for the CAPM β (again) for the returns on Glaxo.
• Say that we are interested in estimating beta for monthly data from 1981-1992. The fitted model for each sub-period is:
• 1981M1 - 1987M10:   R̂_t = 0.24 + 1.2R_Mt,   T = 82,  RSS₁ = 0.03555
• 1987M11 - 1992M12:  R̂_t = 0.68 + 1.53R_Mt,  T = 62,  RSS₂ = 0.00336
• 1981M1 - 1992M12:   R̂_t = 0.39 + 1.37R_Mt,  T = 144, RSS = 0.0434

A Chow Test Example - Results


• The null hypothesis is

  H₀: α₁ = β₁ and α₂ = β₂

  where α₁, α₂ are the intercept and slope in the first sub-period and β₁, β₂ those in the second.
• The unrestricted model is the model where this restriction is not imposed.

  Test statistic = [0.0434 - (0.0355 + 0.00336)] / (0.0355 + 0.00336) × (144 - 4) / 2 = 7.698

• Compare with the 5% critical value F(2, 140) = 3.06.
• We reject H₀ at the 5% level and say that we reject the restriction that the coefficients are the same in the two periods.

 

The Predictive Failure Test


• A problem with the Chow test is that we need to have enough data to do the regression on both sub-samples, i.e. T₁ >> k, T₂ >> k.
• An alternative formulation is the predictive failure test.
• What we do with the predictive failure test is estimate the regression over a "long" sub-period (i.e. most of the data) and then we predict values for the other period and compare the two.
To calculate the test:
  – Run the regression for the whole period (the restricted regression) and obtain the RSS.
  – Run the regression for the "large" sub-period and obtain the RSS (called RSS₁). Note that we call the number of observations T₁ (even though it may come second).

  Test statistic = [(RSS - RSS₁) / RSS₁] × (T₁ - k) / T₂

where T₂ = number of observations we are attempting to "predict". The test statistic will follow an F(T₂, T₁ - k).
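A corresponding sketch (names ours):

```python
def predictive_failure_statistic(rss, rss1, T1, T2, k):
    """Predictive failure test statistic; follows F(T2, T1 - k) under H0.
    rss: whole-period RSS; rss1: RSS of the 'long' sub-period regression;
    T1: observations used for estimation; T2: observations predicted."""
    return (rss - rss1) / rss1 * (T1 - k) / T2
```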

Backwards versus Forwards Predictive Failure Tests


• There are two types of predictive failure tests:
  – Forward predictive failure tests, where we keep the last few observations back for forecast testing, e.g. we have observations for 1970Q1-1994Q4, so we estimate the model over 1970Q1-1993Q4 and forecast 1994Q1-1994Q4.
  – Backward predictive failure tests, where we attempt to "back-cast" the first few observations, e.g. if we have data for 1970Q1-1994Q4, we estimate the model over 1971Q1-1994Q4 and backcast 1970Q1-1970Q4.


 

Omission of an Important Variable or

Inclusion of an Irrelevant Variable 


Omission of an Important Variable
• Consequence: the estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables.
• Even if this condition is satisfied, the estimate of the coefficient on the constant term will be biased.
• The standard errors will also be biased.

Inclusion of an Irrelevant Variable
• Coefficient estimates will still be consistent and unbiased, but the estimators will be inefficient.

 

How do we decide the sub-parts to use?


• As a rule of thumb, we could use all or some of the following:
  – Plot the dependent variable over time and split the data according to any obvious structural changes in the series, e.g. as in the figure below.
  – Split the data according to any known important historical events (e.g. a stock market crash, a new government being elected).
  – Use all but the last few observations and do a predictive failure test on those.

[Figure: plot of the value of the series y_t (0 to 1400) against the sample period (observations 1 to 443), showing an obvious structural change]

 

A Strategy for Building Econometric Models


Our Objective:

• To build a statistically adequate empirical model which

- satisfies the assumptions of the CLRM

- is parsimonious

- has the appropriate theoretical interpretation

- has the right “shape” - i.e.

- all signs on coefficients are “correct” 

- all sizes of coefficients are “correct” 

- is capable of explaining the results of all competing models

 

2 Approaches to Building Econometric Models


• There are two popular philosophies of building econometric models: the "specific-to-general" and "general-to-specific" approaches.
• "Specific-to-general" was used almost universally until the mid-1980s, and involved starting with the simplest model and gradually adding to it.
• Little, if any, diagnostic testing was undertaken. But this meant that all inferences were potentially invalid.
• An alternative and more modern approach to model building is the "LSE" or Hendry "general-to-specific" methodology.
• The advantages of this approach are that it is statistically sensible and also that the theory on which the models are based usually has nothing to say about the lag structure of a model.

The General-to-Specific Approach


• The first step is to form a "large" model with lots of variables on the right hand side.
• This is known as a GUM (generalised unrestricted model).
• At this stage, we want to make sure that the model satisfies all of the assumptions of the CLRM.
• If the assumptions are violated, we need to take appropriate actions to remedy this, e.g.
  – taking logs
  – adding lags
  – adding dummy variables
• We need to do this before testing hypotheses.
• Once we have a model which satisfies the assumptions, it could be very big, with lots of lags and independent variables.

 

The General-to-Specific Approach:

Reparameterising the Model


• The next stage is to reparameterise the model by
  – knocking out very insignificant regressors
  – combining coefficients that are insignificantly different from each other.
• At each stage, we need to check that the assumptions are still OK.
• Hopefully at this stage, we have a statistically adequate empirical model which we can use for
  – testing underlying financial theories
  – forecasting future values of the dependent variable
  – formulating policies, etc.

 

Regression Analysis In Practice - A Further Example:

Determinants of Sovereign Credit Ratings 


• Cantor and Packer (1996)

Financial background:
• What are sovereign credit ratings and why are we interested in them?
• Two ratings agencies (Moody's and Standard and Poor's) provide credit ratings for many governments.
• Each possible rating is denoted by a grading:

  Moody's    Standard and Poor's
  Aaa        AAA
  ……         …..
  B3         B-

 

Purposes 


– to attempt to explain and model how the ratings agencies arrived at their ratings
– to use the same factors to explain the spreads of sovereign yields above a risk-free proxy
– to determine what factors affect how the sovereign yields react to ratings announcements

 

Determinants of Sovereign Ratings


Data

Quantifying the ratings (dependent variable): Aaa/AAA = 16, ..., B3/B- = 1
• Explanatory variables (units of measurement):
  – Per capita income in 1994 (thousands of dollars)
  – Average annual GDP growth 1991-1994 (%)
  – Average annual inflation 1992-1994 (%)
  – Fiscal balance: average annual government budget surplus as a proportion of GDP 1992-1994 (%)
  – External balance: average annual current account surplus as a proportion of GDP 1992-1994 (%)
  – External debt: foreign currency debt as a proportion of exports 1994 (%)
  – Dummy for economic development
  – Dummy for default history
Income and inflation are transformed to their logarithms.

 

The model: Linear and estimated using OLS


Dependent Variable

Explanatory Variable  Expected sign  Average Rating      Moody's Rating      S&P Rating          Moody's/S&P Difference
Intercept             ?              1.442 (0.663)       3.408 (1.379)       -0.524 (-0.223)     3.932** (2.521)
Per capita income     +              1.242*** (5.302)    1.027*** (4.041)    1.458*** (6.048)    -0.431*** (-2.688)
GDP growth            +              0.151 (1.935)       0.130 (1.545)       0.171** (2.132)     -0.040 (0.756)
Inflation             -              -0.611*** (-2.839)  -0.630*** (-2.701)  -0.591*** (2.671)   -0.039 (-0.265)
Fiscal Balance        +              0.073 (1.324)       0.049 (0.818)       0.097* (1.71)       -0.048 (-1.274)
External Balance      +              0.003 (0.314)       0.006 (0.535)       0.001 (0.046)       0.006 (0.779)
External Debt         -              -0.013*** (-5.088)  -0.015*** (-5.365)  -0.011*** (-4.236)  -0.004*** (-2.133)
Development dummy     +              2.776*** (4.25)     2.957*** (4.175)    2.595*** (3.861)    0.362 (0.81)
Default dummy         -              -2.042*** (-3.175)  -1.63** (-2.097)    -2.622*** (-3.962)  1.159*** (2.632)
Adjusted R²                          0.924               0.905               0.926               0.836

Notes: t-ratios in parentheses; *, **, and *** indicate significance at the 10%, 5% and 1% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.

Interpreting the Model


From a statistical perspective:
• Virtually no diagnostics
• Adjusted R² is high
• Look at the residuals: actual rating - fitted rating

From a financial perspective:
• Do the coefficients have their expected signs and sizes?

Do Ratings Add to Publicly Available Information?
• Now the dependent variable is
  – Log (yield on the sovereign bond - yield on a US Treasury bond)

Do Ratings Add to Publicly Available Information? Results

Dependent Variable: Log (yield spread)


Variable           Expected Sign  (1)                  (2)                 (3)
Intercept          ?              2.105*** (16.148)    0.466 (0.345)       0.074 (0.071)
Average Rating     -              -0.221*** (-19.175)                      -0.218*** (-4.276)
Per capita income  -                                   -0.144 (-0.927)     0.226 (1.523)
GDP growth         -                                   -0.004 (-0.142)     0.029 (1.227)
Inflation          +                                   0.108 (1.393)       -0.004 (-0.068)
Fiscal Balance     -                                   -0.037 (-1.557)     -0.02 (-1.045)
External Balance   -                                   -0.038 (-1.29)      -0.023 (-1.008)
External Debt      +                                   0.003*** (2.651)    0.000 (0.095)
Development dummy  -                                   -0.723*** (-2.059)  -0.38 (-1.341)
Default dummy      +                                   0.612*** (2.577)    0.085 (0.385)
Adjusted R²                       0.919                0.857               0.914

Notes: t-ratios in parentheses; *, **, and *** indicate significance at the 10%, 5% and 1% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.

What Determines How the Market Reacts

to Ratings Announcements? 


• The sample: every announcement of a ratings change that occurred between 1987 and 1994; 79 such announcements spread over 18 countries.
• 39 were actual ratings changes.
• 40 were "watchlist / outlook" changes.
• The dependent variable: changes in the relative spreads over the US T-bond over a 2-day period at the time of the announcement.

What Determines How the Market Reacts

to Ratings Announcements? Explanatory variables.


0/1 dummies for:
  – whether the announcement was positive
  – whether there was an actual ratings change
  – whether the bond was speculative grade
  – whether there had been another ratings announcement in the previous 60 days
and:
  – the change in the spread over the previous 60 days
  – the ratings gap between the announcing and the other agency

What Determines How the Market Reacts

to Ratings Announcements? Results


Dependent Variable: Log Relative Spread

Independent variable                               Coefficient (t-ratio)
Intercept                                          -0.02 (-1.4)
Positive announcements                             0.01 (0.34)
Ratings changes                                    -0.01 (-0.37)
Moody's announcements                              0.02 (1.51)
Speculative grade                                  0.03** (2.33)
Change in relative spreads from day -60 to day -1  -0.06 (-1.1)
Rating gap                                         0.03* (1.7)
Other rating announcements from day -60 to day -1  0.05** (2.15)
Adjusted R²                                        0.12

Note: * and ** denote significance at the 10% and 5% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.

Conclusions


• Six factors appear to play a big role in determining sovereign credit ratings: incomes, GDP growth, inflation, external debt, industrialised or not, and default history.
• The ratings provide more information on yields than all of the macro factors put together.
• We cannot determine well what factors influence how the markets will react to ratings announcements.

Comments on the Paper


• Only 49 observations for the first set of regressions and 35 for the yield regressions, with up to 10 regressors
• No attempt at reparameterisation
• Little attempt at diagnostic checking
• Where did the factors (explanatory variables) come from?


Example
• Simple Regression Results

                  Coefficients   Standard Error  t Stat
  Intercept (b0)  165.0333581    16.50316094     10.000106
  Lotsize (b1)    6.931792143    2.203156234     3.1463008

  F-Value 9.89   Adjusted R Square 0.108   Standard Error 36.34

• Multiple Regression Results

                  Coefficients   Standard Error  t Stat
  Intercept       59.32299284    20.20765695     2.935669
  Lotsize         3.580936283    1.794731507     1.995249
  Rooms           18.25064446    2.681400117     6.806386

  F-Value 31.23  Adjusted R Square 0.453   Standard Error 28.47

• Check the size and significance level of the coefficients, the F-value, the R-Square, etc. You will see what the "net of" effects are.

Using the Equation to Make Predictions
• Predict the appraised value at the average lot size (7.24) and average number of rooms (7.12).


• What is the total effect of a 2000 sq. ft. increase in lot size and 2 additional rooms?

  App. Val = 59.32 + 3.58(7.24) + 18.25(7.12) = 215.18, or $215,180

  Increase in app. value = (3.58)(2) + (18.25)(2) = 43.66, i.e. $43,660

(Lot size is measured in thousands of square feet, so a 2000 sq. ft. increase corresponds to 2 units.)
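The same arithmetic in Python, using the rounded coefficients from the output above:

```python
b0, b_lot, b_rooms = 59.32, 3.58, 18.25   # rounded coefficients from above

appraised = b0 + b_lot * 7.24 + b_rooms * 7.12   # ~215.18, i.e. $215,180
increase = b_lot * 2 + b_rooms * 2               # ~43.66,  i.e. $43,660
print(appraised, increase)
```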

Coefficient of Multiple Determination, r2 and

Adjusted r2

• Reports the proportion of total variation in Y explained by all X variables taken together (the model):

  r²_{Y.12...k} = SSR / SST = regression sum of squares / total sum of squares

• Adjusted r²
  – r² never decreases when a new X variable is added to the model; this can be a disadvantage when comparing models.

• What is the net effect of adding a new variable?
  – We lose a degree of freedom when a new X variable is added.
  – Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?


• Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used (where n = sample size, k = number of independent variables):

  r²_adj = 1 - (1 - r²_{Y.12...k}) × (n - 1) / (n - k - 1)

  – Penalizes excessive use of unimportant independent variables
  – Smaller than r²
  – Useful in comparing among models
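A minimal sketch combining the two formulas (the helper name is ours; assumes NumPy):

```python
import numpy as np

def r_squared(y, y_hat, k):
    """Plain and adjusted r^2 for a fitted model with k X variables."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)            # error sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
    r2 = 1 - sse / sst                        # equals SSR / SST
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, r2_adj
```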

Multiple Regression Assumptions
• Assumptions:


  – The errors are normally distributed
  – Errors have a constant variance
  – The model errors are independent
• Errors (residuals) from the regression model:

  e_i = Y_i - Ŷ_i

• These residual plots are used in multiple regression:
  – Residuals vs. Ŷ_i
  – Residuals vs. X_{1i}
  – Residuals vs. X_{2i}
  – Residuals vs. time (if time series data)

Two variable model:

  Ŷ = b₀ + b₁X₁ + b₂X₂

[Figure: the fitted regression plane in (X₁, X₂, Y) space; for a sample observation (x_{1i}, x_{2i}, Y_i), the residual e_i = Y_i - Ŷ_i is the vertical distance between the observation and the plane]

The best-fit equation, Ŷ, is found by minimizing the sum of squared errors, Σe².

Are Individual Variables Significant?

• Use t-tests of individual variable slopes


• Shows if there is a linear relationship between the variable Xᵢ and Y. Hypotheses:
  H₀: βᵢ = 0 (no linear relationship)
  H₁: βᵢ ≠ 0 (a linear relationship does exist between Xᵢ and Y)
• Test statistic (with n - k - 1 degrees of freedom):

  t = (bᵢ - 0) / S_{bᵢ}

• Confidence interval for the population slope βᵢ:

  bᵢ ± t_{n-k-1} · S_{bᵢ}
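A sketch of both calculations (the helper name is ours; assumes SciPy for the t critical value):

```python
from scipy import stats

def slope_inference(b_i, se_i, n, k, alpha=0.05):
    """t statistic and (1 - alpha) confidence interval for one slope,
    with n - k - 1 degrees of freedom."""
    df = n - k - 1
    t_stat = (b_i - 0) / se_i
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_stat, (b_i - t_crit * se_i, b_i + t_crit * se_i)
```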

Is the Overall Model Significant?

• F-Test for Overall Significance of the Model


• Shows if there is a linear relationship between all of the X variables considered together and Y.
• Use the F test statistic. Hypotheses:
  H₀: β₁ = β₂ = ... = β_k = 0 (no linear relationship)
  H₁: at least one βᵢ ≠ 0 (at least one independent variable affects Y)
• Test statistic:

  F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))
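As a one-line sketch (name ours):

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic: F = MSR / MSE = (SSR/k) / (SSE/(n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))
```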

Testing Portions of the Multiple Regression Model

• To find out if inclusion of an individual X_j, or a set of X's, significantly improves the model, given that other independent variables are included in the model.
• Two measures:
  1. Partial F-test criterion
  2. The coefficient of partial determination

Contribution of a Single Independent Variable X_j

  SSR(X_j | all variables except X_j) = SSR(all variables) - SSR(all variables except X_j)

• Measures the contribution of X_j in explaining the total variation in Y (SST).
• Consider a 3-variable model:

  SSR(X₁ | X₂ and X₃) = SSR(X₁, X₂ and X₃) - SSR(X₂ and X₃)

  The first term on the right is the SSR of the unrestricted model (SSR_UR); the second is that of the restricted model (SSR_R).

The Partial F-Test Statistic
• Consider the hypothesis test:
  H₀: variable X_j does not significantly improve the model after all other variables are included
  H₁: variable X_j significantly improves the model after all other variables are included
• The test statistic is

  F = [(SSR_UR - SSR_R) / (number of restrictions)] / MSE,  where MSE = SSE / (n - k - 1)

• Note that the numerator is the contribution of X_j to the regression.
• If the actual F statistic is greater than the critical F, then reject H₀: adding X_j does improve the model.
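A sketch of the computation (names ours):

```python
def partial_f(ssr_ur, ssr_r, sse_ur, n, k, restrictions=1):
    """Partial F statistic for the contribution of added variable(s).
    Numerator: (SSR_UR - SSR_R) / restrictions; denominator: MSE of the
    unrestricted model, SSE_UR / (n - k - 1)."""
    mse = sse_ur / (n - k - 1)
    return (ssr_ur - ssr_r) / restrictions / mse
```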

Coefficient of Partial Determination for One or a Set of Variables
• Measures the proportion of total variation in the dependent variable (SST) that is explained by X_j while controlling for (holding constant) the other explanatory variables:

  r²_{Yj.(all variables except j)} = (SSR_UR - SSR_R) / (SST - SSR_R)


Too complicated by hand!
Why and how does it work?