Residual Analysis: Purposes – Examine Functional Form (Linear vs. Non-Linear Model) – Evaluate...

Post on 06-Jan-2018

Residual Analysis

• Purposes
  – Examine Functional Form (Linear vs. Non-Linear Model)
  – Evaluate Violations of Assumptions
• Graphical Analysis of Residuals

A Population Regression Line

For one value X1, a population contains many Y values. Their mean is μY1.

[Figure: population regression line μY = α + βX on X and Y axes, with the point (X1, μY1) marked.]

A Sample Regression Line

The sample line approximates the population regression line.

y = a + bx

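Histogram of Y Values at X = X1

[Figure: histogram of the Y values at X = X1, drawn on the f(e) axis over the line μY = α + βX, centered at μY1 = α + βX1.]

Normal Distribution of Y Values when X = X1

[Figure: normal curve of the Y values at X = X1, drawn on the f(e) axis over the line μY = α + βX, centered at μY1 = α + βX1.]

The standard deviation of the normal distribution is the standard error of estimate.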
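For reference, the standard error of estimate can be written out explicitly. The formula below is the usual simple-regression definition and is an assumption here, since the slide does not spell it out; the symbols Y_i, Ŷ_i and n (actual value, predicted value, sample size) are introduced for illustration.

```latex
s_{Y \cdot X} = \sqrt{\frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{n - 2}}
```

The divisor n - 2 reflects the two coefficients, a and b, estimated from the sample.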

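Normality & Constant Variance Assumptions

[Figure: normal curves of Y at X1 and X2; axes Y, X and f(e).]

A Normal Regression Surface

[Figure: regression surface over X1 and X2; axes Y, X and f(e).]

Every cross-sectional slice of the surface is a normal curve.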

Analysis of Residuals

A residual is the difference between the actual value of Y and the predicted value Ŷ: e = Y - Ŷ.

Linear Regression and Correlation Assumptions

• The independent variables and the dependent variable have a linear relationship.

• The dependent variable must be continuous and at least interval-scale.

Linear Regression Assumptions

• Normality
  Y values are normally distributed for each X. The residuals (e) are normally distributed with a mean of zero.

• Homoscedasticity (Constant Variance)
  The variation in the residuals must be the same for all predicted values of Y. The standard deviation of the residuals is the same regardless of the given value of X.

• Independence of Errors
  The residuals (e) are independent of each other: the size of the error for a particular value of X is not related to the size of the error for any other value of X.

Evaluating the Aptness of the Fitted Regression Model

Does the model appear linear?

Residual Plot for Linearity (Functional Form): Aptness of the Fitted Model

[Figure: two panels of residuals (e) vs. X, titled "Correct Specification" and "Add X² Term".]

Residual Plots for Linearity of the Fitted Model

• Scatter plot of Y vs. X value
• Scatter plot of residuals vs. X value

Using SPSS to Test for Linearity of the Regression Model

• Analyze/Regression/Linear
  – Dependent - Sales
  – Independent - Customers
  – Save
    • Predicted Value (Unstandardized or Standardized)
    • Residual (Unstandardized or Standardized)
• Graphs/Scatter/Simple
  – Y-Axis: residual [ res_1 or zre_1 ]
  – X-Axis: Customer (independent variable)

Sales and Customers Problem

      CUSTOMER  SALES  Unstandardized   Unstandardized
                       Predicted Value  Residual
 1       907     11       10.34055         .85945
 2       926     11       10.50641         .54359
 3       506      7        6.84009        -.00009
 4       741      9        8.89148         .31852
 5       789      9        9.31049         .10951
 6       889     10       10.18343        -.10343
 7       874      9       10.05249        -.60249
 8       510      7        6.87501        -.14501
 9       529      7        7.04086         .19914
10       420      6        6.08937         .03063
11       679      8        8.35027        -.72027
12       872      9       10.03503        -.60503
13       924      9       10.48895       -1.02895
14       607      8        7.72175        -.08175
15       452      7        6.36871         .55129
16       729      9        8.78673         .16327
17       794      9        9.35414        -.02414
18       844     10        9.79061         .43939
19      1010     12       11.23968         .53032

Note: SALES appears to be displayed rounded to whole numbers; each residual equals the unrounded sales value minus the predicted value.

[Figure: Scatter Plot of Customer by Sales. X-axis: CUSTOMER (400 to 1100); Y-axis: SALES (6 to 12).]

[Figure: Scatter Plot of Customer by Residuals. X-axis: CUSTOMER (400 to 1100); Y-axis: Unstandardized Residual (-1.5 to 1.0).]

[Figure: Plot of Residuals vs R&D Expenditures (Plot of Residuals vs X Values), Electronic Firms. X-axis: RDEXPEND (0 to 16); Y-axis: Residual (-40 to 60).]

The Linear Regression Assumptions

1. Normality of Residuals (Errors)
2. Homoscedasticity (Constant Variance)
3. Independence of Residuals (Errors)

Need to verify using residual analysis.

Residual Plots for Normality

• Construct histogram of residuals
  – Stem-and-leaf plot
  – Box-and-whisker plot
  – Normal probability plot
• Scatter plot of residuals vs. X values
  – Simple regression
• Scatter plot of residuals vs. predicted Y values
  – Multiple regression

Residual Plot 1 for Normality: Construct histogram of residuals

• Nearly symmetric
• Centered near or at zero
• Shape is approximately normal

[Figure: histogram of RESIDUAL (-3.0 to 3.0), frequency 0 to 10. Std. Dev = 1.61, Mean = 0.0, N = 31.]

Using SPSS to Test for Normality: Histogram of Residuals

• Analyze/Regression/Linear
  – Dependent - Sales
  – Independent - Customers
  – Plot/Standardized Residual Plot: Histogram
  – Save
    • Predicted Value (Unstandardized or Standardized)
    • Residual (Unstandardized or Standardized)
• Graphs/Histogram
  – Variable - residual (Unstandardized or Standardized)

[Figure: histogram of Regression Standardized Residual (-2.00 to 1.50), Dependent Variable: SALES, frequency 0 to 7. Std. Dev = .97, Mean = 0.00, N = 20.]

Histogram of Residuals of Sales and Customer Problem from regression output

[Figure: histogram of Unstandardized Residual (-1.00 to .75), frequency 0 to 7. Std. Dev = .49, Mean = 0.00, N = 20.]

Histogram of Residuals of Sales and Customer Problem from graph output

Residual Plot 2 for Normality: Plot residuals vs. X values

• Points should be distributed about the horizontal line at 0
• Otherwise, normality is violated

[Figure: residuals vs. X, with a horizontal reference line at 0.]

Using SPSS to Test for Normality: Scatter Plot

• Simple Regression
  – Graph/Scatter/Simple
    • Y-Axis: residual [ res_1 or zre_1 ]
    • X-Axis: Customers [ independent variable ]
• Multiple Regression
  – Graph/Scatter/Simple
    • Y-Axis: residual [ res_1 or zre_1 ]
    • X-Axis: predicted Y values

[Figure: Scatter Plot of Customer by Residuals. X-axis: CUSTOMER (400 to 1100); Y-axis: Unstandardized Residual (-1.5 to 1.0).]

The Electronic Firms

An accounting standards board investigating the treatment of research and development expenses by the nation’s major electronic firms was interested in the relationship between a firm’s research and development expenditures and its earnings.

Earnings = 6.843 + 10.671(rdexpend)

List of Data, Predicted Values and Residuals

RDEXPEND  EARNINGS      PRE_1      RES_1     ZPR_1     ZRE_1
   15.00    221.00  166.90075   54.09925   1.84527   2.39432
    8.50     83.00   97.54224  -14.54224    .48229   -.64361
   12.00    147.00  134.88913   12.11087   1.21620    .53600
    6.50     69.00   76.20116   -7.20116    .06291   -.31871
    4.50     41.00   54.86008  -13.86008   -.35647   -.61342
    2.00     26.00   28.18373   -2.18373   -.88070   -.09665
     .50     35.00   12.17792   22.82208  -1.19523   1.01006
    1.50     40.00   22.84846   17.15154   -.98554    .75909
   14.00    125.00  156.23021  -31.23021   1.63558  -1.38218
    9.00     97.00  102.87751   -5.87751    .58713   -.26013
    7.50     53.00   86.87170  -33.87170    .27260  -1.49909
     .50     12.00   12.17792    -.17792  -1.19523   -.00787
    2.50     34.00   33.51900     .48100   -.77585    .02129
    3.00     48.00   38.85427    9.14573   -.67101    .40477
    6.00     64.00   70.86589   -6.86589   -.04194   -.30387

RDEXPEND, EARNINGS: data. PRE_1: predicted value. RES_1: residual. ZPR_1: standardized predicted value. ZRE_1: standardized residual.
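As a cross-check outside SPSS, the predicted values and residuals in this listing can be recomputed with an ordinary least-squares fit. The sketch below uses only the Python standard library; the function name `fit_line` is hypothetical, and dividing each residual by the standard error of estimate to obtain ZRE_1 is an assumption that happens to agree with the listed values.

```python
from math import sqrt

# R&D expenditures and earnings for the 15 electronic firms in the listing.
rdexpend = [15.0, 8.5, 12.0, 6.5, 4.5, 2.0, 0.5, 1.5,
            14.0, 9.0, 7.5, 0.5, 2.5, 3.0, 6.0]
earnings = [221.0, 83.0, 147.0, 69.0, 41.0, 26.0, 35.0, 40.0,
            125.0, 97.0, 53.0, 12.0, 34.0, 48.0, 64.0]


def fit_line(x, y):
    """Return the least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


a, b = fit_line(rdexpend, earnings)
predicted = [a + b * xi for xi in rdexpend]                    # PRE_1
residuals = [yi - pi for yi, pi in zip(earnings, predicted)]   # RES_1

# Standard error of estimate: sqrt(SSE / (n - 2)).
s_yx = sqrt(sum(e ** 2 for e in residuals) / (len(residuals) - 2))
standardized = [e / s_yx for e in residuals]                   # ZRE_1 (assumed definition)

print(f"a = {a:.3f}, b = {b:.3f}, s_yx = {s_yx:.3f}")
for x, e, z in zip(rdexpend, residuals, standardized):
    print(f"{x:6.2f} {e:10.5f} {z:9.5f}")
```

The residuals of a least-squares line with an intercept sum to zero, which is why the histograms in this deck report a mean of 0.00.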

[Figure: histogram of Regression Standardized Residual (-1.50 to 2.50), Dependent Variable: EARNINGS, frequency 0 to 6, Electronic Firms. Std. Dev = .96, Mean = 0.00, N = 15.]
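The histogram check can be approximated without a plotting package. The sketch below bins the ZRE_1 values from the Electronic Firms listing into a text histogram and reports the mean and standard deviation. The bin layout (width 0.5, centered on -1.5 through 2.5) is an assumption read off the axis labels, not something the slides state.

```python
from math import floor
from statistics import mean, stdev

# Standardized residuals (ZRE_1) copied from the Electronic Firms listing.
zre = [2.39432, -0.64361, 0.53600, -0.31871, -0.61342, -0.09665, 1.01006,
       0.75909, -1.38218, -0.26013, -1.49909, -0.00787, 0.02129, 0.40477,
       -0.30387]

# Assumed bins: width 0.5, centered on -1.5, -1.0, ..., 2.5.
WIDTH = 0.5
centers = [-1.5 + WIDTH * k for k in range(9)]
lowest_edge = centers[0] - WIDTH / 2

counts = [0] * len(centers)
for z in zre:
    counts[floor((z - lowest_edge) / WIDTH)] += 1

# One row of asterisks per bin, labeled by the bin center.
for center, count in zip(centers, counts):
    print(f"{center:5.2f} | {'*' * count}")

print(f"std dev = {stdev(zre):.2f}, n = {len(zre)}")
```

With only 15 observations the shape is rough; the single bar near 2.5 is the firm with the largest positive residual.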

[Figure: Plot of Standardized Residuals vs RDexpend (Plot of Standardized Residuals vs X Value), Electronic Firms. X-axis: RDEXPEND (0 to 16); Y-axis: Standardized Residual (-2 to 3).]

Residual Plot for Homoscedasticity (Constant Variance)

[Figure: two panels of standardized residuals (SR) vs. X with a reference line at 0, titled "Correct Specification" and "Heteroscedasticity".]

Fan-Shaped. Standardized Residuals Used.

Using SPSS to Test for Homoscedasticity of Residuals

• Simple Regression
  – Graphs/Scatter/Simple
    • Y-Axis: residual [ res_1 or zre_1 ]
    • X-Axis: rdexpend [ independent variable ]
• Multiple Regression
  – Graphs/Scatter/Simple
    • Y-Axis: residual [ res_1 or zre_1 ]
    • X-Axis: predicted Y values
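A rough numerical companion to the scatter plot is to order the residuals by X and compare their typical size in the lower and upper halves. This is only an informal sketch, not a formal test and not part of the SPSS procedure; it uses the RDEXPEND and RES_1 values from the Electronic Firms listing, and leaving out the middle observation is an arbitrary choice.

```python
# R&D expenditures and unstandardized residuals (RES_1) from the listing.
rdexpend = [15.0, 8.5, 12.0, 6.5, 4.5, 2.0, 0.5, 1.5,
            14.0, 9.0, 7.5, 0.5, 2.5, 3.0, 6.0]
res = [54.09925, -14.54224, 12.11087, -7.20116, -13.86008, -2.18373,
       22.82208, 17.15154, -31.23021, -5.87751, -33.87170, -0.17792,
       0.48100, 9.14573, -6.86589]

# Order the residuals by X, then split into the lowest and highest 7 firms
# (the middle firm is left out of both groups).
pairs = sorted(zip(rdexpend, res))
half = len(pairs) // 2
low = [abs(e) for _, e in pairs[:half]]
high = [abs(e) for _, e in pairs[-half:]]

mean_abs_low = sum(low) / len(low)
mean_abs_high = sum(high) / len(high)

print(f"mean |residual|, low X:  {mean_abs_low:.2f}")
print(f"mean |residual|, high X: {mean_abs_high:.2f}")
```

A clearly larger spread at high X is what a fan-shaped residual plot looks like in numbers; the plot itself remains the primary check.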

Test for Homoscedasticity

[Figure: Plot of Residuals vs Number, Dunton’s World of Sound. X-axis: NUMBER (0 to 6); Y-axis: Residual (-1.5 to 1.5).]

[Figure: Plot of Residuals vs R&D Expenditures (Plot of Residuals vs X Values), Electronic Firms. X-axis: RDEXPEND (0 to 16); Y-axis: Residual (-40 to 60).]

[Figure: Scatter Plot of Customer by Residuals. X-axis: CUSTOMER (400 to 1100); Y-axis: Unstandardized Residual (-1.5 to 1.0).]

Residual Plot for Independence

[Figure: two panels of standardized residuals (SR) vs. X, titled "Correct Specification" and "Not Independent".]

Plots Reflect Sequence Data Were Collected.

Two Types of Autocorrelation

• Positive Autocorrelation: successive terms in time series are directly related

• Negative Autocorrelation: successive terms are inversely related

[Figure: residual (y - ŷ), from -20 to 20, vs. time period t, from 0 to 20.]

Positive autocorrelation: Residuals tend to be followed by residuals with the same sign

[Figure: residual (y - ŷ), from -20 to 20, vs. time period t, from 0 to 20.]

Negative autocorrelation: Residuals tend to change signs from one period to the next

Problems with autocorrelated time-series data

• sy.x and sb are biased downwards
• Invalid probability statements about regression equation and slopes
• F and t tests won’t be valid
• May imply that cycles exist
• May induce a falsely high or low agreement between 2 variables

Using SPSS to Test for Independence of Errors

• Graphs/Sequence
  – Variables: residual (res_1)
• Durbin-Watson Statistic

[Figure: Time Sequence of Residuals, Dunton’s World of Sound. X-axis: Sequence number (1 to 7); Y-axis: Residual (-1.5 to 1.5).]

[Figure: Time Sequence Plot of Residuals, Electronic Firms. X-axis: Sequence number (1 to 15); Y-axis: Residual (-40 to 60).]

Customers and sales for period of 15 consecutive weeks.

Customers  Sales ($000)
   794          9
   799          8
   837          7
   855          9
   845         10
   844         10
   863         11
   875         11
   880         12
   905         13
   886         12
   843         10
   904         12
   950         12
   841         10

[Figure: Residuals over Time. X-axis: Time (1 to 15); Y-axis: Unstandardized Residual (-3 to 2).]

Durbin-Watson Procedure

• Used to Detect Autocorrelation
  – Residuals in One Time Period Are Related to Residuals in Another Period
  – Violation of Independence Assumption
• Durbin-Watson Test Statistic

$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$

H0: No positive autocorrelation exists (residuals are random)
H1: Positive autocorrelation exists

With d the computed value of D:

Accept H0 if d > dU
Reject H0 if d < dL
Inconclusive if dL < d < dU

Testing for Positive Autocorrelation

• 0 < d < dL: There is positive autocorrelation
• dL < d < dU: The test is inconclusive
• d > dU: There is no evidence of autocorrelation

[Figure: number line from 0 to 4 with dL, dU and 2 marked.]

Rule of Thumb

• Positive autocorrelation - D will approach 0
• No autocorrelation - D will be close to 2
• Negative autocorrelation - D is greater than 2 and may approach a maximum of 4
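The rule of thumb can be checked numerically. The sketch below computes the Durbin-Watson statistic as the sum of squared differences between successive residuals divided by the sum of squared residuals. The two residual sequences are made up purely for illustration, and the function name `durbin_watson` is hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals: long runs of the same sign (positive autocorrelation).
same_sign_runs = [1, 1, 1, 1, -1, -1, -1, -1]
# Made-up residuals: the sign flips every period (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

print(durbin_watson(same_sign_runs))  # 0.5
print(durbin_watson(alternating))     # 3.5
```

Longer runs of same-signed residuals push the statistic toward 0, while strict alternation pushes it toward 4.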

Using SPSS with Autocorrelation

• Analyze/Regression/Linear
• Dependent; Independent
• Statistics/Durbin-Watson (use only time series data)

Model Summary (b)

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .811 (a)  .657      .631               .94                         .883

a. Predictors: (Constant), CUSTOMER
b. Dependent Variable: SALES

Durbin-Watson = .883
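To turn the reported statistic into a decision, it is compared with the lower and upper critical values. The sketch below applies the decision rule for positive autocorrelation to d = .883. The critical values 1.08 and 1.36 are an assumption not given in the slides: they are the bounds commonly tabulated at the 0.05 level for n = 15 observations and one independent variable, and should be confirmed against a Durbin-Watson table. The function name `dw_decision` is hypothetical.

```python
def dw_decision(d, d_lower, d_upper):
    """Apply the decision rule for a test of positive autocorrelation."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no evidence of autocorrelation"
    return "inconclusive"


# Assumed critical values (0.05 level, n = 15, one independent variable);
# these are not stated in the slides.
D_LOWER, D_UPPER = 1.08, 1.36

decision = dw_decision(0.883, D_LOWER, D_UPPER)
print(decision)
```

Under these assumed bounds, .883 falls below the lower critical value, so H0 is rejected for the weekly sales data.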

Using SPSS with Autocorrelation

• Analyze/Regression/Linear
• Dependent; Independent
• Statistics/Durbin-Watson (use only time series data)
• If DW indicates autocorrelation, then …
  – Analyze/Time Series/Autoregression
  – Cochrane-Orcutt
  – OK

Solutions for autocorrelation

• Use Final Parameters under Cochrane-Orcutt
• Changes in the dependent and independent variables - first differences
• Transform the variables
• Include an independent variable that measures the time of the observation
• Use lagged variables (once a lagged value of the dependent variable is introduced as an independent variable, the Durbin-Watson test is not valid)
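As an illustration of the first-differences option, the sketch below differences the 15 weekly observations as they are displayed in the customers and sales table, so the values are whatever rounding the table shows. The helper name `first_differences` is hypothetical; the regression would then be refit on the differenced series.

```python
# Weekly customers and sales ($000) as displayed in the 15-week table.
customers = [794, 799, 837, 855, 845, 844, 863, 875,
             880, 905, 886, 843, 904, 950, 841]
sales = [9, 8, 7, 9, 10, 10, 11, 11, 12, 13, 12, 10, 12, 12, 10]


def first_differences(series):
    """Return the period-to-period changes; the result has one fewer value."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


d_customers = first_differences(customers)
d_sales = first_differences(sales)

for dc, ds in zip(d_customers, d_sales):
    print(f"{dc:5d} {ds:4d}")
```

Differencing costs one observation, leaving 14 pairs of changes from the original 15 weeks.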
