Topic 23: Diagnostics and Remedies

Upload: paul-bryan

Post on 05-Jan-2016

Outline

• Diagnostics

– residual checks

• ANOVA remedial measures

Diagnostics Overview

• We will take the diagnostics and remedial measures that we learned for regression and adapt them to the ANOVA setting

• Many things are essentially the same

• Some things require modification

Residuals

• Predicted values are cell means, $\hat{Y}_{ij} = \bar{Y}_{i.}$

• Residuals are the differences between the observed values and the cell means, $Y_{ij} - \bar{Y}_{i.}$

Basic plots

• Plot the data vs the factor levels (the values of the explanatory variables)

• Plot the residuals vs the factor levels

• Construct a normal quantile plot and/or histogram of the residuals

KNNL Example

• KNNL p 777

• Compare 4 brands of rust inhibitor (X has r=4 levels)

• Response variable is a measure of the effectiveness of the inhibitor

• There are 10 units per brand (n=10)

Plots

• Data versus the factor

• Residuals versus the factor

• Normal quantile plot of the residuals

Plots vs the factor

symbol1 v=circle i=none;
proc gplot data=a2;
  plot (eff resid)*abrand;
run;
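The plotting step above reads a data set a2 that already contains the residuals. A minimal sketch of how such a data set could be built is shown here; the input data set name a1 and the output variable name pred are assumptions, while eff, abrand, resid, and a2 are taken from the plotting code.

```sas
/* Sketch only: a1 is an assumed name for the raw rust inhibitor data. */
proc glm data=a1;
  class abrand;                    /* brand is the single factor          */
  model eff=abrand;                /* one-way ANOVA: cell means model     */
  output out=a2 p=pred r=resid;    /* pred = cell mean, resid = Y - mean  */
run;
quit;
```

The OUTPUT statement keeps every original variable and adds the predicted values and residuals, so a2 can be passed directly to the plotting procedure.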

Data vs the factor

The means look different across brands, with a common spread in the Y’s

Residuals vs the factor

Odd distribution of points

QQ-plot

The odd shape is due to the odd spread of the residuals (both a lack of spread and large spread)

Can try a nonparametric analysis (see the last slides)
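A normal quantile plot and histogram of the residuals could be produced along the following lines. This is a sketch, assuming the residuals are stored in the variable resid of data set a2, as in the plotting code earlier; it is not necessarily how the original figure was made.

```sas
/* Sketch only: QQ-plot and histogram of the ANOVA residuals. */
proc univariate data=a2 noprint;
  var resid;
  qqplot resid / normal(mu=est sigma=est);   /* reference line from estimated mean and sd */
  histogram resid / normal;                  /* overlay a fitted normal density           */
run;
```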

General Summary

• Look for

–Outliers

–Variance that depends on level

–Non-normal errors

• Plot residuals vs time and other variables if available

Homogeneity tests

• Homogeneity of variance (homoscedasticity)

• H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_r^2$

• H1: not all $\sigma_i^2$ are equal

• Several significance tests are available

Homogeneity tests

• Text discusses Hartley, modified Levene

• SAS has several including Bartlett’s (essentially the likelihood ratio test) and several versions of Levene

Homogeneity tests

• There is a problem with assumptions

– ANOVA is robust with respect to moderate deviations from Normality

– ANOVA results can be sensitive to the homogeneity of variance assumption

• Some homogeneity tests are sensitive to the Normality assumption

Levene’s Test

• Do ANOVA on the squared residuals from the original ANOVA

• Modified Levene’s test uses absolute values of the residuals

• Modified Levene’s test is recommended

• Another quick and dirty rule of thumb: compare $\max(s_i^2) / \min(s_i^2)$ with 2; a ratio well above 2 suggests unequal variances

KNNL Example

• KNNL p 785

• Compare the strengths of 5 types of solder flux (X has r=5 levels)

• Response variable is the pull strength, force in pounds required to break the joint

• There are 8 solder joints per flux (n=8)

Scatterplot

Levene’s Test

proc glm data=a1;
  class type;
  model strength=type;
  means type / hovtest=levene(type=abs);
run;
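To make the idea behind the test concrete, the same kind of analysis can be sketched by hand: save the residuals from the one-way ANOVA, take absolute values, and run a second one-way ANOVA on them. The data set name lev and the variable names resid and absresid are hypothetical; a1, type, and strength come from the code above.

```sas
/* Step 1: fit the original one-way ANOVA and save the residuals. */
proc glm data=a1;
  class type;
  model strength=type;
  output out=lev r=resid;
run;
quit;

/* Step 2: absolute residuals (use resid**2 instead for the squared version). */
data lev;
  set lev;
  absresid = abs(resid);
run;

/* Step 3: one-way ANOVA on the absolute residuals.
   The F test for type here is the test of equal spread. */
proc glm data=lev;
  class type;
  model absresid=type;
run;
quit;
```

The HOVTEST option is the more convenient route; the manual version is only meant to show what is being tested.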

ANOVA Table

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4       353.612085    88.4030213     41.93   <.0001
Error             35       73.7988250     2.1085379
Corrected Total   39       427.410910

Common variance estimated to be 2.11

Output

Levene's Test: ANOVA of Absolute Deviations

Source   DF   F Value   Pr > F
type      4      3.07   0.0288
Error    35

We reject the null hypothesis (P = 0.0288) and conclude that the variance is not constant

Means and SDs

Level of type   N   strength Mean   strength Std Dev
1               8   15.42           1.23
2               8   18.52           1.25
3               8   15.00           2.48
4               8    9.74           0.81
5               8   12.34           0.76
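As a worked check of the quick rule of thumb based on the ratio of the largest to the smallest sample variance, the standard deviations in this table give the following (using the unrounded values 2.4866 and 0.7694 reported by PROC MEANS for types 3 and 5):

```latex
\frac{\max_i s_i^2}{\min_i s_i^2}
  = \frac{(2.4866)^2}{(0.7694)^2}
  \approx \frac{6.183}{0.592}
  \approx 10.4
```

A ratio of about 10 is far larger than 2, which agrees with the conclusion of Levene's test.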

Remedies

• Delete outliers

– Is their removal important?

• Use weights (weighted regression)

• Transformations

• Nonparametric procedures

What to do here?

• Not really any obvious outliers

• Do not see a pattern of increasing or decreasing variance, or skewed distributions

• Will consider

–Weighted ANOVA

–Mixed model ANOVA

Weighted least squares

• We used this with regression

– Obtain a model for how the sd depends on the explanatory variable (plotted absolute value of residual vs x)

– Then used weights inversely proportional to the estimated variance

Weighted Least Squares

• Here we can compute the variance for each level

• Use the reciprocals of these variances as weights in PROC GLM

• We will illustrate with the soldering example from KNNL

Obtain the variances and weights

* a1 must be sorted by type for the BY statement;
proc means data=a1;
  var strength;
  by type;
  output out=a2 var=s2;
run;
data a2;
  set a2;
  wt=1/s2;
run;

NOTE. Data set a2 has 5 cases

Proc Means Output

Level of type   N   strength Mean   strength Std Dev
1               8   15.4200000      1.23713956
2               8   18.5275000      1.25297076
3               8   15.0037500      2.48664397
4               8    9.7412500      0.81660337
5               8   12.3400000      0.76941536

Merge and then use the weights in PROC GLM

data a3;
  merge a1 a2;
  by type;
run;
proc glm data=a3;
  class type;
  model strength=type;
  weight wt;
  lsmeans type / cl;
run;

Output

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4       324.213099    81.0532747     81.05   <.0001
Error             35       35.0000000     1.0000000
Corrected Total   39       359.213099

Data have been standardized to have a variance of 1
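The error sum of squares of exactly 35 is not a coincidence. With weights $w_i = 1/s_i^2$, each level's within-group sum of squares is divided by its own sample variance, so the weighted error sum of squares reduces to the error degrees of freedom:

```latex
\mathrm{SSE}_w
  = \sum_{i=1}^{r} w_i \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{i.}\right)^2
  = \sum_{i=1}^{r} \frac{(n_i - 1)\, s_i^2}{s_i^2}
  = \sum_{i=1}^{r} (n_i - 1)
  = 5\,(8 - 1) = 35,
\qquad
\mathrm{MSE}_w = \frac{35}{35} = 1 .
```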

LSMEANS Output

type   strength LSMEAN   Standard Error   Pr > |t|   95% Confidence Limits
1      15.4200000        0.4373949        <.0001     14.532041   16.307959
2      18.5275000        0.4429921        <.0001     17.628178   19.426822
3      15.0037500        0.8791614        <.0001     13.218957   16.788543
4       9.7412500        0.2887129        <.0001      9.155132   10.327368
5      12.3400000        0.2720294        <.0001     11.787751   12.892249

Because of the weights, each standard error is based only on the sample variance of its own level
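This can be checked directly. With weight $w_i = 1/s_i^2$ on every observation in level $i$ and a weighted mean square error of 1, the standard error of a level mean reduces to the usual one-sample formula:

```latex
\mathrm{SE}\!\left(\bar{Y}_{i.}\right)
  = \sqrt{\frac{\mathrm{MSE}_w}{n_i\, w_i}}
  = \sqrt{\frac{s_i^2}{n_i}}
  = \frac{s_i}{\sqrt{n_i}},
\qquad
\text{type 1: } \frac{1.23713956}{\sqrt{8}} \approx 0.4374 .
```

The same calculation for type 5 gives $0.76941536/\sqrt{8} \approx 0.2720$, matching the table.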

Mixed Model ANOVA

• Relax the assumption of constant variance rather than including a “known” weight

• This involves moving to a mixed model procedure

• Topic will not be on exam but wanted you to be aware of these model capabilities

SAS Code

proc glimmix data=a1;
  class type;
  model strength=type / ddfm=kr;
  random residual / group=type;
run;

This allows the variance to differ in each level, and a degrees of freedom adjustment is used to account for this

GLIMMIX OUTPUT

Fit Statistics

-2 Res Log Likelihood       122.11
AIC (smaller is better)     132.11
AICC (smaller is better)    134.18
BIC (smaller is better)     139.88
CAIC (smaller is better)    144.88
HQIC (smaller is better)    134.79
Generalized Chi-Square       35.00
Gener. Chi-Square / DF        1.00

Covariance Parameter Estimates

Cov Parm        Group    Estimate   Standard Error
Residual (VC)   type 1   1.5305     0.8181
Residual (VC)   type 2   1.5699     0.8392
Residual (VC)   type 3   6.1834     3.3052
Residual (VC)   type 4   0.6668     0.3564
Residual (VC)   type 5   0.5920     0.3164

Type III Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
type     4        14.81    71.78     <.0001

The estimates suggest there are really 3 groups of variances

SAS Code

proc glimmix data=a1;
  class type;
  model strength=type / ddfm=kr;
  random residual / group=type1;
run;

Type1 was created to identify Type 1 and 2, Type 3, and Type 4 and 5 as 3 groups
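The slides do not show how type1 was built. One way it could be created is sketched below, assuming type is a numeric variable coded 1 through 5; the group codes 1, 2, 3 are an illustrative choice, not necessarily the coding used in the original program.

```sas
/* Sketch only: collapse the five flux types into three variance groups. */
data a1;
  set a1;
  if type in (1, 2) then type1 = 1;   /* types 1 and 2: similar variance */
  else if type = 3 then type1 = 2;    /* type 3: largest variance        */
  else type1 = 3;                     /* types 4 and 5: smallest         */
run;
```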

GLIMMIX OUTPUT

Fit Statistics

-2 Res Log Likelihood       122.13
AIC (smaller is better)     128.13
AICC (smaller is better)    128.91
BIC (smaller is better)     132.80
CAIC (smaller is better)    135.80
HQIC (smaller is better)    129.74
Generalized Chi-Square       35.00
Gener. Chi-Square / DF        1.00

Covariance Parameter Estimates

Cov Parm        Group   Estimate   Standard Error
Residual (VC)   Grp 1   1.5502     0.5859
Residual (VC)   Grp 2   6.1834     3.3052
Residual (VC)   Grp 3   0.6294     0.2379

Type III Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
type     4        19.8     77.68     <.0001

Better BIC, but the same general conclusion about type
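The fit statistics of the two models can be compared by hand. The AIC values reported in the two outputs are consistent with a penalty of 2 per covariance parameter added to the -2 residual log likelihood, with $p = 5$ variances in the first model and $p = 3$ in the second:

```latex
\mathrm{AIC} = -2\,\ell_R + 2p:
\qquad
122.11 + 2(5) = 132.11,
\qquad
122.13 + 2(3) = 128.13 .
```

The fit is essentially unchanged (122.11 vs 122.13) while two parameters are saved, which is why the information criteria favor the three-group model.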

Transformation Guides

• When $\sigma_i^2$ is proportional to $\mu_i$, use $\sqrt{Y}$

• When $\sigma_i$ is proportional to $\mu_i$, use log(y)

• When $\sigma_i$ is proportional to $\mu_i^2$, use 1/y

• For proportions, use $\arcsin(\sqrt{Y})$

– arsin(sqrt(y)) in a SAS data step

• Box-Cox transformation
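As an illustration of these guides, the transformations can all be computed in a single data step. This is a sketch with hypothetical names: raw is an assumed input data set, y an assumed positive response, and p an assumed proportion between 0 and 1.

```sas
/* Sketch only: common variance-stabilizing transformations. */
data trans;
  set raw;
  sqrty = sqrt(y);          /* variance proportional to the mean        */
  logy  = log(y);           /* sd proportional to the mean (needs y>0)  */
  recy  = 1/y;              /* sd proportional to the mean squared      */
  asinp = arsin(sqrt(p));   /* proportions                              */
run;
```

In practice only the transformation suggested by the diagnostics would be kept.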

Example

• Consider study on KNNL pg 790

• Y: time between computer failures

• X: three locations

data a3;
  infile 'u:\.www\datasets512\CH18TA05.txt';
  input time location interval;
symbol1 v=circle;
proc gplot;
  plot time*location;
run;

Scatterplot

Outlier or skewed distribution? Can consider transformation first

Box-Cox Transformation

• Can consider the regression of $\log(s_i)$ on $\log(\bar{y}_{i.})$; then $1 - b_1$ is the power to which to raise Y

• Can try various “convenient” powers

• Can use SAS directly to calculate the power

E(logsig) = 0.90 + .79 logmu

Power should be 1-.79 ≈ 0.20
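A fitted line of this kind could be obtained as sketched below: compute the mean and standard deviation of time at each location, take logs, and regress one on the other. The data set names a3s and stats and the variable names mu and sig are hypothetical; logsig and logmu follow the names in the fitted equation above, and a3, time, and location come from the example code.

```sas
/* Sketch only: regress log(sd) on log(mean) across the three locations. */
proc sort data=a3 out=a3s;
  by location;
run;

proc means data=a3s noprint;
  var time;
  by location;
  output out=stats mean=mu std=sig;   /* one row per location */
run;

data stats;
  set stats;
  logmu  = log(mu);
  logsig = log(sig);
run;

proc reg data=stats;
  model logsig = logmu;   /* slope is b1; the suggested power is 1 - b1 */
run;
quit;
```

With only three locations the regression is based on three points, so the resulting power is a rough guide rather than a precise estimate.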

Using SAS

proc transreg data=a3;
  model boxcox(time / lambda=-2 to 2 by .2) = class(location);
run;

Output

Box-Cox Transformation Information for time

Lambda    R-Square   Log Like
-1.0      0.24       -73.040
-0.8      0.27       -67.330
-0.6      0.31       -62.316
-0.5      0.33       -60.144
-0.4      0.35       -58.239
-0.2      0.38       -55.346 *
 0.0 +    0.39       -53.830 *
 0.2      0.38       -53.769 <
 0.4      0.36       -55.118 *
 0.5      0.34       -56.273
 0.6      0.32       -57.712
 0.8      0.29       -61.314
 1.0      0.25       -65.675

< - Best Lambda
* - 95% Confidence Interval
+ - Convenient Lambda

Transforming data in SAS

data a3;
  set a3;
  transtime = time**0.20;
symbol1 v=circle i=none;
proc gplot;
  plot transtime*location;
run;

Much more constant spread in data!

Nonparametric approach

• Based on ranks

• See KNNL section 18.7, p 795

• See the SAS procedure NPAR1WAY
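A minimal call to NPAR1WAY for the rust inhibitor data might look like the following sketch. It assumes the data set a2 with variables eff and abrand used in the earlier plots; the WILCOXON option requests the rank-sum scores and, with more than two groups, the Kruskal-Wallis test.

```sas
/* Sketch only: rank-based one-way analysis of the rust inhibitor data. */
proc npar1way data=a2 wilcoxon;
  class abrand;   /* the four brands       */
  var eff;        /* response to be ranked */
run;
```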

Rust Inhibitor Analysis

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3       15953.4660    5317.82200    866.12   <.0001
Error             36        221.03400       6.13983
Corrected Total   39       16174.5000

Highly significant F test. Even if there is a violation of Normality, the evidence is overwhelming

Nonparametric Analysis

Wilcoxon Scores (Rank Sums) for Variable eff
Classified by Variable abrand

abrand   N    Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
1        10   128.0           205.0               32.014119          12.80
2        10   355.0           205.0               32.014119          35.50
3        10   255.0           205.0               32.014119          25.50
4        10    82.0           205.0               32.014119           8.20

Average scores were used for ties.

Kruskal-Wallis Test

Chi-Square        33.7041
DF                3
Pr > Chi-Square   <.0001
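The Kruskal-Wallis statistic can be approximately reproduced from the rank sums in the output. The version below ignores ties, so it is a check rather than the exact formula SAS applies; with $N = 40$ observations and $n_i = 10$ per brand:

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{4} \frac{R_i^2}{n_i} - 3(N+1)
  = \frac{12}{40 \cdot 41} \cdot \frac{128^2 + 355^2 + 255^2 + 82^2}{10} - 3 \cdot 41
  = \frac{12}{1640}\,(21415.8) - 123
  \approx 33.70 .
```

The small difference from the reported 33.7041 is consistent with the tie correction. The expected rank sum under H0 is $n_i (N+1)/2 = 10 \cdot 41 / 2 = 205$, which matches the table.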

Last slide

• We’ve finished most of Chapters 17 and 18.

• We used program topic23.sas to generate the output.