Lecture 10.1



    Transforms Revisited

Transforms are used to:

Change the mean function so that it is linear.
Adjust for the non-constant variance problem.
Fix non-Normal residuals.

Although you won't always solve all three problems (or any problem, for that matter).


You've already studied log transforms and square-root transforms.

Now we're going to consider a more general class of transforms and discuss strategies for finding the best transform.


    Strategy

First, transform Y. If that doesn't work, transform the predictors, but not Y.

If that improves things but not perfectly, see if you can now transform Y.

There are also approaches that consider transforming ALL variables simultaneously.

Keep in mind:

Don't remove outliers, influential points, etc. until the transforming is done. These points might not really be so outlying once the transform is done.


    Keep in Mind

Simple is better than complicated.

If you are expected to interpret the parameters, then transformations might make this impossible.

Transform Y

Basic idea: What if

E(Y|X) ≠ β0 + β1x1 + ... + βpxp

but instead

E(Y|X) = g(β0 + β1x1 + ... + βpxp)?

So we need to discover g().


E(Y|X) = g(β0 + β1x1 + ... + βpxp)

If we knew g(), we could invert it:

g^(-1)(E(Y|X)) = g^(-1)(g(β0 + β1x1 + ... + βpxp))

Ynew = β0 + β1x1 + ... + βpxp

Transform Y: two approaches

Inverse Response Plots
Box-Cox Method


Inverse Response Plots

A technique for guessing g().

If the predictors have an elliptically symmetric distribution (joint Normal is one example of this), then plot y-hat against y. The shape of the resulting curve gives you an idea of the shape of g-inverse.
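A minimal sketch of the idea on simulated data (not from the lecture; the data and names here are hypothetical, with the true g() taken to be exp()):

> # Simulate data where E(Y|X) = exp(1 + 0.8*x), i.e. g() is the exponential
> set.seed(1)
> x <- runif(100, 0, 3)
> y <- exp(1 + 0.8 * x) * exp(rnorm(100, sd = 0.1))
> m0 <- lm(y ~ x)
> plot(y, fitted(m0))   # y-hat against y: the curve looks like log(y), i.e. g-inverse

The curve traced out in that plot is (roughly) g-inverse, which is why a log-type transform of Y would be suggested here. The invResPlot() function used on the following slides automates this by fitting power curves y^lambda to the plot and reporting the lambda with the smallest RSS.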


> m1 = lm(ozone ~ temperature + pressure, data = ozonetext)
> plot(m1)

A plot of the predictors shows that their joint distribution is roughly elliptical.
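One quick way to eyeball that assumption (a sketch, assuming the same ozonetext data frame used in the model above):

> with(ozonetext, plot(temperature, pressure))   # look for a roughly elliptical point cloud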


> library(alr3)
> invResPlot(m1)

      lambda      RSS
1  0.3658881 1989.771
2 -1.0000000 3412.912
3  0.0000000 2082.377
4  1.0000000 2196.992

This suggests that the best transform is Ynew = Y^0.3658881.

(lambda = 0 refers to the log transform)

Note: the log transform isn't too different from the optimal one.


> ozone.t1 = transform(ozonetext, ozone.t = ozone^(0.37))
> m2 = lm(ozone.t ~ temperature + pressure, data = ozone.t1)
> plot(m2)

[Residual plots: transformed vs. original]


[More diagnostic plots comparing the original and transformed models]


On the whole, the transformation improved the validity of the model. But interpretation may now be quite difficult.

Still, improved validity means we can better trust p-values, confidence intervals, and prediction intervals.

> summary(m2)

Call:
lm(formula = ozone.t ~ temperature + pressure, data = ozone.t1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.4004629  0.1774149  -2.257   0.0256 *
temperature  0.0423812  0.0027663  15.321


Another approach

(Useful when the distribution of the variable to be transformed is not Normal.)

Box-Cox

Choose a transform of Y, ψλ(Y), such that the distribution of Y is closer to Normal, where

ψλ(Y) = gm(Y)^(1−λ) (Y^λ − 1)/λ    for λ ≠ 0
ψλ(Y) = gm(Y) log(Y)               for λ = 0

(gm is the geometric mean)


ψλ(Y) = gm(Y)^(1−λ) (Y^λ − 1)/λ

gm(Y) is the geometric mean of Y:

gm(Y) = (Y1 × Y2 × ... × Yn)^(1/n)
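A direct R translation of this transform (a sketch, not from the lecture; the function name psi is mine):

> psi <- function(y, lambda) {
+   gm <- exp(mean(log(y)))                          # geometric mean (requires y > 0)
+   if (lambda == 0) gm * log(y)
+   else gm^(1 - lambda) * (y^lambda - 1) / lambda
+ }

The gm(Y)^(1−λ) scaling keeps residual sums of squares comparable across different values of λ, which is what makes it meaningful to pick λ by maximum likelihood as described next.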

To find lambda, use maximum likelihood estimation:

    > library(MASS)

    > boxcox(m1)

    or

    > library(alr3)

> summary(powerTransform(y ~ x1 + x2, data = ...))


> boxcox(m1)

[Box-Cox plot: lambda ≈ 1/3, which confirms our previous transformation using lambda = 0.37]

> summary(powerTransform(m1))

bcPower Transformation to Normality

   Est.Power Std.Err. Wald Lower Bound Wald Upper Bound
Y1    0.2343   0.0866           0.0646           0.4041

Likelihood ratio tests about transformation parameters

                            LRT df         pval
LR test, lambda = (0)  7.568201  1 5.940706e-03
LR test, lambda = (1) 66.558671  1 3.330669e-16

In fact, the optimal transform is 0.23, which is smaller than the previous 0.37. However, 0.37 is within the confidence interval of 0.0646 to 0.4041.


Likelihood ratio tests about transformation parameters

                            LRT df         pval
LR test, lambda = (0)  7.568201  1 5.940706e-03
LR test, lambda = (1) 66.558671  1 3.330669e-16

Null: lambda = 0. Alt: lambda ≠ 0.
Small p-value, so we reject. Thus, it is best not to do a log transform.

Null: no transform (lambda = 1). Alt: do a transform.
Reject. We need a transform.

Transform Predictors

You can use Box-Cox to transform the predictors when Y is NOT transformed.

Then, if necessary, use an inverse response plot to transform Y.
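Schematically, the two-step recipe looks like this (a sketch using hypothetical names y, x1, x2, and data frame dat; the slides that follow carry it out on the ozone data):

> library(alr3)                                             # provides powerTransform() and invResPlot()
> summary(powerTransform(cbind(x1, x2) ~ 1, data = dat))    # step 1: choose powers for the predictors only
> dat.t <- transform(dat, x1.t = x1^0.5, x2.t = x2^2)       # hypothetical powers suggested by step 1
> m <- lm(y ~ x1.t + x2.t, data = dat.t)                    # refit with Y left untransformed
> invResPlot(m)                                             # step 2: if needed, look for a transform of Y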


In this approach, we find a transformation that makes the joint distribution of all the predictors multivariate Normal (or as close to it as we can get).

Once that's done, we try to find a transform for Y.

Then we see if it helps.


Do these predictors look like they come from a Normal distribution? (Probably not.)

> library(alr3)

    > summary(powerTransform(ozone~temperature+height,data=o2.mini))

box.cox Transformations to Multinormality

            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
temperature    1.1383   0.3246        3.5070         0.426
height        18.9126   4.5176        4.1864         3.965

                              LRT df      p.value
LR test, all lambda equal 0 25.50600  2 2.893633e-06
LR test, all lambda equal 1 17.30179  2 1.749703e-04

The best lambda could be anywhere within about two standard errors of the estimate. For temperature, use a lambda between 0.5 and 1.7, rounding generously.
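Working that interval out from the output above: 1.1383 − 2 × 0.3246 ≈ 0.49 and 1.1383 + 2 × 0.3246 ≈ 1.79.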


> summary(powerTransform(cbind(temperature, height) ~ 1, data = o2.mini))

box.cox Transformations to Multinormality

            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
temperature    1.1383   0.3246        3.5070         0.426
height        18.9126   4.5176        4.1864         3.965

                              LRT df      p.value
LR test, all lambda equal 0 25.50600  2 2.893633e-06
LR test, all lambda equal 1 17.30179  2 1.749703e-04

Temperature: try a square-root transform or no transform.

Height: transform to a high power, which is very unusual and probably not helpful. But let's try the 20th power anyway.

> o2.minit = transform(o2.mini, temp.t = sqrt(temperature), height.t = height^20)
> plot(o2.minit)


[Residual plots: no transform]


[Residual plots: transformed predictors]

Not much better, so look at transforming Y.

> m.t1 = lm(ozone ~ temp.t + height.t, data = o2.minit)
> plot(m.t1)
> invResPlot(m.t1)


Once again, Y → Y^(1/3) looks best.

> o2.minit2 = transform(o2.minit, ozone.t = ozone^(1/3))
> m.t2 = lm(ozone.t ~ temp.t + height.t, data = o2.minit2)
> plot(m.t2)


A third approach is to use Box-Cox to transform the predictors and the response simultaneously.


Use Box-Cox to transform ALL at once.

> summary(powerTransform(with(o2.mini, cbind(ozone, height, temperature))))

box.cox Transformations to Multinormality

            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
ozone          0.2503   0.0888        2.8178       -8.4416
height        18.8959   4.4542        4.2422        4.0177
temperature    1.1590   0.2661        4.3550        0.5976

                              LRT df      p.value
LR test, all lambda equal 0 37.03313  3 4.527709e-08
LR test, all lambda equal 1 83.53574  3 0.000000e+00

This is consistent with the 1/3 power for ozone, a 20th power for height, and no change (raise to the 1st power) for temperature.


2(p + 1)/n = 2 × 3/141 ≈ 0.04 = "big" leverage
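Checking the arithmetic in R (this assumes p = 2 predictors and n = 141 observations, which is what the numbers above imply):

> 2 * (2 + 1) / 141
[1] 0.04255319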