lowess does not provide p-value or confidence intervals, therefore we cannot make any inference with...


Upload: frederick-poole

Post on 22-Dec-2015


LOWESS does not provide p-values or confidence intervals, so we cannot draw any inference from a LOWESS curve. R allows you to fit a non-linear curve with p-values and confidence intervals.

R-tutorial to fit non-linear curve with 2.18.funding.sav data.

Open R (download R from http://www.r-project.org if you have not done so)

Go to Packages, then select Load packages, and load the base and foreign packages

We are going to use Dr. Harrell’s libraries, so first install the Hmisc and Design libraries from the menu

Go to Packages, and then select Install packages from CRAN

Select Hmisc and Design

Delete downloaded files (y/N)? N

# R is case-sensitive

2.4 19.4 R-tutorial to fit non-linear curve with 2.18.funding.sav data (1)

# One more step is needed to finish loading the libraries.

# At the command prompt “>”, type the following

# (you can cut and paste the following commands from here into R):

library(Hmisc)

library(Design)

library(foreign)

library(MASS)

# See what the libraries contain by typing

library(help="Hmisc")

library(help="Design")

# Now look in Hmisc for a command to read an SPSS file.
# help.search() does a general keyword search of the installed help pages.

help.search("bootstrap") #general search

R-tutorial to fit non-linear curve with 2.18.funding.sav data (2)

# R can be used as a calculator
1 + 2

[1] 3

# Assign 1 to a
a<-1

# Assign 2 to b
b<-2

a+b

[1] 3

a * b

[1] 2

b ** b

[1] 4

R-tutorial to fit non-linear curve with 2.18.funding.sav data (3)

# To see instructions on how to use spss.get to read an SPSS file

?spss.get

# Now let’s read in the dataset. To use the following command as written,
# first move the file to the directory c:\Rdata; otherwise, specify the
# directory where you stored the dataset.

support<-spss.get('c:/Rdata/support.sav', lowernames=TRUE)

# List the names of the variables
names(support)

 [1] "age"      "sex"      "hospdead" "slos"     "d.time"   "dzgroup"
 [7] "dzclass"  "num.co"   "edu"      "income"   "scoma"    "charges"
[13] "totcst"   "totmcst"  "avtisst"  "race"     "meanbp"   "hrt"
[19] "pafi"     "bili"     "crea"     "ph"       "wblc"     "resp"
[25] "temp"     "alb"      "sod"      "glucose"  "bun"      "urine"
[31] "adlp"     "adls"     "pre.1"    "pred.cat" "rand.num" "rrand.nu"
[37] "filter.." "pre.2"    "death"    "years"    "year3"    "status3"

# To see the contents of the dataset, type the name of the dataset
support

        age    sex hospdead slos d.time           dzgroup
1  43.53998 female        0  115   2022 ARF/MOSF w/Sepsis
2  63.66299 female        1   14     14 ARF/MOSF w/Sepsis
3  41.52197   male        1   21     21      MOSF w/Malig
4  89.58795   male        1    4      4 ARF/MOSF w/Sepsis
5  67.49097   male        0   24   1951 ARF/MOSF w/Sepsis
6  72.83795   male        1  109    109 ARF/MOSF w/Sepsis
7  75.36798   male        1   13     13 ARF/MOSF w/Sepsis
8  37.71899   male        1    7      7 ARF/MOSF w/Sepsis
9  58.95999 female        0   26   1882 ARF/MOSF w/Sepsis
10 25.48700   male        1   19     19 ARF/MOSF w/Sepsis
11 56.66498   male        1   14     14 ARF/MOSF w/Sepsis
12 38.88300   male        0   15   1807 ARF/MOSF w/Sepsis
13 66.54596   male        1   45     45 ARF/MOSF w/Sepsis

R-tutorial to fit non-linear curve with 2.18.funding.sav data (4): R-output from describe function

# describe is tremendously useful for seeing what is in the dataset
describe(support)

42 Variables   1000 Observations
----------------------------------------------------------------------
age : Age
   n missing unique  Mean   .05   .10   .25   .50   .75   .90   .95
1000       0    970 62.47 33.76 38.91 51.81 64.90 74.50 81.87 86.00

lowest :  18.04  18.41  19.76  20.30  20.31, highest:  95.51  96.02  96.71 100.13 101.85
----------------------------------------------------------------------
sex
   n missing unique
1000       0      2

female (438, 44%), male (562, 56%)
----------------------------------------------------------------------
hospdead
   n missing unique Sum  Mean
1000       0      2 253 0.253
----------------------------------------------------------------------
slos : Days from study enrollment to hospital discharge
   n missing unique  Mean .05 .10 .25 .50 .75 .90 .95
1000       0     88 17.86   4   4   6  11  20  37  53

lowest :   3   4   5   6   7, highest: 145 164 202 236 241
----------------------------------------------------------------------

# To create nice graphs that describe the data
datadensity(support)

#Scatter plot

plot(totcst~alb, data=support)

[Figure: scatter plot of totcst (y-axis, ticks 0e+00 to 4e+05) against alb (x-axis, ticks 1 to 5)]

#Scatter plot

plot(log(totcst)~alb, data=support)

[Figure: scatter plot of log(totcst) (y-axis, ticks 7 to 12) against alb (x-axis, ticks 1 to 5)]

# Save a new variable, ln.totcst, into the support data
support$ln.totcst <- log(support$totcst + 1)

# Type the next 2 lines before you do any graphical work
dd <- datadist(support)
options(datadist='dd')

# Lowess curve
plsmo(support$alb, support$ln.totcst, datadensity=T)
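
The log transformation and lowess step can also be tried without the SUPPORT file. The following is a minimal sketch using only base R's lowess(); the data are simulated stand-ins and the names alb, cost, ln_cost and smooth are assumptions for illustration, not part of the tutorial's dataset. It shows that a lowess fit returns smoothed points only, which is why no p-value or confidence interval is available from it.

```r
# Simulated stand-in data (hypothetical; not the SUPPORT dataset)
set.seed(1)
alb  <- runif(300, min = 1, max = 5)
cost <- exp(11 - 0.5 * alb + rnorm(300))  # right-skewed, strictly positive

# Log-transform the skewed outcome; adding 1 keeps zero costs finite
ln_cost <- log(cost + 1)

# lowess() returns a list holding only the smoothed coordinates
smooth <- lowess(alb, ln_cost)
names(smooth)  # components x and y; no standard errors or p-values

# Draw the scatter plot and the smooth when running interactively
if (interactive()) {
  plot(alb, ln_cost)
  lines(smooth, col = "red")
}
```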

# Fitting linear regression

f.linear<-ols(ln.totcst~alb, data=support)

# Show results of the regression

f.linear

ols(formula = ln.totcst ~ alb, data = support)

Frequencies of Missing Values Due to Each Variable
ln.totcst       alb
      105       378

  n Model L.R. d.f.     R2 Sigma
565      66.88    1 0.1116 1.153

Residuals:
Total RCC cost
     Min       1Q   Median       3Q      Max
-9.93285 -0.77134  0.02745  0.74721  3.07144

Coefficients:
            Value Std. Error      t  Pr(>|t|)
Intercept 11.1960    0.18971 59.017 0.000e+00
alb       -0.5263    0.06258 -8.411 4.441e-16

Residual standard error: 1.153 on 563 degrees of freedom
Adjusted R-Squared: 0.11

[Slide callouts mark the R2 and P-value entries in the output above.]
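
The same quantities can be pulled out of a fit programmatically. This is a hedged sketch with base R's lm() rather than ols() from Design, on simulated data; the variable names and the simulated coefficients are assumptions chosen only to resemble the output above.

```r
# Simulated stand-in data (hypothetical; not the SUPPORT dataset)
set.seed(2)
alb     <- runif(200, min = 1, max = 5)
ln_cost <- 11 - 0.5 * alb + rnorm(200)

fit <- lm(ln_cost ~ alb)
s   <- summary(fit)

r2      <- s$r.squared                        # proportion of variance explained
p_slope <- coef(s)["alb", "Pr(>|t|)"]         # p-value for the alb slope
ci      <- confint(fit, "alb", level = 0.95)  # 95% CI for the alb slope

r2
p_slope
ci
```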

# Plot result of the linear regression
plot(f.linear, alb=NA)

# Check normality of residuals
hist(f.linear$residuals)
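
A histogram is one way to look at the residuals; a normal Q-Q plot and a Shapiro-Wilk test are common companions. The sketch below is an illustration on simulated data with base R only, and the names x, y, fit, res and sw are assumptions.

```r
# Simulated stand-in data (hypothetical; not the SUPPORT dataset)
set.seed(3)
x <- runif(200, min = 1, max = 5)
y <- 11 - 0.5 * x + rnorm(200)

fit <- lm(y ~ x)
res <- residuals(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
sw$p.value

# Visual checks, drawn only when running interactively
if (interactive()) {
  hist(res)
  qqnorm(res)
  qqline(res)
}
```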

# Fitting non-linear regression (rcs(alb,3) is a restricted cubic spline with 3 knots)
f.nonlinear<-ols(ln.totcst~rcs(alb,3), data=support)

# Graph the non-linear regression
plot(f.nonlinear, alb=NA)
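
To see where the confidence band around the fitted curve comes from, here is a minimal sketch with base R only. It swaps in ns() from the splines package for rcs(), so the knot placement will generally differ from rcs(alb,3); the data are simulated and all names (dat, grid, band) are assumptions.

```r
library(splines)

# Simulated stand-in data with a curved relationship (hypothetical)
set.seed(4)
alb     <- runif(300, min = 1, max = 5)
ln_cost <- 12 - 1.5 * log(alb) + rnorm(300, sd = 0.5)
dat     <- data.frame(alb, ln_cost)

# Natural cubic spline with 2 degrees of freedom for alb
fit <- lm(ln_cost ~ ns(alb, df = 2), data = dat)

# Fitted curve with pointwise 95% confidence limits on a grid of alb values
grid <- data.frame(alb = seq(1.2, 4.8, length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
head(band)  # columns: fit, lwr, upr

# Plot the curve and its band when running interactively
if (interactive()) {
  matplot(grid$alb, band, type = "l", lty = c(1, 2, 2), col = 1,
          xlab = "alb", ylab = "ln_cost")
}
```

Unlike a lowess curve, this fit is an ordinary regression model, so standard errors, confidence limits and tests all follow from it.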

# Viewing the result of the regression
anova(f.nonlinear)

Analysis of Variance          Response: ln.totcst

Factor      d.f. Partial SS  MS         F     P
alb            2  96.384725  48.192363  36.29 <.0001
 Nonlinear     1   2.322377   2.322377   1.75 0.1865
REGRESSION     2  96.384725  48.192363  36.29 <.0001
ERROR        562 746.263561   1.327871

alb row: overall effect of ALB. p<0.0001 indicates a significant effect of ALB.

Nonlinear row: P<0.05 would indicate non-linearity. Here P=0.1865, so there is no significant evidence of non-linearity.

REGRESSION row: P<0.05 indicates the model is useful.
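
The idea behind the Nonlinear row can be reproduced as a comparison of two nested models: a straight line against a spline that contains the straight line as a special case. The sketch below uses base R's lm(), ns() and anova() on simulated data; it illustrates the principle under those assumptions and is not the computation that Design performs internally.

```r
library(splines)

# Simulated stand-in data with a curved relationship (hypothetical)
set.seed(5)
x <- runif(300, min = 1, max = 5)
y <- 12 - 1.5 * log(x) + rnorm(300, sd = 0.5)

fit_linear <- lm(y ~ x)              # straight line: 1 d.f. for x
fit_spline <- lm(y ~ ns(x, df = 2))  # natural cubic spline: 2 d.f. for x

# F-test of the single extra (non-linear) degree of freedom
cmp <- anova(fit_linear, fit_spline)
cmp

p_nonlinear <- cmp[["Pr(>F)"]][2]
p_nonlinear  # a small value suggests the straight line is not adequate
```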

Box-Cox transformation (1): Finding an optimal choice for power transformation in R

A useful method, the so-called Box-Cox transformation, helps you identify the optimal power transformation to achieve normality of the residuals. Here we find the best transformation for a regression of crea on age. First, fit the untransformed model and check its residuals:

f.linear.crea<-ols(crea~age, data=support)

hist(f.linear.crea$residuals)

anova(f.linear.crea)

plot(f.linear.crea, age=NA)

Analysis of Variance Table

Response: crea
           Df  Sum Sq Mean Sq F value Pr(>F)
age         1    0.13    0.13   0.045  0.832
Residuals 995 2915.09    2.93
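
For reference, the Box-Cox family of power transformations for a positive outcome is usually written as follows, with the log transformation as the limiting case at lambda = 0:

```latex
$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[1.5ex]
\log y, & \lambda = 0,
\end{cases}
\qquad y > 0 .
$$
```

Using the plain power y^lambda in a regression instead of this scaled form only rescales the outcome linearly. Note that for negative lambda the plain power reverses the ordering of y, so the sign of the slope flips.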

Box-Cox transformation (2): Finding an optimal choice for power transformation in R

# Refit with lm(); boxcox() from MASS accepts an lm fit
f.linear.crea<-lm(crea~age, data=support)

# Profile the log-likelihood over a grid of powers
bcout<-boxcox(f.linear.crea)

# Power with the largest log-likelihood
bcout$x[which.max(bcout$y)]

[1] -0.5050505

This indicates that you may try the transformation Y^(-0.5).

# Create a new variable
support$crea05<-support$crea**(-0.5050505)
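
The same steps can be rehearsed on data where the right answer is known. In this sketch the outcome is simulated so that its log is linear in age, which means the power picked by boxcox() should land near 0 (the log transformation) rather than the -0.5 found for crea. The data and the names age, y, fit, bc and lambda_hat are assumptions for illustration.

```r
library(MASS)

# Simulated stand-in data: log(y) is linear in age (hypothetical)
set.seed(6)
age <- runif(500, min = 20, max = 90)
y   <- exp(0.5 + 0.01 * age + rnorm(500, sd = 0.5))

fit <- lm(y ~ age)

# Profile log-likelihood over a grid of powers; plotit = FALSE skips the plot
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Power with the largest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```

In practice the estimated power is usually rounded to a nearby interpretable value such as -1, -0.5, 0, 0.5 or 1.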

Box-Cox transformation (3): Finding an optimal choice for power transformation in R

# Re-do linear regression with transformed variable

f.new<-ols(crea05~age, data=support)

# Check residuals again

hist(f.new$residuals)

# Now you can see p-values
anova(f.new)
plot(f.new, age=NA)

Analysis of Variance          Response: crea05

Factor     d.f. Partial SS MS         F     P
age           1  1.198304  1.19830356 17.38 <.0001
REGRESSION    1  1.198304  1.19830356 17.38 <.0001
ERROR       995 68.590890  0.06893557

Homework assignment 1

Using Support.sav and R, answer the following questions.

1. Plot the non-linear regression curve of log-transformed total cost against serum albumin level, with its 95% CI.

(a) Does R2 improve from the analysis of 19.1.1?

(b) Does the test of non-linearity for serum albumin level suggest a non-linear effect of serum albumin level?

(c) Is there an association between transformed total cost and serum albumin level?

Homework assignment 2

Using Support.sav and R, answer the following questions.

2. Plot the simple non-linear regression curve of log-transformed total cost against SUPPORT coma score.

(a) Does R2 improve from the analysis of 19.2.1?

(b) Does the test of non-linearity for SUPPORT coma score suggest a non-linear effect of SUPPORT coma score?

(c) Is there an association between transformed total cost and SUPPORT coma score?