Linear Regression with R 1
TRANSCRIPT
![Page 1: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/1.jpg)
Linear Regression with R
2012-12-07 @HSPH
Kazuki Yoshida, M.D. MPH-CLE student
FREEDOM TO KNOW
1: Prepare data/specify model/read results
![Page 2: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/2.jpg)
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
![Page 3: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/3.jpg)
Previously in this group
- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous
![Page 4: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/4.jpg)
Menu
- Linear regression
![Page 5: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/5.jpg)
Ingredients
Statistics
- Data preparation
- Model formula
Programming
- within()
- factor(), relevel()
- lm()
- formula = Y ~ X1 + X2
- summary()
- anova(), car::Anova()
![Page 6: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/6.jpg)
Open R Studio
![Page 7: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/7.jpg)
Create a new script and save it.
![Page 8: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/8.jpg)
http://www.umass.edu/statdata/statdata/data/
![Page 9: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/9.jpg)
We will use the lowbwt dataset used in BIO213
lowbwt.dat
http://www.umass.edu/statdata/statdata/data/lowbwt.txt
http://www.umass.edu/statdata/statdata/data/lowbwt.dat
![Page 10: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/10.jpg)
Load dataset from web
lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)
header = TRUE: pick up variable names
skip = 4: skip the first 4 rows
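A minimal sketch of what header and skip do, using inline text instead of the web file. The four description lines and the data values below are made up for illustration; they are not the real contents of lowbwt.dat.

```r
# Hypothetical file contents: 4 description lines, a header row, then data
raw_text <- "Description line 1
Description line 2
Description line 3
Description line 4
ID LOW AGE
85 0 19
86 0 33
87 0 20"

# skip = 4 drops the description lines;
# header = TRUE uses the next row as variable names
toy <- read.table(text = raw_text, header = TRUE, skip = 4)

names(toy)  # "ID" "LOW" "AGE"
nrow(toy)   # 3
```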
![Page 11: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/11.jpg)
“Fix” dataset
lbw[c(10,39), "BWT"] <- c(2655, 3035)
Replace data points to make the dataset identical to the BIO213 dataset
c(10,39): 10th and 39th rows
"BWT": BWT column
![Page 12: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/12.jpg)
Lower case variable names
names(lbw) <- tolower(names(lbw))
Convert variable names to lower case
Put them back into variable names
![Page 13: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/13.jpg)
See overview
![Page 14: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/14.jpg)
library(gpairs)
gpairs(lbw)
![Page 15: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/15.jpg)
![Page 16: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/16.jpg)
Recoding
Changing and creating variables
![Page 17: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/17.jpg)
dataset <- within(dataset, {
    _variable manipulations_
})
dataset (left of <-): name of the newly created dataset (here replacing the original)
dataset (inside within()): the dataset to take
_variable manipulations_: perform variable manipulation. You can specify variables by name only; there is no need for dataset$var_name
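A small sketch of the within() pattern on a toy data frame. The data frame dat and its columns are hypothetical and only serve to show that columns can be referred to by name.

```r
# Toy data frame with made-up values
dat <- data.frame(height_cm = c(150, 160, 170),
                  weight_kg = c(50, 60, 70))

dat <- within(dat, {
    # Columns are referred to by name only; no dat$ prefix is needed
    height_m <- height_cm / 100
    # A variable created earlier in the block can be used right away
    bmi <- weight_kg / height_m^2
})

names(dat)  # now also contains height_m and bmi
```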
![Page 18: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/18.jpg)
lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})
![Page 19: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/19.jpg)
lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})
Categorize race and label: 1 to White, 2 to Black, 3 to Other
Numeric to categorical: element by element
1st level will be the reference
![Page 20: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/20.jpg)
Explained more in depth
lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
})
factor(): create a categorical variable
race: take the race variable
levels = 1:3: order levels 1, 2, 3; make 1 the reference level
labels = c("White","Black","Other"): label levels 1, 2, 3 as White, Black, Other
race.cat: create a new variable named race.cat
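A quick check of how factor() maps codes to labels, run on a short made-up vector rather than the real race column.

```r
# Made-up race codes for illustration
race <- c(1, 3, 2, 1)

race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)        # "White" "Black" "Other" (first level = reference)
as.character(race.cat)  # "White" "Other" "Black" "White"
```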
![Page 21: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/21.jpg)
lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})
Numeric to categorical: range to element
1st will be reference
How breaks work
(-Inf, 0] is None: ftv = 0
(0, 2] is Normal: ftv = 1, 2
(2, Inf] is Many: ftv = 3, 4, 5, 6
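The break points can be checked by running cut() on the values 0 to 6. This is only a sketch on a made-up vector; it relies on the cut() default right = TRUE, which makes each interval closed on the right.

```r
# Visit counts 0 through 6 (illustrative input)
ftv <- 0:6

ftv.cat <- cut(ftv,
               breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))

# Show each value next to the category it falls in
data.frame(ftv, ftv.cat)
```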
![Page 22: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/22.jpg)
lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})
Reset reference level
relevel(): change the reference level of the ftv.cat variable from None to Normal
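A short sketch of relevel() on a made-up factor, showing that the chosen level moves to the front while the others keep their order.

```r
# Made-up factor with None as the first (reference) level
ftv.cat <- factor(c("None", "Normal", "Many", "Normal"),
                  levels = c("None", "Normal", "Many"))
levels_before <- levels(ftv.cat)   # "None" "Normal" "Many"

# Move Normal to the front so it becomes the reference level
ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels_after <- levels(ftv.cat)    # "Normal" "None" "Many"
```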
![Page 23: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/23.jpg)
lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})
Numeric to Boolean to Category
ptl >= 1: TRUE, FALSE vector created here
ptl < 1 to FALSE, then to “0”
ptl >= 1 to TRUE, then to “1+”
levels = c(FALSE,TRUE) are given labels = c("0","1+")
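The two steps (numeric to logical, logical to factor) can be seen separately on a made-up ptl vector. The intermediate name is_preterm is introduced only for this illustration.

```r
# Made-up counts of previous preterm labor
ptl <- c(0, 1, 0, 3)

# Step 1: numeric to logical
is_preterm <- ptl >= 1   # FALSE TRUE FALSE TRUE

# Step 2: logical to labeled factor
preterm <- factor(is_preterm, levels = c(FALSE, TRUE), labels = c("0", "1+"))

as.character(preterm)    # "0" "1+" "0" "1+"
```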
![Page 24: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/24.jpg)
Binary 0,1 to No,Yes
One-by-one method
lbw <- within(lbw, {

    ## Categorize smoke ht ui
    smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
    ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
    ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
})
Loop method
## Alternative to above (run one method or the other, not both:
## re-running on already converted factors would give NA)
lbw[,c("smoke","ht","ui")] <- lapply(lbw[,c("smoke","ht","ui")], function(var) {
    factor(var, levels = 0:1, labels = c("No","Yes"))
})
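A sketch of the loop method on a toy data frame with made-up 0/1 values. The helper vector vars is introduced here only to avoid repeating the column names.

```r
# Toy data frame with made-up binary indicators
toy <- data.frame(smoke = c(0, 1, 1),
                  ht    = c(0, 0, 1),
                  ui    = c(1, 0, 0))

vars <- c("smoke", "ht", "ui")

# Apply the same factor() call to every listed column
toy[vars] <- lapply(toy[vars], function(var) {
    factor(var, levels = 0:1, labels = c("No", "Yes"))
})

str(toy)
```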
![Page 25: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/25.jpg)
model formula
![Page 26: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/26.jpg)
outcome ~ predictor1 + predictor2 + predictor3
formula
SAS equivalent: model outcome = predictor1 predictor2 predictor3;
![Page 27: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/27.jpg)
age ~ zyg
In the case of t-test
continuous variable to be compared
grouping variable to separate groups
Variable to be explained
Variable used to explain
![Page 28: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/28.jpg)
Y ~ X1 + X2
linear sum
![Page 29: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/29.jpg)
- .      All variables except for the outcome
- + X2   Add X2 term
- - 1    Remove intercept
- X1:X2  Interaction term between X1 and X2
- X1*X2  Main effects and interaction term
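How R expands these operators can be inspected with terms(), without fitting anything. This is a sketch; the data frame toy is made up and is only needed so that the dot has something to expand to.

```r
# Explicit main effects plus interaction
labels_explicit <- attr(terms(Y ~ X1 + X2 + X1:X2), "term.labels")

# The * shorthand expands to the same terms
labels_star <- attr(terms(Y ~ X1 * X2), "term.labels")

# - 1 removes the intercept (the attribute becomes 0)
has_intercept <- attr(terms(Y ~ X1 - 1), "intercept")

# . means every variable in the data except the outcome
toy <- data.frame(Y = 1:3, X1 = c(2, 5, 1), X2 = c(0, 1, 0))
labels_dot <- attr(terms(Y ~ ., data = toy), "term.labels")

labels_explicit  # "X1" "X2" "X1:X2"
labels_star      # "X1" "X2" "X1:X2"
has_intercept    # 0
labels_dot       # "X1" "X2"
```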
![Page 30: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/30.jpg)
Y ~ X1 + X2 + X1:X2
Interaction term
Main effects Interaction
![Page 31: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/31.jpg)
Y ~ X1 * X2
Interaction term
Main effects & interaction
![Page 32: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/32.jpg)
On-the-fly variable manipulation
Y ~ X1 + I(X2 * X3)
I(X2 * X3): new variable (X2 times X3) created on-the-fly and used
I(): inhibits formula interpretation, so that * is treated as arithmetic multiplication
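The difference between X2 * X3 and I(X2 * X3) shows up in the design matrix. A sketch on a made-up data frame:

```r
# Made-up predictors
toy <- data.frame(X1 = c(1, 2, 3),
                  X2 = c(2, 3, 4),
                  X3 = c(10, 20, 30))

# Without I(): * is a formula operator (main effects + interaction)
mm_star <- model.matrix(~ X1 + X2 * X3, data = toy)
colnames(mm_star)  # "(Intercept)" "X1" "X2" "X3" "X2:X3"

# With I(): * is plain multiplication, giving a single product column
mm_I <- model.matrix(~ X1 + I(X2 * X3), data = toy)
colnames(mm_I)     # "(Intercept)" "X1" "I(X2 * X3)"
```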
![Page 33: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/33.jpg)
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
Fit a model
![Page 34: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/34.jpg)
lm.full
See model object
![Page 35: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/35.jpg)
Call: command repeated
Coefficient for each variable
![Page 36: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/36.jpg)
summary(lm.full)
See summary
![Page 37: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/37.jpg)
Call: command repeated
Model F-test
Residual distribution
Dummy variables created
R^2 and adjusted R^2
Coef/SE = t
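The Coef/SE = t relationship can be verified from the coefficient table. This sketch uses the built-in mtcars data as a stand-in, which is an assumption made so the example runs without the lbw dataset.

```r
# Stand-in model on built-in data (not the lbw model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table from summary(): Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))

# t statistic recomputed by hand
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]

all.equal(t_by_hand, coefs[, "t value"])  # TRUE
```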
![Page 38: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/38.jpg)
ftv.catNone: those with no 1st trimester visits compared to those with a Normal number of 1st trimester visits (reference level)
ftv.catMany: those with Many 1st trimester visits compared to those with a Normal number of 1st trimester visits (reference level)
![Page 39: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/39.jpg)
race.catBlack: Black compared to White (reference level)
race.catOther: Other compared to White (reference level)
![Page 40: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/40.jpg)
confint(lm.full)
Confidence intervals
![Page 41: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/41.jpg)
Lower boundary
Upper boundary
Confidence intervals
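For a linear model the lower and upper boundaries are the estimate minus or plus a t quantile times the standard error. A sketch that rebuilds the default 95% intervals by hand, again with mtcars as an assumed stand-in for lbw:

```r
# Stand-in model on built-in data (not the lbw model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

est  <- coef(fit)
se   <- sqrt(diag(vcov(fit)))
# 97.5th percentile of the t distribution with residual degrees of freedom
crit <- qt(0.975, df = df.residual(fit))

ci_by_hand <- cbind(lower = est - crit * se,
                    upper = est + crit * se)

all.equal(unname(ci_by_hand), unname(confint(fit)))  # TRUE
```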
![Page 42: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/42.jpg)
anova(lm.full)
ANOVA table (type I)
![Page 43: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/43.jpg)
Degrees of freedom
Sequential SS
Mean SS = SS/DF
F = Mean SS / Mean SS of residual
ANOVA table (type I)
![Page 44: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/44.jpg)
Type I = Sequential SS
1 age: the 1st gets all in type I
2 lwt: the 2nd gets all but the overlap with the 1st in type I
3 smoke: the last gets the remaining part only in type I
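Because type I SS are sequential, the order of terms in the formula matters. A sketch with mtcars (an assumed stand-in for lbw) that fits the same two predictors in both orders; the helper rss() is defined here only for the illustration.

```r
# Same predictors, different order
fit_wt_first <- lm(mpg ~ wt + hp, data = mtcars)
fit_hp_first <- lm(mpg ~ hp + wt, data = mtcars)

# SS credited to wt when it enters first vs. after hp
ss_wt_first  <- anova(fit_wt_first)["wt", "Sum Sq"]
ss_wt_second <- anova(fit_hp_first)["wt", "Sum Sq"]

# The first term's SS is the drop in residual SS from the
# intercept-only model to the model containing that term alone
rss <- function(model) sum(residuals(model)^2)
drop_in_rss <- rss(lm(mpg ~ 1, data = mtcars)) - rss(lm(mpg ~ wt, data = mtcars))

c(first = ss_wt_first, second = ss_wt_second, drop = drop_in_rss)
```

The first and third numbers agree, while wt gets a smaller SS when it enters after hp.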
![Page 45: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/45.jpg)
library(car)
Anova(lm.full, type = 3)
ANOVA table (type III)
![Page 46: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/46.jpg)
Degrees of freedom
Marginal SS
F = Mean SS / Mean SS of residual
ANOVA table (type III)
Multi-category variables tested as one
![Page 47: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/47.jpg)
Type III = Marginal SS
1 age: the 1st gets its margin only in type III
2 lwt: the 2nd gets its margin only in type III
3 smoke: the last gets its margin only in type III
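Marginal SS can be sketched in base R with drop1(), which removes each term in turn from the full model. This is not the car::Anova() call from the slides: it is an assumed substitute that should give the same SS for the predictors only in a model without interactions. mtcars is used as a stand-in for lbw.

```r
# Stand-in main-effects model on built-in data (not the lbw model)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Each term's SS given all the other terms (marginal)
marginal <- drop1(fit, test = "F")

# Sequential (type I) table for comparison
sequential <- anova(fit)

# The last term has every other term before it in the sequential table,
# so its sequential SS equals its marginal SS
ss_last_marginal   <- marginal["qsec", "Sum of Sq"]
ss_last_sequential <- sequential["qsec", "Sum Sq"]

all.equal(ss_last_marginal, ss_last_sequential)  # TRUE
```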
![Page 48: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/48.jpg)
Type I Type III
Comparison
![Page 49: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/49.jpg)
Effect plot
library(effects)
plot(allEffects(lm.full), ylim = c(2000,4000))
ylim: fix Y-axis values for all plots
![Page 50: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/50.jpg)
Effect of a variable with the other covariates
set at their averages
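The idea behind an effect plot can be approximated in base R with predict(): vary one predictor over a grid and hold the other at its mean. This is a rough sketch with numeric covariates only, using mtcars as an assumed stand-in; the effects package handles factors and interactions more generally.

```r
# Stand-in model on built-in data (not the lbw model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Vary wt across its observed range, hold hp at its average
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5),
                   hp = mean(mtcars$hp))

grid$fitted_mpg <- unname(predict(fit, newdata = grid))

grid
```

With hp fixed, the fitted values change along the grid only through the wt coefficient.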
![Page 51: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/51.jpg)
Interaction
![Page 52: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/52.jpg)
lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw)
age*lwt: Continuous * Continuous
age*ftv.cat: Continuous * Categorical
race.cat*preterm: Categorical * Categorical
This model is for demonstration purposes only.
![Page 53: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/53.jpg)
Anova(lm.full.int, type = 3)
![Page 54: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/54.jpg)
Degrees of freedom
Marginal SS
F = Mean SS / Mean SS of residual
Interaction terms
![Page 55: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/55.jpg)
Continuous * Continuous
plot(effect("age:lwt", lm.full.int))
lwt level
![Page 56: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/56.jpg)
Continuous * Categorical
plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)
![Page 57: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/57.jpg)
Categorical * Categorical
plot(effect(c("race.cat*preterm"), lm.full.int),
     x.var = "preterm", z.var = "race.cat", multiline = TRUE)
![Page 58: Linear regression with R 1](https://reader037.vdocuments.site/reader037/viewer/2022103001/558501e0d8b42ad71b8b4e5d/html5/thumbnails/58.jpg)