What factors are most responsible for height? Outcome = (Model) + Error
![Page 1: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/1.jpg)
What factors are most responsible for height?
Outcome = (Model) + Error
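The decomposition in the title can be checked numerically. The following is a minimal sketch, assuming a simple linear model as the "model" part; the vectors `father` and `child` hold made-up values for illustration and are not taken from Galton's data.

```r
# Illustrative values only (not from the Galton dataset)
father <- c(65, 67, 68, 70, 72, 74)
child  <- c(66, 66, 69, 68, 71, 72)

fit <- lm(child ~ father)      # the "model"

model_part <- fitted(fit)      # predictions from the model
error_part <- residuals(fit)   # what the model leaves unexplained

# Outcome = (Model) + Error holds for every observation
all.equal(unname(model_part + error_part), child)
```

Any model can be substituted for `lm()` here; the error term is simply whatever the chosen model fails to reproduce.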
![Page 2: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/2.jpg)
Analytics & History: 1st Regression Line
The first “Regression Line”
![Page 3: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/3.jpg)
Galton’s Notebook on Families & Height
![Page 4: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/4.jpg)
X1 X2 X3 Y
Galton’s Family Height Dataset
![Page 5: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/5.jpg)
> getwd()
[1] "C:/Users/johnp_000/Documents"
> setwd()
![Page 6: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/6.jpg)
Dataset Input
Object <- Function(Filename)
h <- read.csv("GaltonFamilies.csv")
![Page 7: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/7.jpg)
str() summary()
Data Types: Numbers and Factors/Categorical
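What `str()` and `summary()` report depends on whether a column is numeric or a factor. A minimal sketch, assuming a small hypothetical data frame `d` whose column names are chosen only for illustration:

```r
# Hypothetical rows; column names are illustrative
d <- data.frame(
  father = c(70.0, 68.5, 72.0, 66.0),
  mother = c(64.0, 63.5, 65.0, 62.0),
  gender = factor(c("male", "female", "female", "male")),
  child  = c(71.0, 64.5, 66.0, 68.0)
)

str(d)       # one line per column: its type and first values
summary(d)   # quartiles and mean for numbers, counts per level for factors

is.numeric(d$child)   # TRUE
is.factor(d$gender)   # TRUE
levels(d$gender)      # factor levels, sorted alphabetically by default
```

Since R 4.0.0, `read.csv()` leaves text columns as character unless `stringsAsFactors = TRUE` is passed, so a text column may need an explicit `factor()` call before it behaves as a categorical variable.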
![Page 8: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/8.jpg)
Outline
• One Variable: Univariate
  • Dependent / Outcome Variable
• Two Variables: Bivariate
  • Outcome and each Predictor
• All Four Variables: Multivariate
![Page 9: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/9.jpg)
Steps

| Variable | Type | Plot |
| --- | --- | --- |
| Child’s Height (Y) | Continuous | Histogram |
| Dad’s Height, Mom’s Height (X1, X2) | Continuous | Scatter |
| Gender (X3) | Categorical | Boxplot |

Linear Regression
![Page 10: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/10.jpg)
Frequency Distribution, Histogram
hist(h$child)
![Page 11: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/11.jpg)
Area = 1
Density Plot
plot(density(h$childHeight))
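The "Area = 1" property of a density curve can be verified by numerical integration. A minimal sketch, assuming simulated heights (the mean and standard deviation below are arbitrary illustration values) and a trapezoid rule over the grid that `density()` returns:

```r
set.seed(1)
x <- rnorm(500, mean = 67, sd = 3.5)   # simulated heights, for illustration

d <- density(x)

# Trapezoid rule over the evaluation grid returned by density()
widths  <- diff(d$x)
heights <- (head(d$y, -1) + tail(d$y, -1)) / 2
area    <- sum(widths * heights)

area   # close to 1
```

This is why the vertical axis of a density plot is not a count: the curve is scaled so the total area, not the total height, equals one.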
![Page 12: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/12.jpg)
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)
Mode, Bimodal
![Page 13: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/13.jpg)
Industries / Organizations Creating and Using R

| Industry | Pct. |
| --- | --- |
| Research | 24% |
| Higher Education | 7% |
| Information Technology | 9% |
| Computer Software | 7% |
| Financial Services | 6% |
| Banking | 2% |
| Pharmaceuticals | 4% |
| Biotechnology | 4% |
| Market Research | 3% |
| Management Consulting | 3% |
| Total | 69% |

Hadley Wickham
Asst. Professor of Statistics at Rice University
ggplot2, plyr, reshape, rggobi, profr
http://ggplot2.org/
![Page 14: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/14.jpg)
ggplot2
library(ggplot2)
h.gg <- ggplot(h, aes(child))
h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
h.gg + geom_density()
![Page 15: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/15.jpg)
ggplot2
h.gg <- ggplot(h, aes(child)) + theme(legend.position = "right")
h.gg + geom_density() + labs(x = "Height", y = "Frequency")
h.gg + geom_density(aes(fill = factor(gender)), size = 2)
![Page 16: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/16.jpg)
![Page 17: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/17.jpg)
Correlation and Regression
![Page 18: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/18.jpg)
![Page 19: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/19.jpg)
Covariance
1. Calculate the difference between the mean and each person’s score for the first variable (x).
2. Calculate the difference between the mean and their value for the second variable (y).
3. Multiply these “error” values.
4. Add these values to get the cross product deviations.
5. The covariance is the average of cross-product deviations:

$$\operatorname{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
![Page 20: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/20.jpg)
Covariance

$$\operatorname{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

Persons 2, 3, and 5 look to have similar magnitudes from their means.
[Figure: plot with axes X and Y]
![Page 21: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/21.jpg)
Covariance

$$\operatorname{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

$$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$$

$$= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$

• Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).
• Calculate the error [deviation] between the mean and their score for the second variable (y).
• Multiply these error values.
• Add these values and you get the cross product deviations.
• The covariance is the average of the cross-product deviations.
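The five steps can be traced line by line in R. A minimal sketch, assuming five illustrative (x, y) pairs that are not part of the height data, and dividing by N - 1 as R's built-in `cov()` does:

```r
# Illustrative scores for five people (not from the height dataset)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

dx <- x - mean(x)                    # step 1: deviations of x from its mean
dy <- y - mean(y)                    # step 2: deviations of y from its mean
cross <- dx * dy                     # step 3: multiply the paired deviations
total <- sum(cross)                  # step 4: sum of cross-product deviations
cov_xy <- total / (length(x) - 1)    # step 5: divide by N - 1

cov_xy      # 4.25
cov(x, y)   # built-in function gives the same value
```

A positive result means that people above the mean on x tend to be above the mean on y as well.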
![Page 22: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/22.jpg)
Standardizing the Covariance
• Covariance depends upon the units of measurement
• Normalize the data
• Divide by the standard deviations of both variables.
• The standardized version of covariance is known as the correlation coefficient
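Dividing the covariance by both standard deviations removes the dependence on units. A minimal sketch, assuming illustrative values for `x` and `y` and treating `x` as if it were measured in inches purely for the sake of the unit change:

```r
x <- c(5, 4, 4, 6, 8)        # illustrative values
y <- c(8, 9, 10, 13, 15)

r <- cov(x, y) / (sd(x) * sd(y))   # standardized covariance
all.equal(r, cor(x, y))            # TRUE: this is Pearson's r

# Changing the units of x rescales the covariance by the same factor...
x_cm <- x * 2.54
cov(x_cm, y) / cov(x, y)           # 2.54

# ...but leaves the correlation unchanged
all.equal(cor(x_cm, y), cor(x, y)) # TRUE
```

Because the result is unit-free, correlations from different pairs of variables can be compared directly, which is not true of raw covariances.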
![Page 23: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/23.jpg)
Correlation
?cor
cor(h$father, h$child)
0.2660385
![Page 24: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/24.jpg)
Scatterplot Matrix: pairs()
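A scatterplot matrix and a correlation matrix share the same layout: one cell per pair of variables. A minimal sketch, assuming a hypothetical data frame `d` with three numeric columns of made-up heights:

```r
# Made-up heights; column names are illustrative
d <- data.frame(
  father = c(70.0, 68.5, 72.0, 66.0, 69.0, 71.5),
  mother = c(64.0, 63.5, 65.0, 62.0, 66.0, 63.0),
  child  = c(69.0, 66.5, 70.0, 65.0, 68.5, 69.5)
)

pairs(d)        # every variable plotted against every other

m <- cor(d)     # the matching matrix of correlation coefficients
round(m, 2)
```

`pairs()` and `cor()` both expect numeric columns, so a factor such as gender would need to be dropped or handled separately.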
![Page 25: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/25.jpg)
Correlations Matrix
library(car)
scatterplotMatrix(heights)
![Page 26: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/26.jpg)
ggplot2
![Page 27: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/27.jpg)
![Page 28: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/28.jpg)
Box Plot
![Page 29: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/29.jpg)
Children’s Height vs. Gender
boxplot(h$child ~ gender, data = h, col = c("pink", "lightblue"), main = "Children's Height by Gender", xlab = "Gender", ylab = "")
![Page 30: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/30.jpg)
Descriptive Stats: Box Plot
69.23 - 64.10 = 5.13
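A gap between two group summaries like the one above can be computed with `tapply()`. A minimal sketch with made-up heights in a hypothetical data frame `d`; the transcript does not state whether the slide's figures are means or medians, so the mean is used here only as an assumption:

```r
# Made-up heights for illustration
d <- data.frame(
  gender = factor(c("male", "female", "male", "female", "male", "female")),
  child  = c(70, 64, 69, 65, 71, 63)
)

means <- tapply(d$child, d$gender, mean)   # one summary value per group
means

gap <- means[["male"]] - means[["female"]]
gap   # 6
```

Swapping `mean` for `median` in the `tapply()` call gives the difference between the centre lines drawn in a box plot.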
![Page 31: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/31.jpg)
Subset Males
men <- subset(h, gender == 'male')
![Page 32: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/32.jpg)
Subset Females
women <- subset(h, gender == 'female')
![Page 33: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/33.jpg)
Children’s Height: Males
qqnorm(men$childHeight)
qqline(men$childHeight)
hist(men$childHeight)
![Page 34: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/34.jpg)
Children’s Height: Females
qqnorm(women$child)
qqline(women$child)
hist(women$child)
![Page 35: What factors are most responsible for height? Outcome = (Model) + Error](https://reader030.vdocuments.site/reader030/viewer/2022032707/56649e175503460f94b03280/html5/thumbnails/35.jpg)
ggplot2
library(ggplot2)
h.bb <- ggplot(h, aes(factor(gender), child))
h.bb + geom_boxplot()
h.bb + geom_boxplot(aes(fill = factor(gender)))