What factors are most responsible for height?
Model Specification
ERROR??? measurement error, model error, analysis
unexplained, unknown, unaccounted for, missing variables
Outcome = (Model) + Error
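The decomposition above can be checked numerically. The following is a minimal sketch on simulated data (the variable names, coefficients, and sample size are assumptions for illustration, not Galton's measurements): a fitted linear model supplies the "(Model)" part and its residuals supply the "Error" part.

```r
# Simulated, illustrative data; not taken from Galton's notebook.
set.seed(1)
midparent <- rnorm(200, mean = 69, sd = 1.8)                 # hypothetical predictor
child     <- 22 + 0.65 * midparent + rnorm(200, sd = 2.2)    # hypothetical outcome

fit <- lm(child ~ midparent)

model_part <- fitted(fit)      # the "(Model)" term
error_part <- residuals(fit)   # the "Error" term

# Outcome = (Model) + Error holds for every observation
check <- all.equal(unname(model_part + error_part), child)
print(check)
```

Whatever the model leaves out (measurement error, missing variables) ends up in `error_part`.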
Analytics & History: 1st Regression Line
The first “Regression Line”
Men's average height 'up 11cm since 1870s'
Galton’s Notebook on Families & Height
X1 X2 X3 X4 X5 Y
we find that a 54-loci genomic profile explained 4–6% of the sex- and age-adjusted height variance
the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance
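"Explained X% of the variance" refers to R-squared. As a rough sketch on simulated data (the numbers below are assumptions and will not reproduce the 4–6% or 40% figures quoted above), R-squared can be read from `summary()` or computed by hand as one minus the ratio of residual variation to total variation.

```r
# Simulated, illustrative data; the resulting R-squared is not the published figure.
set.seed(2)
mid_parent_height <- rnorm(500, mean = 69, sd = 1.8)
child_height      <- 22 + 0.65 * mid_parent_height + rnorm(500, sd = 2.2)

galton_fit <- lm(child_height ~ mid_parent_height)

# R-squared as reported by R
r2_summary <- summary(galton_fit)$r.squared

# R-squared by hand: 1 - (residual sum of squares / total sum of squares)
rss <- sum(residuals(galton_fit)^2)
tss <- sum((child_height - mean(child_height))^2)
r2_manual <- 1 - rss / tss

print(c(summary = r2_summary, manual = r2_manual))
```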
> getwd()
[1] "C:/Users/johnp_000/Documents"
> setwd()
Dataset Input
Function Filename Object
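The three labels on this slide map onto a single line of R: an object on the left, a reading function, and a filename as its argument. A minimal sketch, assuming a small made-up CSV written to a temporary directory (the file name and its contents are hypothetical):

```r
# Hypothetical file and columns, written to a temporary directory for the demo
csv_path <- file.path(tempdir(), "heights_demo.csv")
demo <- data.frame(
  gender      = c("male", "female", "female", "male"),
  childHeight = c(73.2, 69.0, 65.5, 70.5)
)
write.csv(demo, csv_path, row.names = FALSE)

# object  <- function(filename)
heights <- read.csv(csv_path)
print(heights)
```

With a relative filename, `read.csv()` looks in the directory reported by `getwd()`.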
Data Types: Numbers and Factors/Categorical
str() summary()
head() summary()
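A short sketch of how these inspection functions treat numbers and factors differently. The data frame below is made up for illustration; only the function names come from the slide.

```r
# Made-up data for illustration
people <- data.frame(
  gender      = c("male", "female", "female", "male", "female"),
  childHeight = c(73.2, 69.0, 65.5, 70.5, 68.0),
  stringsAsFactors = FALSE
)

# Turn the text column into a factor (categorical variable)
people$gender <- factor(people$gender)

str(people)        # one line per column: type and first values
head(people, 3)    # first three rows
summary(people)    # counts per level for factors, quartiles and mean for numbers
```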
[Figure: choosing a plot and a model by variable type]
Predictor Variable (X-Axis): Continuous (Parents' Height) or Categorical (Gender)
Outcome, Dependent Variable (Y-Axis): Continuous (Child's Height) or Categorical (Smartphone? Yes or No)
Plots and tables: Histogram, Scatter, Bar, Boxplot, Pie, Mosaic, CrossTable
Regression Model: Linear Regression, Logistic Regression
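The choice of regression model follows the type of the outcome: linear regression for a continuous outcome, logistic regression for a yes/no outcome. A minimal sketch on simulated data follows; `age` as a predictor of smartphone ownership is an assumption introduced purely for illustration, and all coefficients are made up.

```r
# Simulated, illustrative data only
set.seed(3)
n <- 300
parent_height   <- rnorm(n, mean = 69, sd = 1.8)
child_height    <- 22 + 0.65 * parent_height + rnorm(n, sd = 2.2)  # continuous outcome
age             <- runif(n, min = 18, max = 80)                    # hypothetical predictor
owns_smartphone <- rbinom(n, size = 1, prob = plogis(3 - 0.06 * age))  # 0/1 outcome

# Continuous outcome: linear regression
linear_fit <- lm(child_height ~ parent_height)

# Categorical (0/1) outcome: logistic regression
logistic_fit <- glm(owns_smartphone ~ age, family = binomial)

print(coef(linear_fit))
print(coef(logistic_fit))

# Logistic regression predicts probabilities between 0 and 1
predicted_prob <- fitted(logistic_fit)
```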
Frequency Distribution, Histogram
hist(heights$childHeight)
Standard Deviation
[Figure: plotted heights, y-axis from 62 to 74, x-axis from 0 to 7, with the Mean marked]
• Deviation between mean and an actual data point.
Calculating Standard Deviation - sd()
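To show what `sd()` does with those deviations, here is a small sketch on made-up heights. It computes the sample standard deviation by hand (square the deviations from the mean, sum them, divide by n - 1, take the square root) and compares the result with `sd()`.

```r
# Made-up heights for illustration
x <- c(62, 64, 66, 68, 70, 72, 74)

# Deviation between the mean and each actual data point
deviations <- x - mean(x)

# Sample standard deviation by hand (note the n - 1 denominator)
manual_sd <- sqrt(sum(deviations^2) / (length(x) - 1))

# Built-in function
builtin_sd <- sd(x)

print(c(manual = manual_sd, builtin = builtin_sd))
```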
Normal Distribution and SD
Mean = 66.5
S.D. = 3.6
66.5 + 7.06 ≈ 73.6
SD   Pct.   Z-score   Heights
1    90%    1.64
2    95%    1.96      7.06
3    99%    2.58
66.5 - 7.06 ≈ 59.4
Area = 1
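The z-scores and the 95% range on this slide can be reproduced with `qnorm()` and `pnorm()`. This is a sketch that assumes heights are exactly normal with the slide's mean of 66.5 and S.D. of 3.6; real data will only follow this approximately.

```r
m <- 66.5   # mean from the slide
s <- 3.6    # standard deviation from the slide

pct <- c(0.90, 0.95, 0.99)

# Two-sided z-scores: leave (1 - pct) / 2 in each tail
z <- qnorm(1 - (1 - pct) / 2)

# Distance from the mean in height units, and the resulting range
half_width <- z * s
lower <- m - half_width
upper <- m + half_width

print(data.frame(
  pct        = pct,
  z          = round(z, 2),
  half_width = round(half_width, 2),
  lower      = round(lower, 1),
  upper      = round(upper, 1)
))

# Share of the curve inside each range, and the total area under the curve
coverage   <- pnorm(upper, mean = m, sd = s) - pnorm(lower, mean = m, sd = s)
total_area <- pnorm(Inf, mean = m, sd = s) - pnorm(-Inf, mean = m, sd = s)
```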
Density Plot
plot(density(h$childHeight))
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)
Bimodal: two modes
Mode, Bimodal
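A bimodal shape often appears when two groups with different averages are pooled. The sketch below simulates two hypothetical subgroups (the means, spread, and group sizes are assumptions) and looks for local peaks in the estimated density. The peak finder is a simple illustration, not a formal test for bimodality.

```r
# Two simulated subgroups with different average heights (illustrative values)
set.seed(4)
mixed <- c(rnorm(500, mean = 64, sd = 2),
           rnorm(500, mean = 70, sd = 2))

d <- density(mixed)

# A local peak is where the slope of the density changes from rising to falling
peak_index <- which(diff(sign(diff(d$y))) == -2) + 1

# Keep only substantial peaks, ignoring small bumps in the tails
major <- peak_index[d$y[peak_index] > 0.5 * max(d$y)]
modes <- d$x[major]

print(modes)
plot(d, main = "Bimodal: two modes")
abline(v = modes, lty = 2)
```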
ggplot2