Multiple regression
Example: Brain and body size predictive of intelligence?
• Sample of n = 38 college students
• Response (Y): intelligence based on the PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale.
• Predictor (X1): Brain size based on MRI scans (given as count/10,000)
• Predictor (X2): Height in inches
• Predictor (X3): Weight in pounds
Scatter matrix plots
• Scatter plots of response versus predictor help in determining the nature and strength of relationships.
• Scatter plots of predictor versus predictor help in studying their relationships, as well as in identifying the scope of the model and outliers.
[Figure: Scatter matrix plot of PIQ, MRI, Height, and Weight]
Matrix plot in Minitab
• Select Graph >> Matrix plot …
• Specify all of the variables (response and predictors) you want graphed.
• Select OK.
Correlation matrix
Correlations: PIQ, MRI, Height, Weight

            PIQ     MRI  Height
MRI       0.378
Height   -0.093   0.588
Weight    0.003   0.513   0.700

Cell Contents: Pearson correlation
Correlation matrix in Minitab
• Stat >> Basic statistics >> Correlation…
• Select all of the variables (response and predictors).
• To get a “crisper” table, de-select the default “Display p-values” option.
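Outside Minitab, the same kind of table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `pearson` and `correlation_matrix` are made up here, and only the first five MRI and Height rows of the data listing are used, so the printed value will not equal the full-sample correlation of 0.588.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# First five rows of the MRI / Height listing (illustration only).
data = {
    "MRI": [81.69, 103.84, 96.54, 95.15, 92.88],
    "Height": [64.5, 73.3, 68.8, 65.0, 69.0],
}
corr = correlation_matrix(data)
for (a, b), r in corr.items():
    print(f"{a:>6} vs {b:<6} r = {r:.3f}")
```

Each variable correlates perfectly with itself, and the matrix is symmetric, which is why Minitab prints only the lower triangle.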
Linear regression model with 3 predictors
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
where:
• Yi = intelligence (PIQ) of student i
• Xi1 = brain size of student i (MRI)
• Xi2 = height of student i (Height)
• Xi3 = weight of student i (Weight)
Fitting multiple regression model in Minitab
• It’s basically the same as fitting a simple linear regression model.
• Stat >> Regression >> Regression…
• Select response and all predictors.
• Specify all desired options as you would for simple linear regression.
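For readers without Minitab, the least squares fit itself can be sketched in plain Python. The sketch below solves the normal equations by Gaussian elimination; that is one textbook route and not necessarily what Minitab does internally. The function name `fit_least_squares` and the toy data are hypothetical (the PIQ scores are not listed in these slides).

```python
def fit_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination.

    X is a list of rows, each row starting with 1.0 for the intercept.
    """
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)]
         + [sum(row[j] * yi for row, yi in zip(X, y))]
         for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(A[j][k] * b[k] for k in range(j + 1, p))
        b[j] = (A[j][p] - tail) / A[j][j]
    return b

# Hypothetical toy data generated from y = 1 + 2*x1 - 3*x2 with no error term.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
coefs = fit_least_squares(X, y)
print([round(c, 6) for c in coefs])
```

Because the toy response has no error term, the fit should recover the generating coefficients up to rounding error.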
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
MRI         2.0604   0.5634   3.66  0.001
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998
The Coef column holds the estimates $b_0$, $b_1$, $b_2$, $b_3$.
$H_0: \beta_3 = 0$
$H_A: \beta_3 \neq 0$
How likely is it that b3 = 0.0006 would be as extreme as it is (?!) if β3 = 0?
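The T column in the output is each estimate divided by its standard error. A minimal sketch that reproduces it from the Coef and SE Coef columns (the dictionary layout and the function name `t_statistic` are illustrative; the P column would additionally need a t distribution, which is not computed here):

```python
# (estimate, standard error) pairs copied from the Coef and SE Coef columns.
coefficients = {
    "Constant": (111.35, 62.97),
    "MRI": (2.0604, 0.5634),
    "Height": (-2.732, 1.229),
    "Weight": (0.0006, 0.1971),
}

def t_statistic(estimate, std_error):
    """t* = b_k / s{b_k}, the statistic for testing beta_k = 0."""
    return estimate / std_error

t_values = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}
for name, t in t_values.items():
    print(f"{name:<8} t = {t:6.2f}")
```

For Weight, the estimate is tiny relative to its standard error, which is why its t statistic rounds to 0.00.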
Confidence intervals for βk
Sample estimate ± margin of error
$b_k \pm t(1 - \alpha/2;\, n - p)\, s\{b_k\}$

Predictor     Coef  SE Coef      T      P
Weight      0.0006   0.1971   0.00  0.998

$0.0006 \pm 2.0322\,(0.1971)$
$0.0006 \pm 0.401$
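The arithmetic for this interval can be checked with a short Python sketch. The function name `confidence_interval` is illustrative, and the t multiplier 2.0322 is simply the value used on the slide rather than something computed here.

```python
def confidence_interval(estimate, std_error, t_multiplier):
    """Return (lower, upper) for estimate +/- t_multiplier * std_error."""
    margin = t_multiplier * std_error
    return estimate - margin, estimate + margin

# Weight row of the output; 2.0322 is the t multiplier used on the slide
# (the 0.975 quantile with n - p = 38 - 4 = 34 degrees of freedom).
lower, upper = confidence_interval(0.0006, 0.1971, 2.0322)
print(f"95% CI for beta_3: ({lower:.3f}, {upper:.3f})")
```

The interval contains 0, which agrees with the large P-value reported for Weight.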
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor     Coef  SE Coef      T      P
Constant    111.28    55.87   1.99  0.054
MRI         2.0606   0.5466   3.77  0.001
Height     -2.7299   0.9932  -2.75  0.009

S = 19.51 R-Sq = 29.5% R-Sq(adj) = 25.5%

R-Sq: Coefficient of (multiple) determination
R-Sq(adj): Adjusted coefficient of (multiple) determination
Coefficient of (multiple) determination
• Basically the same as before.
• R2 = SSR/SSTO = proportionate reduction in total variation in Y associated with using the set of X1, …, Xp-1 variables.
• Again, a large R2 value does not necessarily imply that the fitted model is a useful one.
Adjusted coefficient of multiple determination
• Problem: adding more X variables can only increase R2 (or leave it unchanged).
• That is because SSTO never changes for a given set of data, while the remaining error (quantified by SSE) can only get smaller (or stay the same) when more predictor variables are considered.
• Solution: adjust R2 to take into account the number of predictors in the model.
Adjusted coefficient of multiple determination
$R_a^2 = 1 - \frac{SSE/(n-p)}{SSTO/(n-1)} = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO}$
PIQ = 111 + 2.06 MRI - 2.73 Height
S = 19.51 R-Sq = 29.5% R-Sq(adj) = 25.5%

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    2   5572.7  2786.4  7.32  0.002
Error        35  13321.8   380.6
Total        37  18894.6
Calculation of R2(adj):
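A minimal sketch of this calculation in Python, using the sums of squares from the Analysis of Variance table and taking p = 3 (the intercept plus two predictors). The variable names are illustrative only.

```python
# Sums of squares from the Analysis of Variance table.
ssr, sse, ssto = 5572.7, 13321.8, 18894.6
n, p = 38, 3   # 38 students; p counts the intercept plus 2 predictors

r_sq = ssr / ssto
r_sq_adj = 1 - ((n - 1) / (n - p)) * (sse / ssto)
print(f"R-Sq = {r_sq:.1%}  R-Sq(adj) = {r_sq_adj:.1%}")
```

Both values agree with the R-Sq and R-Sq(adj) figures in the Minitab output when rounded to one decimal place.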
Interpretation of R2(adj):
Impact of the adjustment
It’s a trade-off. R-Sq(adj) may even become smaller when another predictor variable is introduced into the model.

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height
S = 19.51 R-Sq = 29.5% R-Sq(adj) = 25.5%

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight
S = 19.79 R-Sq = 29.5% R-Sq(adj) = 23.3%
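The drop from 25.5% to 23.3% can be traced to the (n − 1)/(n − p) factor alone. The sketch below starts from the rounded R-Sq of 29.5% reported for both models and only changes p; the function name `adjusted_r_sq` is illustrative, and working from a rounded R-Sq is an approximation.

```python
def adjusted_r_sq(r_sq, n, p):
    """R-Sq(adj) = 1 - ((n - 1) / (n - p)) * (1 - R-Sq); p includes the intercept."""
    return 1 - ((n - 1) / (n - p)) * (1 - r_sq)

n = 38
two_predictors = adjusted_r_sq(0.295, n, p=3)    # MRI, Height
three_predictors = adjusted_r_sq(0.295, n, p=4)  # MRI, Height, Weight
print(f"{two_predictors:.1%} versus {three_predictors:.1%}")
```

Adding Weight leaves R-Sq essentially unchanged but costs one error degree of freedom, so the adjusted value falls.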
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    2   5572.7  2786.4  7.32  0.002
Error        35  13321.8   380.6
Total        37  18894.6
Is there a relationship between the response variable and the set of predictor variables?
$H_0: \beta_1 = \beta_2 = 0$
$H_A: \beta_1 \neq 0 \text{ or } \beta_2 \neq 0$
How likely is it that the sample would yield such an extreme F-statistic if the null hypothesis were true?
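The F-statistic in the table is the ratio of the regression mean square to the error mean square. A short sketch that rebuilds it from the SS and DF columns (the function name `overall_f` is illustrative; the P-value would need an F distribution and is not computed here):

```python
def overall_f(ssr, sse, df_regression, df_error):
    """F* = MSR / MSE for testing that all slope coefficients are zero."""
    msr = ssr / df_regression
    mse = sse / df_error
    return msr, mse, msr / mse

# SS and DF values from the Analysis of Variance table.
msr, mse, f_star = overall_f(5572.7, 13321.8, 2, 35)
print(f"MSR = {msr:.2f}  MSE = {mse:.2f}  F* = {f_star:.2f}")
```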
Caution when predicting or estimating response
[Figure: scatter plot of duration versus age, marking the scope of age, the scope of duration, and a point outside the scope of the model]
[Figure: Scatter matrix plot of PIQ, MRI, Height, and Weight]
What is the scope of the model?
Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
1        113.16    3.21  (106.64, 119.68)   (73.02, 153.30)
2        108.99    4.33  (100.19, 117.78)   (68.41, 149.56)

Values of Predictors for New Observations
New Obs   MRI  Height
1        91.0    68.0
2        85.0    65.0

S = 19.51
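Confidence interval:
$\hat{Y}_h \pm t(1 - \alpha/2;\, n - p)\, s\{\hat{Y}_h\}$
$113.16 \pm 2.0301\,(3.21)$

Prediction interval:
$\hat{Y}_h \pm t(1 - \alpha/2;\, n - p)\, s\{\text{pred}\}$
$113.16 \pm 2.0301\,(19.77)$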
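The fitted value and both intervals for new observation 1 can be reproduced with a short Python sketch. It assumes the usual relation s{pred}² = S² + s{Ŷh}², takes SE Fit and the t multiplier 2.0301 from the output and slide rather than computing them, and uses illustrative names (`predict`, `intervals`).

```python
from math import sqrt

# Estimates from the fitted two-predictor model and its output.
b0, b1, b2 = 111.28, 2.0606, -2.7299
s = 19.51            # S in the output (square root of MSE)
t_mult = 2.0301      # t multiplier used on the slide, 35 error df

def predict(mri, height):
    """Fitted PIQ for a given brain size and height."""
    return b0 + b1 * mri + b2 * height

def intervals(fit, se_fit):
    """Return (confidence interval, prediction interval) around a fitted value."""
    se_pred = sqrt(s ** 2 + se_fit ** 2)   # s{pred}, assumed relation
    ci = (fit - t_mult * se_fit, fit + t_mult * se_fit)
    pi = (fit - t_mult * se_pred, fit + t_mult * se_pred)
    return ci, pi

fit_1 = predict(91.0, 68.0)
ci_1, pi_1 = intervals(round(fit_1, 2), 3.21)   # SE Fit taken from the output
print(f"Fit = {fit_1:.2f}")
print(f"CI = ({ci_1[0]:.2f}, {ci_1[1]:.2f})  PI = ({pi_1[0]:.2f}, {pi_1[1]:.2f})")
```

The prediction interval is much wider than the confidence interval because it also carries the variability of an individual response around the mean.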
Diagnostics and remedial measures
• Most procedures carry directly over (with minor modification) from simple linear regression to multiple linear regression.
• But, some procedures are specific only to multiple linear regression (chapters 9, 10)
Residuals against each predictor
• Gives an indication of the adequacy of the regression function with respect to each specific predictor variable.
[Figure: Residuals Versus the Fitted Values (response is PIQ); x-axis: Fitted Value, y-axis: Residual]
Unusual Observations
Obs  MRI     PIQ    Fit  SE Fit  Residual  St Resid
 13   86  147.00  95.31    5.34     51.69     2.75R

R denotes an obs’n with a large standardized residual
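The standardized residual for observation 13 can be rebuilt from the other columns. The sketch assumes the common definition in which SE Fit equals S times the square root of the leverage, so that the residual's estimated variance is S² − (SE Fit)²; S = 19.51 is taken from the two-predictor output, and the function name is illustrative.

```python
from math import sqrt

def standardized_residual(residual, se_fit, s):
    """Residual divided by its estimated standard deviation.

    Assumes SE Fit = s * sqrt(h) for leverage h, so the residual's
    variance estimate is s**2 * (1 - h) = s**2 - se_fit**2.
    """
    return residual / sqrt(s ** 2 - se_fit ** 2)

# Observation 13: Residual = 51.69, SE Fit = 5.34; S = 19.51 for this model.
st_resid = standardized_residual(51.69, 5.34, 19.51)
print(f"StResid = {st_resid:.2f}")
```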
[Figure: Residuals Versus the Fitted Values (response is PIQ); x-axis: Fitted Value, y-axis: Standardized Residual]
[Figure: Residuals Versus MRI (response is PIQ); x-axis: MRI, y-axis: Residual]
[Figure: Residuals Versus Height (response is PIQ); x-axis: Height, y-axis: Residual]
Residuals versus omitted predictors
• As usual.
• Also consider plotting residuals against interaction terms, such as X1X2, because they too are potentially important omitted variables.
Regression interaction terms in Minitab
• Use the calculator to create a new variable (MRI*Ht).
• Select Calc >> Calculator.
• Specify “Store result in variable” (MRI*Ht)
• Specify Expression: MRI*Height
• Select OK. The new (interaction) predictor variable will appear in worksheet.
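The calculator step amounts to an element-wise product of two columns. A minimal Python sketch, using the first three (MRI, Height) rows of the data listing purely as an illustration; the list names are not Minitab identifiers.

```python
# First three (MRI, Height) rows of the data listing, for illustration.
mri = [81.69, 103.84, 96.54]
height = [64.5, 73.3, 68.8]

# Element-wise product plays the role of the stored column MRI*Ht.
mri_ht = [m * h for m, h in zip(mri, height)]
print(mri_ht)
```

The resulting column can then be used as the x-axis of a residual plot or added as a predictor.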
[Figure: Residuals Versus MRI*Ht (response is PIQ); x-axis: MRI*Ht, y-axis: Residual]
[Figure: Normal Probability Plot of the residuals (RESI1); y-axis: Probability]
Average: -0.0000000  StDev: 18.9750  N: 38
W-test for Normality  R: 0.9883  P-Value (approx): > 0.1000
Modified Levene test
[Figure: Test for Equal Variances for RESI1, showing 95% Confidence Intervals for Sigmas and Boxplots of Raw Data by factor level]
Factor Levels (MRIGrp)  1: le 90.5  2: gt 90.5
F-Test  Test Statistic: 2.762  P-Value: 0.037
Levene's Test  Test Statistic: 3.298  P-Value: 0.078
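The idea behind the modified Levene test can be sketched as a two-sample t-test on absolute deviations from each group's median. The residuals below are hypothetical (the PIQ residuals are not listed in these slides), and the function name is illustrative. For two groups, the square of this t statistic equals the F-type statistic of the same test, which may be the form Minitab reports.

```python
from math import sqrt
from statistics import mean, median

def modified_levene_t(group1, group2):
    """Two-sample t statistic on absolute deviations from each group's median."""
    d1 = [abs(e - median(group1)) for e in group1]
    d2 = [abs(e - median(group2)) for e in group2]
    n1, n2 = len(d1), len(d2)
    # Pooled variance of the absolute deviations.
    ss = sum((d - mean(d1)) ** 2 for d in d1) + sum((d - mean(d2)) ** 2 for d in d2)
    pooled_sd = sqrt(ss / (n1 + n2 - 2))
    return (mean(d1) - mean(d2)) / (pooled_sd * sqrt(1 / n1 + 1 / n2))

# Hypothetical residuals, NOT the PIQ residuals, split by a grouping variable.
low_group = [-3.0, -1.0, 0.0, 2.0, 4.0]
high_group = [-12.0, -5.0, 1.0, 6.0, 15.0]
t_star = modified_levene_t(low_group, high_group)
print(f"t* = {t_star:.3f}")
```

A statistic far from zero suggests that the spread of the residuals differs between the two groups.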
LOF Test
• The lack-of-fit (LOF) test requires at least some repeats, that is, observations that share the same values across all predictor variables.
• (X1 = 59, X2 = 63) and (X1 = 59, X2 = 63) is an example of a repeat.
• (X1 = 59, X2 = 63) and (X1 = 59, X2 = 66) is not an example of a repeat.
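Checking a data set for repeats is easy to automate. A minimal sketch, with a made-up helper name `find_replicates`, applied to the two examples from the bullets above:

```python
from collections import Counter

def find_replicates(rows):
    """Return predictor combinations that occur more than once, with counts."""
    counts = Counter(tuple(row) for row in rows)
    return {combo: k for combo, k in counts.items() if k > 1}

# The two examples from the bullets above, as (X1, X2) pairs.
print(find_replicates([(59, 63), (59, 63)]))   # a repeat
print(find_replicates([(59, 63), (59, 66)]))   # not a repeat
```

An empty result means there are no replicates, so a pure error sum of squares cannot be formed.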
Row     MRI  Height
  1   81.69    64.5
  2  103.84    73.3
  3   96.54    68.8
  4   95.15    65.0
  5   92.88    69.0
  6   99.13    64.5
  7   85.43    66.0
  8   90.49    66.3
  9   95.55    68.8
 10   83.39    64.5
 11  107.95    70.0
 12   92.41    69.0
 13   85.65    70.5
 14   87.89    66.0
 15   86.54    68.0
 16   85.22    68.5
 17   94.51    73.5
 18   80.80    66.3
 19   88.91    70.0
 20   90.59    76.5
 21   79.06    62.0
 22   95.50    68.0
 23   83.18    63.0
 24   93.55    72.0
 25   79.86    68.0
 26  106.25    77.0
 27   79.35    63.0
 28   86.67    66.5
 29   85.78    62.5
 30   94.96    67.0
 31   99.79    75.5
 32   88.00    69.0
 33   83.43    66.5
 34   94.81    66.5
 35   94.94    70.5
 36   89.40    64.5
 37   93.00    74.0
 38   93.59    75.5
Attempted LOF Test
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    2   5572.7  2786.4  7.32  0.002
Error        35  13321.8   380.6
Total        37  18894.6
No replicates. Cannot do pure error test.