Posted on 19-Dec-2015
Regression and Analysis of Variance
Session 21
Course: I0174 – Regression Analysis
Year: Odd Semester 2007/2008
Bina Nusantara
Regression and Analysis of Variance
• One-Way Analysis of Variance Model
• Regression Approach to One-Way Classification
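The Multiple Regression Model
Relationship between 1 dependent & 2 or more independent variables is a linear function:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, $\varepsilon_i$ is the random error, $Y$ is the dependent (response) variable, and $X_1, \ldots, X_k$ are the independent (explanatory) variables.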
Multiple Regression Model
Bivariate model (figure: the population response plane over the $X_1$, $X_2$ axes with intercept $\beta_0$, and an observed $Y_i$ at $(X_{1i}, X_{2i})$ lying a random error $\varepsilon_i$ off the plane)
$$\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
Multiple Regression Equation
Bivariate model (figure: the fitted response plane over the $X_1$, $X_2$ axes with intercept $b_0$, and an observed $Y_i$ at $(X_{1i}, X_{2i})$ lying a residual $e_i$ off the plane)
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$
$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$$
Multiple Regression Equation
Too complicated by hand! Ouch!
Interpretation of Estimated Coefficients
• Slope ($b_j$)
– Estimated that the average value of Y changes by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)
– Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), holding the inches of insulation ($X_2$) constant
• Y-Intercept (b0)
– The estimated average value of Y when all Xj = 0
Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.
Oil (gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Multiple Regression Equation: Example
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
Excel Output
              Coefficients
Intercept     562.1510092
X Variable 1  -5.436580588
X Variable 2  -20.01232067
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$
For each degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
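Because the fit is too tedious by hand, here is a minimal pure-Python sketch of ordinary least squares through the normal equations $X^\top X\,b = X^\top y$, using the 15 observations from the example table. The helpers `solve` and `ols` are hypothetical names chosen for illustration (this is not how Excel or PHStat computes the fit internally), and Gauss-Jordan elimination is just one convenient way to solve the 3 × 3 system. Under these assumptions the estimates should agree with the Excel coefficients to several decimal places.

```python
# Heating-oil data parsed from the example table:
# (oil used in gallons, average temperature in °F, insulation in inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """Hypothetical helper: least-squares [b0, b1, ..., bk] from the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 gives the intercept b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


y = [oil for oil, _, _ in DATA]
xs = [(temp, ins) for _, temp, ins in DATA]  # X1 = temperature, X2 = insulation
b0, b1, b2 = ols(y, xs)
print(f"Y-hat = {b0:.3f} + ({b1:.3f}) X1 + ({b2:.3f}) X2")
```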
Multiple Regression in PHStat
• PHStat | Regression | Multiple Regression …
• Excel spreadsheet for the heating oil example
Venn Diagrams and Explanatory Power of Regression
Figure: Venn diagram of the variation in Oil and in Temp.
• Overlap (SSR): variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil
• Oil only (SSE): variations in Oil explained by the error term
• Temp only: variations in Temp not used in explaining variation in Oil
Venn Diagrams and Explanatory Power of Regression
(continued)
Figure: Venn diagram of Oil and Temp.
$$r^2 = \frac{SSR}{SSR + SSE}$$
Venn Diagrams and Explanatory Power of Regression
Figure: Venn diagram of Oil, Temp, and Insulation.
• Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of $\beta_1$ or $\beta_2$.
• Variation NOT explained by Temp or Insulation: SSE
Coefficient of Multiple Determination
• Proportion of Total Variation in Y Explained by All X Variables Taken Together
  – $r^2_{Y\cdot 12\cdots k} = \dfrac{SSR}{SST} = \dfrac{\text{Explained Variation}}{\text{Total Variation}}$
• Never Decreases When a New X Variable is Added to Model
  – Disadvantage when comparing among models
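As a quick arithmetic check of $r^2 = SSR/SST$, this sketch plugs in the regression and residual sums of squares reported in the Excel ANOVA output for the heating-oil model; both values are copied from the deck and nothing else is assumed.

```python
# Sums of squares copied from the Excel ANOVA output (heating-oil model)
SSR = 228014.6263   # explained (regression) variation
SSE = 8120.603016   # unexplained (residual) variation
SST = SSR + SSE     # total variation

r2 = SSR / SST      # coefficient of multiple determination r^2_{Y.12}
print(f"r^2 = {r2:.6f}")
```

The result matches the "R Square" line of the Excel regression statistics for the Temp and Insulation model.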
Venn Diagrams and Explanatory Power of Regression
Figure: Venn diagram of Oil, Temp, and Insulation.
$$r^2_{Y\cdot 12} = \frac{SSR}{SSR + SSE}$$
Adjusted Coefficient of Multiple Determination
• Proportion of Variation in Y Explained by All the X Variables, Adjusted for the Sample Size and the Number of X Variables Used
  – $r^2_{adj} = 1 - \left[\left(1 - r^2_{Y\cdot 12\cdots k}\right)\dfrac{n-1}{n-k-1}\right]$
  – Penalizes excessive use of independent variables
  – Smaller than $r^2_{Y\cdot 12\cdots k}$
  – Useful in comparing among models
  – Can decrease if an insignificant new X variable is added to the model
Example: Adjusted r2
Can Decrease
Model with two independent variables: $Oil = \beta_0 + \beta_1\,Temp + \beta_2\,Insulation$
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
Model with three independent variables: $Oil = \beta_0 + \beta_1\,Temp + \beta_2\,Insulation + \beta_3\,Color$
Regression Statistics
Multiple R          0.983482856
R Square            0.967238528
Adjusted R Square   0.958303581
Standard Error      25.72417272
Observations        15
Adjusted $r^2$ decreases when k increases from 2 to 3.
Color is not useful in explaining the variation in oil consumption.
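A small sketch of the adjusted $r^2$ formula applied to the two R Square values above with n = 15; the function name `adjusted_r2` is hypothetical. It should reproduce the Excel "Adjusted R Square" lines and the drop when Color is added.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R Square values from the two Excel outputs (n = 15 homes)
adj_two = adjusted_r2(0.965610371, n=15, k=2)    # Temp, Insulation
adj_three = adjusted_r2(0.967238528, n=15, k=3)  # Temp, Insulation, Color
print(f"k=2: {adj_two:.6f}  k=3: {adj_three:.6f}")
```

Although $r^2$ rises slightly with Color, the factor $(n-1)/(n-k-1)$ grows from 14/12 to 14/11, which more than offsets the gain.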
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$
The predicted heating oil used is 278.97 gallons.
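The prediction arithmetic can be written as a tiny function. `predict_oil` is a hypothetical name, and the coefficients are the rounded ones from the fitted equation, so the result carries the same rounding as the slide.

```python
# Rounded coefficients from the fitted heating-oil equation
b0, b1, b2 = 562.151, -5.437, -20.012


def predict_oil(temp_f: float, insulation_in: float) -> float:
    """Y-hat = b0 + b1*X1 + b2*X2 (X1 = temperature in °F, X2 = insulation in inches)."""
    return b0 + b1 * temp_f + b2 * insulation_in


y_hat = predict_oil(30, 6)
print(f"Predicted oil: {y_hat:.2f} gallons")
```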
Testing for Overall Significance
(continued)
• Test Statistic:
  – $F = \dfrac{MSR}{MSE} = \dfrac{SSR(\text{all})/k}{MSE(\text{all})}$
• Where F has k numerator and (n − k − 1) denominator degrees of freedom
Test for Overall Significance: Excel Output Example
ANOVA
             df   SS         MS         F          Significance F
Regression    2   228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2
• Regression df: k = 2, the number of explanatory variables; Total df: n − 1
• Test statistic: $F = \dfrac{MSR}{MSE} = 168.4712$
• p-value: Significance F = 1.65411E-09
Test for Overall Significance: Example Solution
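H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
H1: At least one $\beta_j \neq 0$
α = .05
df = 2 and 12
Critical Value: F = 3.89 (figure: F distribution with the α = 0.05 rejection region to the right of 3.89)
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject H0 at α = 0.05.
Conclusion: There is evidence that at least one independent variable affects Y.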
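A sketch of the overall F test using the Excel ANOVA sums of squares. The critical value 3.89 is taken from the slide rather than computed, since the Python standard library has no F-distribution quantile function.

```python
# ANOVA quantities from the Excel output (heating-oil example)
ssr, sse = 228014.6263, 8120.603016
n, k = 15, 2                      # 15 homes, 2 explanatory variables

msr = ssr / k                     # numerator df = k
mse = sse / (n - k - 1)           # denominator df = n - k - 1
f_stat = msr / mse
f_crit = 3.89                     # tabled F(0.05; 2, 12), taken from the slide
reject_h0 = f_stat > f_crit
print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```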
Test for Significance: Individual Variables
• Show If Y Depends Linearly on a Single Xj Individually While Holding the Effects of Other X’s Fixed
• Use t Test Statistic
• Hypotheses:
  – H0: $\beta_j = 0$ (No linear relationship)
  – H1: $\beta_j \neq 0$ (Linear relationship between $X_j$ and Y)
t Test Statistic Excel Output: Example
             Coefficients    Standard Error   t Stat      P-value
Intercept    562.1510092     21.09310433      26.65094    4.77868E-12
Temp         -5.436580588    0.336216167      -16.1699    1.64178E-09
Insulation   -20.01232067    2.342505227      -8.543127   1.90731E-06
$$t = \frac{b_j}{S_{b_j}}$$
• t Test Statistic for X1 (Temperature): t = -16.1699
• t Test Statistic for X2 (Insulation): t = -8.543127
t Test : Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
df = 12
Critical Values: t = ±2.1788 (figure: t distribution with a rejection region of area .025 in each tail, below -2.1788 and above 2.1788)
Test Statistic: t = -16.1699
Decision: Reject H0 at α = 0.05.
Conclusion: There is evidence of a significant effect of temperature on oil consumption, holding constant the effect of insulation.
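Each t statistic is just the coefficient divided by its standard error. This sketch recomputes both slopes' statistics from the Excel output and compares them with the tabled critical value ±2.1788 from the slide; the dictionary layout is an illustrative choice.

```python
# (coefficient, standard error) pairs from the Excel output
estimates = {
    "Temp": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}
t_crit = 2.1788  # two-tailed critical value for alpha = 0.05, df = 12 (from the slide)

# t = b_j / S_{b_j} for each slope
t_stats = {name: b / s_b for name, (b, s_b) in estimates.items()}

for name, t in t_stats.items():
    decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"{name}: t = {t:.4f} -> {decision}")
```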
Venn Diagrams and Estimation of Regression Model
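Figure: Venn diagram of Oil, Temp, and Insulation.
• Only the variation in Oil shared with Insulation alone is used in the estimation of $\beta_2$.
• Only the variation in Oil shared with Temp alone is used in the estimation of $\beta_1$.
• The variation in Oil shared by both Temp and Insulation is NOT used in the estimation of $\beta_1$ or $\beta_2$.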
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$$b_1 \pm t_{n-k-1} S_{b_1}$$
             Coefficients   Lower 95%      Upper 95%
Intercept    562.151009     516.1930837    608.108935
Temp         -5.4365806     -6.169132673   -4.7040285
Insulation   -20.012321     -25.11620102   -14.90844
$$-6.169 \le \beta_1 \le -4.704$$
We are 95% confident that the average consumption of oil decreases by between 4.70 and 6.17 gallons for each 1 °F increase in temperature, holding insulation constant.
We can also perform the test for the significance of individual variables, H0: $\beta_1 = 0$ vs. H1: $\beta_1 \neq 0$, using this confidence interval: because the interval does not include 0, H0 is rejected at α = 0.05.
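The interval $b_1 \pm t_{n-k-1} S_{b_1}$ can be checked directly. This sketch uses the rounded critical value 2.1788 from the t-test slide, so its endpoints may differ from Excel's in the last few digits.

```python
b1, s_b1 = -5.436580588, 0.336216167   # Temp coefficient and its standard error
t_crit = 2.1788                        # t(0.025; 12), rounded, from the t-test slide

half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width
print(f"95% CI for beta_1: ({lower:.3f}, {upper:.3f})")

# An interval that excludes 0 corresponds to rejecting H0: beta_1 = 0 at alpha = 0.05.
excludes_zero = not (lower <= 0 <= upper)
```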
Contribution of a Single Independent Variable
• Let $X_j$ Be the Independent Variable of Interest
• $SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$
  – Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables
Contribution of a Single Independent Variable $X_k$
$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$
• First term: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• Second term: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
Measures the additional contribution of $X_1$ in explaining Y with the inclusion of $X_2$ and $X_3$.
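The slide's example uses three predictors, but the heating-oil data has only two, so this sketch adapts the idea to $SSR(X_1 \mid X_2) = SSR(X_1, X_2) - SSR(X_2)$ by fitting both regressions to the 15 observations from the example table. The helpers `solve` and `ssr` are hypothetical names, and fitting through the normal equations is an assumed method, not necessarily what Excel does.

```python
# Heating-oil data parsed from the example table:
# (oil used in gallons, average temperature in °F, insulation in inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ssr(y, xs):
    """Hypothetical helper: regression sum of squares, sum of (Y-hat - Y-bar)^2."""
    rows = [[1.0] + list(x) for x in xs]  # intercept column plus predictors
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    fitted = [sum(bj * rj for bj, rj in zip(b, r)) for r in rows]
    return sum((f - y_bar) ** 2 for f in fitted)


y = [oil for oil, _, _ in DATA]
ssr_full = ssr(y, [(temp, ins) for _, temp, ins in DATA])  # SSR(X1, X2)
ssr_x2 = ssr(y, [(ins,) for _, _, ins in DATA])             # SSR(X2): insulation only
contribution_x1 = ssr_full - ssr_x2                          # SSR(X1 | X2)
print(f"SSR(X1|X2) = {contribution_x1:.2f}")
```

The two fitted SSR values should line up with the regression SS entries in the ANOVA tables of the "Testing Portions of Model: Example" slide.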
Coefficient of Partial Determination of $X_j$
• $r^2_{Yj\cdot(\text{all others})} = \dfrac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$
• Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables
Coefficient of Partial Determination for $X_j$
(continued)
Example: Model with two independent variables
$$r^2_{Y1\cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$
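A sketch of $r^2_{Y1\cdot 2}$ for the heating-oil model, assuming the sums of squares reported in the ANOVA tables of the "Testing Portions of Model: Example" slide ($SSR(X_1, X_2) = 228014.6263$, $SSR(X_2) = 51076.47$, $SST = 236135.2293$).

```python
# Sums of squares from the heating-oil ANOVA tables (X1 = Temp, X2 = Insulation)
sst = 236135.2293
ssr_x1_x2 = 228014.6263   # SSR(X1, X2): both variables
ssr_x2 = 51076.47         # SSR(X2): insulation only

ssr_x1_given_x2 = ssr_x1_x2 - ssr_x2                        # SSR(X1 | X2)
r2_y1_2 = ssr_x1_given_x2 / (sst - ssr_x1_x2 + ssr_x1_given_x2)
print(f"r^2_Y1.2 = {r2_y1_2:.4f}")
```

With these inputs the result is about 0.956: temperature explains roughly 95.6% of the oil-use variation left unexplained by insulation alone. The denominator equals the residual SS of the insulation-only model (185058.8).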
Venn Diagrams and Coefficient of Partial Determination for $X_j$
Figure: Venn diagram of Oil, Temp, and Insulation, with the region $SSR(X_1 \mid X_2)$ marked.
$$r^2_{Y1\cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$
Contribution of a Subset of Independent Variables
• Let $X_s$ Be the Subset of Independent Variables of Interest
  – $SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
  – Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables
Contribution of a Subset of Independent Variables: Example
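Let $X_s$ be $X_1$ and $X_3$
$$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$$
• First term: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
• Second term: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$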
Testing Portions of Model
• Examines the Contribution of a Subset Xs of Explanatory Variables to the Relationship with Y
• Null Hypothesis:
  – Variables in the subset do not improve the model significantly when all other variables are included
• Alternative Hypothesis:
  – At least one variable in the subset is significant when all other variables are included
Testing Portions of Model
(continued)
• One-Tailed Rejection Region
• Requires Comparison of Two Regressions
  – One regression includes everything
  – Another regression includes everything except the portion to be tested
Partial F Test for the Contribution of a Subset of X Variables
• Hypotheses:
– H0 : Variables Xs do not significantly improve the model given all other variables included
– H1 : Variables Xs significantly improve the model given all others included
• Test Statistic:
  – $F = \dfrac{SSR(X_s \mid \text{all others})/m}{MSE(\text{all})}$
  – with df = m and (n − k − 1)
  – m = # of variables in the subset $X_s$
Partial F Test for the Contribution of a Single $X_j$
• Hypotheses:
  – H0: Variable $X_j$ does not significantly improve the model given all others included
  – H1: Variable $X_j$ significantly improves the model given all others included
• Test Statistic:
  – $F = \dfrac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$
  – with df = 1 and (n − k − 1)
  – m = 1 here
Testing Portions of Model: Example
Test at the α = .05 level to determine whether the variable of average temperature significantly improves the model, given that insulation is included.
Testing Portions of Model: Example
H0: X1 (temperature) does not improve the model with X2 (insulation) included
H1: X1 does improve the model
α = .05, df = 1 and 12
Critical Value = 4.75
ANOVA (for X1 and X2)
             SS            MS
Regression   228014.6263   114007.313
Residual     8120.603016   676.716918
Total        236135.2293
ANOVA (for X2)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2
$$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$$
Conclusion: Reject H0; X1 does improve the model.