Regression and Analysis of Variance, Session 21. Course: I0174 – Regression Analysis. Year: Odd Semester 2007/2008

Post on 19-Dec-2015


Page 1:

Regression and Analysis of Variance
Session 21

Course: I0174 – Regression Analysis
Year: Odd Semester 2007/2008

Page 2:

Bina Nusantara

Regression and Analysis of Variance

• One-Way Analysis of Variance Model

• Regression Approach to One-Way Classification

Page 3:

The Multiple Regression Model

The relationship between 1 dependent and 2 or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

• $Y_i$: dependent (response) variable

• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables

• $\beta_0$: population Y-intercept

• $\beta_1, \ldots, \beta_k$: population slopes

• $\varepsilon_i$: random error

Page 4:

Multiple Regression Model

[Figure: bivariate model plotted on axes $X_1$, $X_2$, and $Y$. The response plane $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$ meets the Y axis at $\beta_0$. An observed value $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ lies a distance $\varepsilon_i$ from the plane at the point $(X_{1i}, X_{2i})$.]

Page 5:

Multiple Regression Equation

[Figure: bivariate model plotted on axes $X_1$, $X_2$, and $Y$. The fitted response plane $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$ (the multiple regression equation) meets the Y axis at $b_0$. An observed value $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$ lies a distance $e_i$ from the plane at the point $(X_{1i}, X_{2i})$.]

Page 6:

Multiple Regression Equation

Too complicated by hand! Ouch!

Page 7:

Interpretation of Estimated Coefficients

• Slope ($b_j$)

– The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus)

– Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)

• Y-Intercept ($b_0$)

– The estimated average value of Y when all $X_j = 0$

Page 8:

Multiple Regression Model: Example

Develop a model for estimating heating oil used for a single family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10

Page 9:

Multiple Regression Equation: Example

General form: $$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

Excel Output

               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$

For each degree increase in temperature, the estimated average amount of heating oil used is decreased by 5.437 gallons, holding insulation constant.

For each increase in one inch of insulation, the estimated average use of heating oil is decreased by 20.012 gallons, holding temperature constant.
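The coefficients in the Excel output can be cross-checked without a spreadsheet. The sketch below is not how PHStat or Excel computes them; it is a minimal illustration that solves the normal equations $(X'X)b = X'y$ with plain Gaussian elimination. The names `DATA`, `solve_linear_system`, and `fit_least_squares` are made up for this example, and the data rows are transcribed from the heating oil table, so any transcription slip would change the result.

```python
# Heating oil data transcribed from the slide:
# (oil in gallons, temperature in deg F, insulation in inches).
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows):
    """Return [b0, b1, b2] from the normal equations (X'X) b = X'y."""
    design = [(1.0, temp, ins) for _, temp, ins in rows]  # leading 1 = intercept
    y = [oil for oil, _, _ in rows]
    p = 3
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


b0, b1, b2 = fit_least_squares(DATA)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

For larger problems a library routine such as `numpy.linalg.lstsq` is usually the better choice, since forming $X'X$ explicitly can lose precision; with three coefficients and 15 observations the direct approach is adequate.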

Page 10:

Multiple Regression in PHStat

• PHStat | Regression | Multiple Regression …

• Excel spreadsheet for the heating oil example

Page 11:

Venn Diagrams and Explanatory Power of Regression

[Figure: Venn diagram of two overlapping circles, Oil and Temp.]

• Overlap (SSR): variations in Oil explained by Temp, or variations in Temp used in explaining variation in Oil

• Oil only (SSE): variations in Oil explained by the error term

• Temp only: variations in Temp not used in explaining variation in Oil

Page 12:

Venn Diagrams and Explanatory Power of Regression (continued)

[Figure: Venn diagram of two overlapping circles, Oil and Temp.]

$$r^2 = \frac{SSR}{SSR + SSE}$$

Page 13:

Venn Diagrams and Explanatory Power of Regression

[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation.]

• Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$

• Variation NOT explained by Temp nor Insulation: SSE

Page 14:

Coefficient of Multiple Determination

• Proportion of Total Variation in Y Explained by All X Variables Taken Together

$$r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$

• Never Decreases When a New X Variable is Added to Model

– Disadvantage when comparing among models

Page 15:

Venn Diagrams and Explanatory Power of Regression

[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation.]

$$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$$

Page 16:

Adjusted Coefficient of Multiple Determination

• Proportion of Variation in Y Explained by All the X Variables Adjusted for the Sample Size and the Number of X Variables Used

$$r^2_{adj} = 1 - \left[ \left(1 - r^2_{Y \cdot 12 \cdots k}\right) \frac{n - 1}{n - k - 1} \right]$$

– Penalizes excessive use of independent variables

– Smaller than $r^2_{Y \cdot 12 \cdots k}$

– Useful in comparing among models

– Can decrease if an insignificant new X variable is added to the model

Page 17:

Example: Adjusted $r^2$ Can Decrease

Model with k = 2: $\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation}$

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

Model with k = 3: $\text{Oil} = \beta_0 + \beta_1\,\text{Temp} + \beta_2\,\text{Insulation} + \beta_3\,\text{Color}$

Regression Statistics
Multiple R          0.983482856
R Square            0.967238528
Adjusted R Square   0.958303581
Standard Error      25.72417272
Observations        15

Adjusted $r^2$ decreases when k increases from 2 to 3.

Color is not useful in explaining the variation in oil consumption.
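The two Adjusted R Square values can be recomputed from the reported R Square, n, and k. This is a small sketch of the adjusted $r^2$ formula, $1 - (1 - r^2)(n - 1)/(n - k - 1)$; the function name `adjusted_r_squared` is chosen for this example only, and the inputs are the rounded figures from the Excel output.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


n = 15  # observations in the heating oil example

# R Square values as reported in the two Excel outputs.
adj_two = adjusted_r_squared(0.965610371, n, k=2)    # Temp, Insulation
adj_three = adjusted_r_squared(0.967238528, n, k=3)  # Temp, Insulation, Color

print(f"k = 2: {adj_two:.6f}")
print(f"k = 3: {adj_three:.6f}")
print("adjusted r^2 dropped:", adj_three < adj_two)
```

R Square rises slightly when Color is added, but the penalty factor grows from 14/12 to 14/11, which is enough to pull the adjusted value down.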

Page 18:

Using the Regression Equation to Make Predictions

Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$

The predicted heating oil used is 278.97 gallons.
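The same substitution can be written as a one-line function. This is only a sketch: `predict_oil` is a name invented here, and its default coefficients are the rounded values from the fitted equation, so results carry that rounding.

```python
def predict_oil(temp_f, insulation_in, b0=562.151, b1=-5.437, b2=-20.012):
    """Predicted January heating oil use (gallons) from the fitted equation."""
    return b0 + b1 * temp_f + b2 * insulation_in


predicted = predict_oil(temp_f=30, insulation_in=6)
print(f"Predicted heating oil: {predicted:.3f} gallons")
```

Predictions are only trustworthy inside the range of the observed data (temperatures of 8 to 73 degrees and insulation of 3 to 10 inches in this sample).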

Page 19:

Testing for Overall Significance (continued)

• Test Statistic:

$$F = \frac{MSR}{MSE} = \frac{SSR(\text{all}) / k}{MSE(\text{all})}$$

• Where F has k numerator and (n-k-1) denominator degrees of freedom

Page 20:

Test for Overall Significance
Excel Output: Example

ANOVA
             df   SS         MS         F          Significance F
Regression   2    228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2

• Regression df: k = 2, the number of explanatory variables

• Total df: n - 1

• Significance F: p-value

• Test Statistic: $F = \frac{MSR}{MSE}$

Page 21:

Test for Overall Significance: Example Solution

$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$

$H_1$: At least one $\beta_j \neq 0$

$\alpha = .05$

df = 2 and 12

Critical Value: F = 3.89

[Figure: F distribution starting at 0, with a rejection region of area $\alpha = 0.05$ to the right of 3.89.]

Test Statistic: F = 168.47 (Excel Output)

Decision: Reject $H_0$ at $\alpha = 0.05$.

Conclusion: There is evidence that at least one independent variable affects Y.
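The F statistic can be rebuilt from the sums of squares in the ANOVA table. The sketch below uses the rounded table values and takes the critical value 3.89 from the slide rather than computing it; variable names are illustrative.

```python
# Sums of squares from the ANOVA table (rounded as printed in the Excel output).
ssr = 228014.6   # regression sum of squares
sse = 8120.603   # residual (error) sum of squares
n, k = 15, 2     # observations, explanatory variables

msr = ssr / k                # mean square regression, df = k
mse = sse / (n - k - 1)      # mean square error, df = n - k - 1
f_stat = msr / mse

f_critical = 3.89  # upper 5% point of F(2, 12), as quoted on the slide
reject_h0 = f_stat > f_critical

print(f"MSR = {msr:.1f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With a statistics library available, the critical value and p-value could be obtained from the F distribution directly (for example `scipy.stats.f.ppf` and `scipy.stats.f.sf`) instead of being read from a table.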

Page 22:

Test for Significance: Individual Variables

• Show If Y Depends Linearly on a Single $X_j$ Individually While Holding the Effects of Other X's Fixed

• Use t Test Statistic

• Hypotheses:

– $H_0$: $\beta_j = 0$ (No linear relationship)

– $H_1$: $\beta_j \neq 0$ (Linear relationship between $X_j$ and Y)

Page 23:

t Test Statistic
Excel Output: Example

             Coefficients    Standard Error   t Stat      P-value
Intercept    562.1510092     21.09310433      26.65094    4.77868E-12
Temp         -5.436580588    0.336216167      -16.1699    1.64178E-09
Insulation   -20.01232067    2.342505227      -8.543127   1.90731E-06

$$t = \frac{b_i}{S_{b_i}}$$

• t Test Statistic for $X_1$ (Temperature): the t Stat entry in the Temp row

• t Test Statistic for $X_2$ (Insulation): the t Stat entry in the Insulation row

Page 24:

t Test: Example Solution

Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.

$H_0$: $\beta_1 = 0$

$H_1$: $\beta_1 \neq 0$

df = 12

Critical Values: $t = \pm 2.1788$

[Figure: t distribution centered at 0, with rejection regions of area .025 below -2.1788 and above 2.1788.]

Test Statistic: t = -16.1699

Decision: Reject $H_0$ at $\alpha = 0.05$.

Conclusion: There is evidence of a significant effect of temperature on oil consumption holding constant the effect of insulation.
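Each t statistic is just the coefficient divided by its standard error. The sketch below recomputes both from the Excel output and compares them with the two-tailed critical value 2.1788 quoted for df = 12; the dictionary and function names are chosen for this example.

```python
# (coefficient, standard error) pairs copied from the Excel output.
ESTIMATES = {
    "Temp": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}
T_CRITICAL = 2.1788  # two-tailed critical value quoted on the slide for df = 12


def t_statistic(coefficient, standard_error):
    """t = b_j / S_bj for testing H0: beta_j = 0."""
    return coefficient / standard_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in ESTIMATES.items()}
for name, t in t_stats.items():
    decision = "reject H0" if abs(t) > T_CRITICAL else "do not reject H0"
    print(f"{name}: t = {t:.4f} -> {decision}")
```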

Page 25:

Venn Diagrams and Estimation of Regression Model

[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation, with three regions annotated.]

• Only this information is used in the estimation of $\beta_1$

• Only this information is used in the estimation of $\beta_2$

• This information is NOT used in the estimation of $\beta_1$ nor $\beta_2$

Page 26:

Confidence Interval Estimate for the Slope

Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).

$$b_1 \pm t_{n-p-1} S_{b_1}$$

             Coefficients   Lower 95%       Upper 95%
Intercept    562.151009     516.1930837     608.108935
Temp         -5.4365806     -6.169132673    -4.7040285
Insulation   -20.012321     -25.11620102    -14.90844

$$-6.169 \le \beta_1 \le -4.704$$

We are 95% confident that the estimated average consumption of oil is reduced by between 4.7 gallons and 6.17 gallons for each increase of 1°F, holding insulation constant.

We can also perform the test for the significance of individual variables, $H_0$: $\beta_1 = 0$ vs. $H_1$: $\beta_1 \neq 0$, using this confidence interval.
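The interval for the temperature slope follows from the coefficient, its standard error, and the critical t value. In the sketch below the critical value 2.1788 (df = 12) is taken from the t test slide rather than computed, and `slope_confidence_interval` is a hypothetical helper name.

```python
def slope_confidence_interval(b, standard_error, t_critical):
    """Return (lower, upper) for b +/- t_critical * standard_error."""
    margin = t_critical * standard_error
    return b - margin, b + margin


# Temp row of the Excel output; t critical value for df = 12 at 95% confidence.
lower, upper = slope_confidence_interval(-5.436580588, 0.336216167, 2.1788)
print(f"{lower:.3f} <= beta_1 <= {upper:.3f}")

# The interval excludes 0, which matches rejecting H0: beta_1 = 0 at the 5% level.
excludes_zero = not (lower <= 0.0 <= upper)
print("Interval excludes zero:", excludes_zero)
```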

Page 27:

Contribution of a Single Independent Variable $X_j$

• Let $X_j$ Be the Independent Variable of Interest

$$SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$$

– Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables

Page 28:

Contribution of a Single Independent Variable $X_k$

$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$

• $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$

• $SSR(X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$

Measures the additional contribution of $X_1$ in explaining Y with the inclusion of $X_2$ and $X_3$.

Page 29:

Coefficient of Partial Determination of $X_j$

• Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables

$$r^2_{Yj \cdot (\text{all others})} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$$

Page 30:

Coefficient of Partial Determination for $X_j$ (continued)

Example: Model with two independent variables

$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$
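As a rough numerical illustration (not part of the original slides), the two-variable formula can be evaluated for the heating oil data using the sums of squares printed in the ANOVA tables on Page 39. The function name is hypothetical, and the inputs carry the rounding of the Excel output.

```python
def partial_determination(ssr_full, ssr_reduced, sst):
    """r^2 of Y with X_j, holding the other variables constant.

    ssr_full    : SSR of the model with all variables
    ssr_reduced : SSR of the model without X_j
    sst         : total sum of squares
    """
    ssr_extra = ssr_full - ssr_reduced  # SSR(X_j | all others)
    return ssr_extra / (sst - ssr_full + ssr_extra)


# Values taken from the ANOVA tables on Page 39 (X1 = temperature, X2 = insulation).
r2_y1_2 = partial_determination(
    ssr_full=228014.6263,   # SSR(X1, X2)
    ssr_reduced=51076.47,   # SSR(X2)
    sst=236135.2293,        # SST
)
print(f"r^2_Y1.2 = {r2_y1_2:.4f}")
```

The denominator, SST - SSR(X1, X2) + SSR(X1 | X2), is the variation left unexplained by insulation alone, so the ratio is the share of that leftover variation that temperature accounts for.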

Page 31:

Venn Diagrams and Coefficient of Partial Determination for $X_j$

[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation, with the region $SSR(X_1 \mid X_2)$ marked.]

$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$

Page 32:

Contribution of a Subset of Independent Variables

• Let $X_s$ Be the Subset of Independent Variables of Interest

$$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$$

– Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables

Page 33:

Contribution of a Subset of Independent Variables: Example

Let $X_s$ be $X_1$ and $X_3$

$$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$$

• $SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$

• $SSR(X_2)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$

Page 34:

Testing Portions of Model

• Examines the Contribution of a Subset $X_s$ of Explanatory Variables to the Relationship with Y

• Null Hypothesis:

– Variables in the subset do not improve the model significantly when all other variables are included

• Alternative Hypothesis:

– At least one variable in the subset is significant when all other variables are included

Page 35:

Testing Portions of Model (continued)

• One-Tailed Rejection Region

• Requires Comparison of Two Regressions

– One regression includes everything

– Another regression includes everything except the portion to be tested

Page 36:

Partial F Test for the Contribution of a Subset of X Variables

• Hypotheses:

– $H_0$: Variables $X_s$ do not significantly improve the model given all other variables included

– $H_1$: Variables $X_s$ significantly improve the model given all others included

• Test Statistic:

$$F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$$

– with df = m and (n-k-1)

– m = # of variables in the subset $X_s$

Page 37:

Partial F Test for the Contribution of a Single $X_j$

• Hypotheses:

– $H_0$: Variable $X_j$ does not significantly improve the model given all others included

– $H_1$: Variable $X_j$ significantly improves the model given all others included

• Test Statistic:

$$F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$$

– with df = 1 and (n-k-1)

– m = 1 here

Page 38:

Testing Portions of Model: Example

Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.

Page 39:

Testing Portions of Model: Example

$H_0$: $X_1$ (temperature) does not improve model with $X_2$ (insulation) included

$H_1$: $X_1$ does improve model

$\alpha = .05$, df = 1 and 12

Critical Value = 4.75

ANOVA (For $X_1$ and $X_2$)
             SS            MS
Regression   228014.6263   114007.313
Residual     8120.603016   676.716918
Total        236135.2293

ANOVA (For $X_2$)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2

$$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$$

Conclusion: Reject $H_0$; $X_1$ does improve model.
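The partial F statistic comes from comparing the two regressions. The sketch below takes the sums of squares from the two ANOVA tables and the critical value 4.75 from the slide; `partial_f` is a name made up for this example.

```python
def partial_f(ssr_full, ssr_reduced, mse_full, m=1):
    """Partial F = [SSR(X_s | all others) / m] / MSE(all)."""
    return (ssr_full - ssr_reduced) / m / mse_full


f_partial = partial_f(
    ssr_full=228014.6263,   # SSR(X1, X2): model with temperature and insulation
    ssr_reduced=51076.47,   # SSR(X2): model with insulation only
    mse_full=676.716918,    # MSE of the full model
    m=1,                    # one variable (temperature) is being tested
)
f_critical = 4.75  # upper 5% point of F(1, 12), as quoted on the slide

print(f"Partial F = {f_partial:.2f}")
print("Reject H0" if f_partial > f_critical else "Do not reject H0")
```

When a single variable is tested, the partial F statistic equals the square of that variable's t statistic: $(-16.1699)^2 \approx 261.47$, so the two tests always agree.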