Linear Regression (Cont'd)
Outline
- Multiple Regression
- Checking the Regression: Coefficient of Determination, Standard Error, Confidence Interval, Hypothesis Tests (t-test, F-test)
- Classical Assumption Tests: No Multicollinearity, Homoscedasticity, No Autocorrelation
Multiple Regression
Is consumption affected only by income? Other variables, besides income, may also be related to consumption.
The simple regression should then be expanded by introducing new independent variables (I.Vs) into the model --> multiple regression.
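The Model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i, \qquad i = 1, 2, 3, \dots, N \text{ (observations)}$$
Example: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
Y: Consumption; X1: Income; X2: Number of dependents; X3: Age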
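A minimal sketch of estimating this model by OLS with NumPy, assuming the I.Vs are stacked column-wise and a column of ones is added for $\beta_0$. The helper name `fit_multiple_regression` and all data values are hypothetical; Y is generated without an error term only so that the recovered coefficients can be checked exactly.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return OLS estimates [b0, b1, ..., bk] for y = b0 + b1*X1 + ... + bk*Xk."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # first column of ones estimates b0
    b, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
    return b

# Hypothetical observations (N = 7)
income = np.array([50, 60, 75, 90, 100, 120, 130], dtype=float)
dependents = np.array([1, 3, 2, 4, 2, 5, 3], dtype=float)
age = np.array([25, 40, 31, 45, 38, 52, 60], dtype=float)

# Consumption built from known coefficients with no error term, for checking only
consumption = 5 + 0.8 * income + 2.0 * dependents - 0.1 * age

b = fit_multiple_regression(np.column_stack([income, dependents, age]), consumption)
print(np.round(b, 4))  # recovers 5, 0.8, 2.0, -0.1 up to rounding
```

`np.linalg.lstsq` solves the least-squares problem directly, which is numerically safer than explicitly inverting X'X.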
Checking The Regression
1. Coefficient of Determination
2. Standard Error of the Coefficients
3. Confidence Interval
4. Hypothesis Tests: t-test, F-test
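2. Standard Error
The principle of OLS is to minimize the error. Therefore the accuracy of the estimators is determined by their standard errors (S.e). The formula for the S.e. of the regression is:
$$S_e = \sqrt{\frac{\sum (Y_i - \hat Y_i)^2}{n-2}} = \sqrt{\frac{SST - SSR}{n-2}} = \sqrt{\frac{\sum y_i^2 - b\sum x_i y_i}{n-2}} = \sqrt{MSE}$$
where $y_i = Y_i - \bar Y$ and $x_i = X_i - \bar X$ are deviations from the means and $b$ is the estimated slope. The denominator n − 2 applies to simple regression; with k I.Vs it becomes n − k − 1.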
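Since the residuals are $\hat u_i = Y_i - \hat Y_i$, so that $\sum \hat u_i^2 = \sum (Y_i - \hat Y_i)^2$, the standard error can also be written as
$$s = \left( \frac{\sum \hat u_i^2}{N-2} \right)^{1/2}$$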
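The three forms of the S.e. formula can be checked numerically. This is a sketch for simple regression only (df = n − 2); the function name and the income/consumption values are hypothetical.

```python
import numpy as np

def standard_error_of_regression(x, y):
    """S_e for simple regression Y = b0 + b*X, computed three equivalent ways."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    xd, yd = x - x.mean(), y - y.mean()       # deviations from the means
    b = np.sum(xd * yd) / np.sum(xd ** 2)     # OLS slope
    b0 = y.mean() - b * x.mean()              # OLS intercept
    y_hat = b0 + b * x

    sse = np.sum((y - y_hat) ** 2)            # sum of squared residuals
    sst = np.sum(yd ** 2)                     # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)     # regression sum of squares

    se_residuals = np.sqrt(sse / (n - 2))
    se_anova = np.sqrt((sst - ssr) / (n - 2))
    se_deviation = np.sqrt((np.sum(yd ** 2) - b * np.sum(xd * yd)) / (n - 2))
    return se_residuals, se_anova, se_deviation

# Hypothetical income (X) and consumption (Y) data
income = [50, 60, 75, 90, 100, 120]
consumption = [42, 50, 58, 70, 76, 92]
print(standard_error_of_regression(income, consumption))  # three equal values
```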
The smallest standard error results from the smallest errors (residuals).
How small must S.e be to count as good? That is hard to judge from its absolute value alone.
It is more useful when combined with each regression coefficient: the ratio of a coefficient to its S.e is used for the t-test.
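A sketch of computing each coefficient's S.e and the coefficient-to-S.e ratio, assuming the usual OLS variance estimate $s^2 (X'X)^{-1}$ with $s^2 = SSE/(n-k-1)$. The function name and data are hypothetical.

```python
import numpy as np

def ols_with_standard_errors(X, y):
    """OLS estimates, their standard errors, and coefficient-to-S.e. ratios."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])       # add intercept column
    p = A.shape[1]                             # number of estimated parameters (k + 1)
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    mse = resid @ resid / (n - p)              # s^2 = SSE / (n - k - 1)
    cov_b = mse * np.linalg.inv(A.T @ A)       # estimated Var(b) = s^2 (A'A)^{-1}
    se = np.sqrt(np.diag(cov_b))
    return b, se, b / se                       # last item: ratio used in the t-test

income = np.array([50, 60, 75, 90, 100, 120], dtype=float)     # hypothetical
consumption = np.array([42, 50, 58, 70, 76, 92], dtype=float)  # hypothetical
b, se, ratio = ols_with_standard_errors(income, consumption)
print(b, se, ratio)
```

For a single I.V. the slope's S.e reduces to the familiar $s/\sqrt{\sum x_i^2}$.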
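3. Confidence Interval of $\beta_j$
What is a confidence interval for a parameter, and what is it for? Formula:
$$b_j \pm t_{\alpha/2}\, \mathrm{s.e.}(b_j)$$
or
$$P\left(b_j - t_{\alpha/2}\, \mathrm{s.e.}(b_j) \le \beta_j \le b_j + t_{\alpha/2}\, \mathrm{s.e.}(b_j)\right) = 1 - \alpha$$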
Example: from the regression we get $b_1$ = 0,1022 and s.e.($b_1$) = 0,0092, with n = 10 observations and 2 estimated parameters ($b_0$ and $b_1$). The degrees of freedom are therefore 10 − 2 = 8, and the significance level is 5%.
From the t-table, $t_{\alpha/2,\,df} = t_{0.025,\,8}$ = 2,306.
Therefore the confidence interval for $\beta_1$ is 0,1022 ± 2,306 (0,0092), i.e. (0,0810; 0,1234).
Interpretation: with 95% confidence, $\beta_1$ lies in the interval (0,0810; 0,1234).
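The arithmetic of this example as a small sketch; the critical value is hard-coded from the t-table, as in the slide, and the helper name is hypothetical.

```python
def confidence_interval(b, se, t_crit):
    """Return (lower, upper) = b -/+ t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin

# Numbers from the example above; t_crit = t(0.025, 8) read from a t-table
b1, se_b1, t_crit = 0.1022, 0.0092, 2.306
low, high = confidence_interval(b1, se_b1, t_crit)
print(f"({low:.4f}; {high:.4f})")  # (0.0810; 0.1234)
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 8)` returns the same critical value (about 2.306) instead of a table lookup.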
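4. Hypothesis Test
An individual test for each regression coefficient.
$H_0: \beta_j = 0$
$H_1: \beta_j \ne 0$; $j = 0, 1, 2, \dots, k$, where $\beta_0$ is the intercept and $\beta_1, \dots, \beta_k$ are the slope coefficients.
For simple regression: (1) $H_0: \beta_0 = 0$ vs $H_1: \beta_0 \ne 0$; (2) $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \ne 0$.
The t-statistic is defined as
$$t = \frac{b_j - \beta_j}{\mathrm{s.e.}(b_j)}$$
To test whether $\beta_j$ differs from 0, set $\beta_j = 0$ (its value under $H_0$):
$$t = \frac{b_j}{\mathrm{s.e.}(b_j)}$$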
t-test
The computed t is compared with the t-table value.
If $|t| > t_{\alpha/2,\,df}$, the t value lies in the rejection region.
Thus the null hypothesis ($\beta_j = 0$) is rejected at the $(1-\alpha) \times 100\%$ confidence level.
In other words, $\beta_j$ is statistically significant.
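A sketch of this decision rule, reusing the numbers from the confidence-interval example ($b_1$ = 0,1022, s.e. = 0,0092, $t_{0.025,8}$ = 2,306). The function name is hypothetical.

```python
def t_test(b, se, t_crit, beta_null=0.0):
    """Two-sided t-test of H0: beta = beta_null; returns (t, reject_H0)."""
    t_stat = (b - beta_null) / se
    return t_stat, abs(t_stat) > t_crit

t_stat, reject = t_test(0.1022, 0.0092, 2.306)
print(round(t_stat, 2), reject)  # 11.11 True
```

This agrees with the confidence interval: (0,0810; 0,1234) does not contain 0, so $H_0: \beta_1 = 0$ is rejected at the 5% level.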
F-Test (Hypothesis Test)
Is the model as a whole statistically significant? This is a hypothesis test of all the slope coefficients together:
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: at least one $\beta_j \ne 0$ ($j = 1, \dots, k$), where k is the number of I.Vs.
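The F-test can be explained further with ANOVA (analysis of variance).
Observation: $Y_i = b_0 + b_1 X_i + e_i$; Regression: $\hat Y_i = b_0 + b_1 X_i$, so $Y_i = \hat Y_i + e_i$.
Subtract $\bar Y$ from both sides:
$$(Y_i - \bar Y) = (\hat Y_i - \bar Y) + e_i$$
Then square both sides and sum over all observations. For OLS with an intercept the cross term $2\sum (\hat Y_i - \bar Y) e_i$ equals zero, so
$$\underbrace{\sum (Y_i - \bar Y)^2}_{SST} = \underbrace{\sum (\hat Y_i - \bar Y)^2}_{SSR} + \underbrace{\sum e_i^2}_{SSE}$$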
ANOVA Table
Source       Sum of Squares   df          Mean Square              F-stat
Regression   SSR              k           MSR = SSR/k              F = MSR/MSE
Error        SSE              n − k − 1   MSE = SSE/(n − k − 1)
Total        SST              n − 1
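Compare the F-stat with $F_{\alpha}(k, n-k-1)$ from the F-table. If $F > F_{\alpha}(k, n-k-1)$, reject $H_0$: the model is statistically significant.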
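A sketch that builds the ANOVA quantities for an OLS fit with intercept and computes $F = MSR/MSE$. The function name and the data (two I.Vs, n = 8) are hypothetical.

```python
import numpy as np

def anova_f_test(X, y):
    """Return the ANOVA quantities and F = MSR / MSE for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    k = A.shape[1] - 1                          # number of I.Vs
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ b
    sst = np.sum((y - y.mean()) ** 2)           # total
    ssr = np.sum((y_hat - y.mean()) ** 2)       # regression (explained)
    sse = np.sum((y - y_hat) ** 2)              # error (residual)
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {"SST": sst, "SSR": ssr, "SSE": sse, "df": (k, n - k - 1), "F": msr / mse}

# Hypothetical data with two I.Vs
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 12.3, 14.8, 18.2]
table = anova_f_test(np.column_stack([x1, x2]), y)
print(table)
```

The computed F is then compared with $F_{\alpha}(k, n-k-1)$; with SciPy, `scipy.stats.f.ppf(1 - alpha, k, n - k - 1)` gives that critical value.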
Classical Assumptions of OLS
The OLS estimator should be BLUE (Best Linear Unbiased Estimator).
3 main requirements: no multicollinearity, no heteroskedasticity, no autocorrelation.
Multicollinearity: a linear relation between I.Vs.
For two regressors X1 and X2, if $X_1 = \lambda X_2$ for some constant $\lambda$, there is (perfect) collinearity.
This is not the case for nonlinear relations, for example $X_1 = X_2^2$ or $X_1 = \log X_2$.
Example: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
Y: Consumption; X1: Total income; X2: Wage income; X3: Non-wage income
There is multicollinearity, possibly even perfect multicollinearity.
Why?
Data with Perfect Multicollinearity
X1    X2    X3
12    48    51
16    64    65
19    76    82
23    92    96
29    116   118
$X_2 = 4X_1$: a perfect multicollinearity relation.
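One way to see why perfect multicollinearity breaks OLS: with $X_2 = 4X_1$ the design matrix loses rank, so X'X cannot be inverted and the coefficients have no unique solution. A sketch using the X1, X2, X3 values of this example (rank is computed with NumPy's SVD-based `matrix_rank`):

```python
import numpy as np

# X1, X2, X3 values of the perfect-multicollinearity example
x1 = np.array([12, 16, 19, 23, 29], dtype=float)
x2 = np.array([48, 64, 76, 92, 116], dtype=float)
x3 = np.array([51, 65, 82, 96, 118], dtype=float)

ones = np.ones(len(x1))
A_perfect = np.column_stack([ones, x1, x2])  # X2 = 4*X1 exactly
A_near = np.column_stack([ones, x1, x3])     # X3 close to, but not exactly, linear in X1

# Rank below the number of columns means X'X is singular: OLS has no unique solution
print(np.linalg.matrix_rank(A_perfect), np.linalg.matrix_rank(A_near))  # 2 3
```

X3 is strongly but not perfectly related to X1, so its design matrix keeps full rank; that case still estimates, but with the inflated variances described next.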
Impact of Multicollinearity
- High variance (of the OLS estimates)
- Wide confidence intervals
- High R² but many coefficients insignificant in the t-test
- The sign (direction) of a coefficient can be misleading
Consumption (Y)   Income (X1)   Asset (X2)
40                50            500
50                65            659
65                80            856
90                110           1136
85                100           1023
100               120           1234
110               140           1456
135               190           1954
140               210           2129
160               220           2267
Model: Ŷ = 12,8 − 1,414 X1 + 0,202 X2
SE:    (4,696)   (1,199)    (0,117)
t:     (2,726)   (−1,179)   (1,721)
R² = 0,982
R² is very high (98,2%): what does this mean? The t-tests are not significant: what does this mean? The coefficient of X1 is negative: what does this mean?
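A sketch that refits this model on the consumption/income/asset data with NumPy, to show where the reported numbers come from. It should reproduce the coefficients and R² to about three decimals. With n − k − 1 = 7 degrees of freedom, the 5% two-sided critical value is $t_{0.025,7}$ = 2,365, and neither slope's t-value exceeds it, even though R² is 0,982.

```python
import numpy as np

# Data from the table: consumption (Y), income (X1), asset (X2)
y = np.array([40, 50, 65, 90, 85, 100, 110, 135, 140, 160], dtype=float)
x1 = np.array([50, 65, 80, 110, 100, 120, 140, 190, 210, 220], dtype=float)
x2 = np.array([500, 659, 856, 1136, 1023, 1234, 1456, 1954, 2129, 2267], dtype=float)

n = len(y)
A = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ b
sse = resid @ resid
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst
mse = sse / (n - A.shape[1])                          # df = n - k - 1 = 7
se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))   # S.e of each coefficient
t = b / se

print(np.round(b, 3))   # approximately 12.8, -1.414, 0.202
print(np.round(t, 3))   # close to the t values reported in the model above
print(round(r2, 3))     # approximately 0.982
```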
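Detecting Multicollinearity
1. Comparing R² and the t-statistics
2. Using a correlation matrix of the I.Vs
3. VIF (Variance Inflation Factor) and Tolerance value (TOL), reported e.g. by SPSS:
$$VIF_j = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \dots, k$$
where $R_j^2$ is the R² from regressing $X_j$ on the other I.Vs.
A VIF of 1 means no collinearity, and larger values mean stronger collinearity. The cutoff varies by source (10 is a common rule of thumb; stricter cutoffs such as 5 or 2 are also used); a VIF above the chosen cutoff indicates collinearity.
$$TOL_j = \frac{1}{VIF_j} = 1 - R_j^2$$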
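A sketch of methods 2 and 3, computing each $R_j^2$ by an auxiliary OLS regression of $X_j$ on the remaining I.Vs. The `vif` helper is hypothetical (it is not SPSS's or any library's routine); the data are the income and asset columns from the consumption example.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other I.Vs."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2_j)
    return out

# Income (X1) and asset (X2) from the consumption example
x1 = [50, 65, 80, 110, 100, 120, 140, 190, 210, 220]
x2 = [500, 659, 856, 1136, 1023, 1234, 1456, 1954, 2129, 2267]
X = np.column_stack([x1, x2])

print(np.corrcoef(X, rowvar=False))  # correlation matrix of the I.Vs
vifs = vif(X)
print(vifs, 1 / vifs)                # VIF and TOL
```

Here asset is almost exactly proportional to income, so both VIFs land far above any of the usual cutoffs.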
Solving Multicollinearity
- Use relevant information (theory or previous research)
- Combine cross-section and time-series data
- Eliminate the affected variables: commonly used, but be careful of specification bias
- Transform the variables: first-difference method
- Add more samples/data
Heteroskedasticity
The variance of the error is not constant. It generally occurs in cross-sectional data, e.g. consumption and income at the province level.
Violating homoskedasticity still leaves the estimator unbiased, but it is no longer efficient.
[Figure: The Pattern of Heteroskedasticity]
Checking the Heteroskedasticity
1. Graphic method: analyze the pattern of the relationship between $\hat u_i^2$ and the predicted $\hat Y_i$.
[Figure: plots of $\hat u_i^2$ against $\hat Y_i$]
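The graphic method relies on looking at a scatter of $\hat u_i^2$ against $\hat Y_i$. The sketch below computes those two series on hypothetical heteroskedastic data (error standard deviation assumed proportional to income) and, as a rough numeric stand-in for eyeballing the plot, compares the mean squared residual in the lower and upper halves of $\hat Y_i$. This split-halves comparison is an illustration only, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
income = rng.uniform(10, 100, n)
# Hypothetical data: the error's standard deviation rises with income,
# so the data are heteroskedastic by construction
consumption = 5 + 0.8 * income + rng.normal(0, 0.1 * income)

A = np.column_stack([np.ones(n), income])
b, *_ = np.linalg.lstsq(A, consumption, rcond=None)
y_hat = A @ b
u2 = (consumption - y_hat) ** 2          # squared residuals

# Compare the mean squared residual for the lower and upper halves of y_hat
order = np.argsort(y_hat)
low_half = u2[order[: n // 2]].mean()
high_half = u2[order[n // 2 :]].mean()
print(low_half, high_half)   # a much larger upper-half value points to heteroskedasticity
```

With matplotlib installed, `plt.scatter(y_hat, u2)` draws the plot itself; a fan shape that widens as $\hat Y_i$ grows is the pattern to look for.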
Solving Heteroskedasticity
1. Transform into a logarithmic model:
$$\ln Y_j = \beta_0 + \beta_1 \ln X_j + u_j$$
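A sketch of fitting the logarithmic model by OLS on the logged variables. The helper name is hypothetical, and the data are noiseless values from $Y = 2X^{0.7}$, chosen so the fit can be checked exactly; real data must be strictly positive for the logs to exist.

```python
import numpy as np

def fit_log_log(x, y):
    """OLS on ln(y) = b0 + b1 * ln(x); requires x > 0 and y > 0."""
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    A = np.column_stack([np.ones(len(lx)), lx])
    (b0, b1), *_ = np.linalg.lstsq(A, ly, rcond=None)
    return b0, b1

# Hypothetical noiseless data following Y = 2 * X^0.7
x = np.array([10, 20, 40, 80, 160], dtype=float)
y = 2.0 * x ** 0.7
b0, b1 = fit_log_log(x, y)
print(b0, b1)   # b0 = ln 2 (about 0.693), b1 = 0.7 up to rounding
```

In this model $b_1$ is read as an elasticity: a 1% change in X is associated with a $b_1$% change in Y. The log scale compresses large values, which is why it often reduces heteroskedasticity.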
Autocorrelation is the correlation of a variable with itself across different times or different sample observations.
It generally occurs in time-series data.
$E(u_i u_j) \ne 0$ for $i \ne j$, and the estimator becomes inefficient.
Autocorrelation:
[Figure: two scatter plots of the residuals $u_i$ against time/X, illustrating autocorrelation patterns]
Detecting Autocorrelation
Durbin-Watson test statistic:
$$d = \frac{\sum_{t=2}^{N} (\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{N} \hat u_t^2}$$
Compare the computed d with the d-table values ($d_L$ and $d_U$).
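Rules of the Game
- $0 \le d < d_L$: positive autocorrelation
- $d_L \le d \le d_U$: inconclusive
- $d_U < d < 4 - d_U$: no autocorrelation
- $4 - d_U \le d \le 4 - d_L$: inconclusive
- $4 - d_L < d \le 4$: negative autocorrelation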
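A sketch of the Durbin-Watson statistic and the decision regions. Both function names are hypothetical, the residual series is made up, and the $d_L$, $d_U$ values are placeholders; real values must be read from a Durbin-Watson table for the actual n, k and significance level.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{t=2..N} (u_t - u_{t-1})^2 / sum_{t=1..N} u_t^2"""
    u = np.asarray(resid, dtype=float)
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

def dw_decision(d, d_lower, d_upper):
    """Apply the decision regions, given d_L and d_U from a Durbin-Watson table."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"

# Hypothetical residuals that flip sign every period, with placeholder table values
resid = [1.0, -1.0, 1.0, -1.0]
d = durbin_watson(resid)                     # 12 / 4 = 3.0
print(d, dw_decision(d, d_lower=1.08, d_upper=1.36))  # 3.0 negative autocorrelation
```

Residuals that alternate in sign push d toward 4, while residuals that drift slowly keep successive differences small and push d toward 0. statsmodels also provides `statsmodels.stats.stattools.durbin_watson` for the statistic itself.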
THANK YOU