Ordinary Least Squares Estimation: A Primer

Projectseminar Migration and the Labour Market,

Meeting May 24, 2012

The linear regression model

1. A brief introduction to linear regression
2. How to do a regression
3. How to interpret the output in STATA

linear regression

1. A brief introduction to linear regression models

•To find out how one aspect (variable 1) is related to another aspect (variable 2), one may run a regression model.

• e.g. what is the impact of work experience on the level of wages?

•A regression measures whether and to what extent an exogenous (independent) variable affects an endogenous (dependent) variable.

•A regression indicates how much and in which direction an independent variable influences a dependent variable.

•One can distinguish between:
  • a positive and a negative correlation
  • a high and a low correlation
  • a significant and an insignificant impact

•Definition of significance: You test at which significance level (e.g. the 1%, 5% or 10% level) you can reject the hypothesis that the variable has zero impact (the so-called null hypothesis, H0).

•The general multivariate model (with many explanatory variables):

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

yi: the dependent/endogenous variable

x1i, …, xki: exogenous, explanatory/independent variables

β0: constant, y-axis intercept (the value of y if all x = 0)

β1, …, βk: regression coefficients, the parameters of the regression

εi: residual, disturbance term (should be normally distributed, with an expected value of 0 and constant variance)

•In a simple linear regression model, there is only one regression coefficient besides the constant:

$$y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$$

yi: the dependent/endogenous variable

x1i: exogenous, explanatory/independent variable

β0: constant, y-axis intercept (the value of y if x = 0)

β1: regression coefficient, parameter of the regression

εi: residual, disturbance term

•Thus, in this simple linear regression model yi is explained by the variable x1i.

•Moreover, there is a constant β0, and x1i is weighted by β1. β1 can be interpreted as the effect of a one-unit increase in x1i on the outcome yi (e.g. the wage).

•εi is the disturbance term: it captures the difference between the value predicted by the regression model and “reality”, the actually observed value.

•The regression is estimated with the Ordinary Least Squares (OLS) estimator, which minimizes the sum of the squared residuals. Nevertheless, deviations between the observed and the estimated values remain, since the relationship is stochastic and not (purely) deterministic.
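As a rough illustration of what "minimizing the squared residuals" means in the simple model, the sketch below computes β0 and β1 with the standard closed-form OLS formulas. The experience and wage figures are made-up toy data, and the function name `ols_simple` is hypothetical; none of this comes from the seminar data.

```python
# Hypothetical toy data (not from the seminar): years of experience and hourly wage.
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
wage = [11.0, 13.5, 14.5, 17.0, 19.0]


def ols_simple(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = ols_simple(experience, wage)
ssr = sum_squared_residuals(experience, wage, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSR = {ssr:.3f}")
```

Any other line through the same data, for example one with a slightly larger slope, yields a larger sum of squared residuals, which is the sense in which the OLS line fits best.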

[Figure: regression line of yi on x1i, with labels marking β0, β1 and εi]

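Models with fixed effects/dummy variables

•Beyond the single constant, you include additional constant (intercept) terms, one for each group:

$$y_i = \alpha_0 + \alpha_1 + \alpha_2 + \dots + \alpha_{N-1} + \beta_1 x_{1i} + \varepsilon_i$$

•Here the α terms are the constant and the coefficients on the dummy variables, and 1, 2, …, N indexes the groups (e.g. education or experience groups). Each dummy term α1, …, αN−1 enters only for observations that belong to the respective group.

•Thus, you include N−1 dummy variables, which create a different intercept (constant) for each group.

•The slope parameter, however, remains the same for all groups.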
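The idea of "one intercept per group, one common slope" can be sketched as follows. The example uses the within-group (demeaning) computation, which gives the same slope and group intercepts as an OLS regression with group dummies. The group labels, the data and the function name `fixed_effects_fit` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: (group, x, y); the groups could be education levels.
data = [
    ("low", 1.0, 12.0), ("low", 2.0, 14.0), ("low", 3.0, 16.0),
    ("high", 1.0, 17.0), ("high", 2.0, 19.0), ("high", 3.0, 21.0),
]


def fixed_effects_fit(rows):
    """Return the common slope and a dict with one intercept per group."""
    groups = {}
    for g, x, y in rows:
        groups.setdefault(g, []).append((x, y))
    # Group means of x and y.
    means = {
        g: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
        for g, obs in groups.items()
    }
    # Demean x and y within each group, then pool all observations for the slope.
    sxy = sum((x - means[g][0]) * (y - means[g][1]) for g, x, y in rows)
    sxx = sum((x - means[g][0]) ** 2 for g, x, y in rows)
    slope = sxy / sxx
    # Each group's intercept makes its line pass through that group's means.
    intercepts = {g: my - slope * mx for g, (mx, my) in means.items()}
    return slope, intercepts


slope, intercepts = fixed_effects_fit(data)
print(slope, intercepts)
```

With "low" as the reference group, the constant corresponds to the intercept of "low", and the dummy coefficient for "high" is the difference between the two intercepts.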

2. How to do a regression with STATA

> regress depvar [indepvars] [if] [, options]

After the command, you first name the dependent variable (= endogenous variable, the variable you want to explain), followed by the independent variables (exogenous variables).

example:

3. How to interpret the output of a regression

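. regress sqm hhinc

      Source |       SS       df       MS              Number of obs =    3126
-------------+------------------------------           F(  1,  3124) =  694.26
       Model |  986537.128     1  986537.128           Prob > F      =  0.0000
    Residual |  4439145.82  3124  1420.98138           R-squared     =  0.1818
-------------+------------------------------           Adj R-squared =  0.1816
       Total |  5425682.95  3125  1736.21854           Root MSE      =  37.696

------------------------------------------------------------------------------
         sqm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hhinc |   .0165935   .0006298    26.35   0.000     .0153588    .0178283
       _cons |   55.76675    1.38561    40.25   0.000     53.04995    58.48355
------------------------------------------------------------------------------

The upper-left block (Source, SS, df, MS) is the analysis of the variance of the model, the upper-right block describes the fit of the model, and the lower block is the analysis of the coefficients. The row _cons holds β0, the row hhinc holds β1.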

• β0 indicates the predicted outcome if x1i = 0, i.e. at an income (hhinc) of zero the predicted outcome variable (sqm) equals the value of the coefficient β0 (here 55.77).

• β1 describes by how much the outcome changes if hhinc increases by one unit (here by 0.0166). It can be positive or negative (-> positive or negative correlation).
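To make the interpretation of β0 and β1 concrete, the following sketch plugs the coefficients from the output of regress sqm hhinc into the fitted line. The function name `predict_sqm` and the income values are hypothetical; the units of hhinc are not stated in the slides, so none are assumed here.

```python
# Coefficients copied from the regression output (rows _cons and hhinc).
b0 = 55.76675    # _cons: predicted sqm at hhinc = 0
b1 = 0.0165935   # hhinc: change in predicted sqm per one-unit increase in hhinc


def predict_sqm(hhinc):
    """Predicted value of sqm on the fitted regression line."""
    return b0 + b1 * hhinc


# Illustrative income values (assumed, not taken from the data set).
print(predict_sqm(0))                        # equals b0
print(predict_sqm(1001) - predict_sqm(1000)) # equals b1 (up to rounding error)
```

The second line of output shows that moving hhinc up by one unit shifts the prediction by β1, regardless of the starting level.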

•Moreover, the output gives us the standard error, the t-value and the p-value.

•The standard error is a measure of the precision of the parameter estimate.

•The t-value is the coefficient divided by the standard error. As a rule of thumb, a t-value of about 2.0 (in absolute terms) indicates that the coefficient is significantly different from zero at the 5% level, a t-value of about 2.58 that it is different from zero at the 1% level.
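The t-values in the output can be checked by hand. A minimal sketch, using the coefficients and standard errors from the output of regress sqm hhinc (the function name `t_value` is hypothetical):

```python
def t_value(coef, std_err):
    """t-value as described above: coefficient divided by its standard error."""
    return coef / std_err


# Values copied from the coefficient table of the regression output.
t_hhinc = t_value(0.0165935, 0.0006298)
t_cons = t_value(55.76675, 1.38561)

print(f"t(hhinc) = {t_hhinc:.2f}")
print(f"t(_cons) = {t_cons:.2f}")
```

Both values are far above the rule-of-thumb thresholds, in line with the reported p-values of 0.000.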

•The p-value gives the exact significance level at which the null hypothesis, i.e. that the true parameter is equal to zero, can be rejected. A p-value < 0.05 indicates that the coefficient is significant at the 5% level.

•Reporting: You report the coefficient and either the standard error or the t-statistic, and indicate the significance level by stars behind the coefficient. E.g. *** indicates significance at the 1% level, ** at the 5% level, * at the 10% level.
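The reporting convention with stars can be sketched as a small helper. The function names `significance_stars` and `report` are hypothetical, and putting the standard error in parentheses is one common layout, assumed here rather than prescribed by the slides.

```python
def significance_stars(p_value):
    """Map a p-value to the star notation: *** 1%, ** 5%, * 10%."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def report(coef, std_err, p_value):
    """Coefficient with stars, followed by the standard error in parentheses."""
    return f"{coef:.4f}{significance_stars(p_value)} ({std_err:.4f})"


# hhinc row of the regression output: coefficient, standard error, p-value.
print(report(0.0165935, 0.0006298, 0.000))
```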

fit of the model

A good measure of the fit of our regression is R-squared (R2). R2 indicates how well our model explains the “real” values of y.

R2 = 1 -> perfect fit, our model can explain every single value

R2 = 0 -> no fit, our model is rather useless
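R-squared can be recomputed from the sums of squares in the output of regress sqm hhinc, using the standard definition R2 = 1 − Residual SS / Total SS. A minimal sketch (variable names are illustrative):

```python
# Sums of squares copied from the regression output.
model_ss = 986537.128
residual_ss = 4439145.82
total_ss = 5425682.95

# Share of the total variation of sqm that the model explains.
r_squared = 1 - residual_ss / total_ss

# Equivalent view: explained variation divided by total variation.
r_squared_alt = model_ss / total_ss

print(f"R-squared = {r_squared:.4f}")
```

So the model explains roughly 18% of the variation in sqm, which matches the R-squared reported by Stata.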


• The adjusted R squared corrects the R squared for the number of variables considered. It’s a slightly better measure than the R squared. Report either the R squared or the adjusted R squared.

• The other measures are usually not reported.
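The correction for the number of variables can be illustrated with the usual textbook formula, adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of explanatory variables. The sketch below applies it to the figures from the output of regress sqm hhinc; the function name `adjusted_r_squared` is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_vars):
    """Adjusted R-squared: penalizes R-squared for each explanatory variable used."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_vars - 1)


# Figures copied from the regression output: sums of squares, observations, one regressor.
r_squared = 1 - 4439145.82 / 5425682.95
adj_r_squared = adjusted_r_squared(r_squared, n_obs=3126, n_vars=1)

print(f"Adj R-squared = {adj_r_squared:.4f}")
```

With 3126 observations and a single regressor the penalty is tiny, which is why the two measures in the output differ only in the fourth decimal place.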


Analysis of the variance of the model

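This part of the output splits the total variation of the dependent variable (Total) into the variation explained by the model (Model) and the unexplained variation of the residuals (Residual).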
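The remaining figures in the header of the output follow from this decomposition. As a sketch based on the standard definitions (MS = SS / df, F = Model MS / Residual MS, Root MSE = square root of the Residual MS), the values from regress sqm hhinc can be reproduced like this:

```python
import math

# Sums of squares and degrees of freedom copied from the regression output.
model_ss, model_df = 986537.128, 1
residual_ss, residual_df = 4439145.82, 3124
total_ss = 5425682.95

# Mean squares: sum of squares divided by its degrees of freedom.
model_ms = model_ss / model_df
residual_ms = residual_ss / residual_df

# F statistic of the model and the root mean squared error.
f_stat = model_ms / residual_ms
root_mse = math.sqrt(residual_ms)

print(f"Model SS + Residual SS = {model_ss + residual_ss:.2f} (Total SS = {total_ss:.2f})")
print(f"F = {f_stat:.2f}, Root MSE = {root_mse:.3f}")
```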

