Ordinary Least Squares Estimation: A Primer

Projectseminar Migration and the Labour Market,

Meeting May 24, 2012

The linear regression model

1. A brief introduction to linear regression
2. How to do a regression
3. How to interpret the output in STATA

1. A brief introduction to linear regression models

• To find out how one quantity (variable 1) is related to another (variable 2), one can run a regression model.
  • Example: what is the impact of work experience on the level of wages?

• A regression measures whether and to what extent an exogenous (independent) variable affects an endogenous (dependent) variable.

• A regression indicates how strongly and in which direction an independent variable influences a dependent variable.

• One can distinguish between:
  • a positive and a negative correlation
  • a high and a low correlation
  • a significant and an insignificant impact

• Definition of significance: you test at which significance level (e.g. the 1%, 5% or 10% level) you can reject the hypothesis that the variable has zero impact (the so-called null hypothesis, H0).

• The general multivariate model (with many explanatory variables):

$$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i $$

yi: the dependent (endogenous) variable

x1i, …, xki: the exogenous (explanatory, independent) variables

β0: the constant, i.e. the y-axis intercept (the value of y if all x = 0)

β1, …, βk: the regression coefficients (parameters of the regression)

εi: the disturbance term, estimated by the residual (it should be normally distributed with an expected value of 0 and constant variance)

• In a simple linear regression model there is, besides the constant, only one regression coefficient:

$$ y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i $$

yi: the dependent (endogenous) variable

x1i: the exogenous (explanatory, independent) variable

β0: the constant, i.e. the y-axis intercept (the value of y if x = 0)

β1: the regression coefficient (parameter of the regression)

εi: the disturbance term, estimated by the residual

• Thus, in this simple linear regression model, yi is explained by the variable x1i.

• In addition there is a constant β0, and x1i is weighted by β1. β1 can be interpreted as the effect on the outcome yi (e.g. the wage) of increasing x1i by one unit.

• εi is the disturbance term. It indicates the difference between the value predicted by our regression model and "reality", i.e. the actually observed value.

• The regression is estimated with the Ordinary Least Squares (OLS) estimator, which minimizes the sum of the squared residuals. Some deviation between the observed and the estimated values nevertheless remains, since the relationship is stochastic and not (purely) deterministic.
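As a rough illustration of what "minimizing the squared residuals" means, the following Python sketch computes the OLS estimates for a simple regression with the standard closed-form formulas. The experience and wage figures are made up for illustration and do not come from the seminar data; the function names are likewise hypothetical.

```python
def ols_simple(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data (assumption): years of work experience and hourly wage
experience = [1, 2, 3, 4, 5]
wage = [11.0, 13.5, 14.0, 17.5, 19.0]

b0, b1 = ols_simple(experience, wage)
ssr = sum_squared_residuals(experience, wage, b0, b1)
print(b0, b1, ssr)
```

Any other choice of intercept or slope gives a larger sum of squared residuals for these data, which is exactly the sense in which OLS is "best".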

[Figure: illustration of the simple regression model; labels: β0, β1, εi, yi, x1i]

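Models with fixed effects/dummy variables

• Beyond the single constant, you include additional constant (intercept) terms, one for each group:

$$ y_i = \alpha_0 + \alpha_1 + \alpha_2 + \dots + \alpha_{N-1} + \beta_1 x_{1i} + \varepsilon_i $$

• Here α0 is the constant and α1, …, αN−1 are the coefficients of the group dummy variables; each of them enters only for observations that belong to the corresponding group. The groups (e.g. education or experience groups) are indexed 1, 2, …, N.

• Thus, you include N−1 dummy variables, which create a different intercept (constant) for each group.

• The slope parameter, however, remains the same for all groups.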
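The dummy-variable model described above can be sketched in Python for the smallest possible case: two groups, hence one dummy variable. The data are invented and constructed without noise, and the helper functions are hypothetical; the estimator solves the normal equations (X'X)b = X'y, which is one common way to compute OLS with several regressors.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - known) / m[r][r]
    return beta


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data (assumption): two education groups, so N - 1 = 1 dummy
experience = [1, 2, 3, 1, 2, 3]
group = ["low", "low", "low", "high", "high", "high"]
wage = [10.5, 11.0, 11.5, 12.5, 13.0, 13.5]

# Columns: constant, dummy for the "high" group, experience
X = [[1.0, 1.0 if g == "high" else 0.0, float(x)] for g, x in zip(group, experience)]
alpha0, alpha1, beta1 = ols(X, wage)
print(alpha0, alpha1, beta1)
```

The "low" group has intercept alpha0 and the "high" group has intercept alpha0 + alpha1, while both share the slope beta1.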

2. How to do a regression with STATA

> regress depvar [indepvars] [if] [, options]

After the command name you first give the dependent variable (the endogenous variable, i.e. the variable you want to explain), followed by the independent (exogenous) variables.

Example (the regression discussed in the next section):

> regress sqm hhinc

3. How to interpret the output of a regression

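. regress sqm hhinc

      Source |       SS       df       MS              Number of obs =    3126
-------------+------------------------------           F(  1,  3124) =  694.26
       Model |  986537.128     1  986537.128           Prob > F      =  0.0000
    Residual |  4439145.82  3124  1420.98138           R-squared     =  0.1818
-------------+------------------------------           Adj R-squared =  0.1816
       Total |  5425682.95  3125  1736.21854           Root MSE      =  37.696

------------------------------------------------------------------------------
         sqm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hhinc |   .0165935   .0006298    26.35   0.000     .0153588    .0178283
       _cons |   55.76675    1.38561    40.25   0.000     53.04995    58.48355
------------------------------------------------------------------------------

The output has three parts:

• upper left (Source, SS, df, MS): analysis of the variance of the model
• upper right (Number of obs to Root MSE): fit of the model
• lower table: analysis of the coefficients, where the row hhinc gives β1 and the row _cons gives β0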

analysis of the coefficients

• β0 is the predicted outcome when x1i = 0: with a household income (hhinc) of zero, the outcome variable (sqm) would equal the coefficient β0 (here 55.77).

• β1 describes by how much the outcome changes if hhinc increases by one unit (here by 0.0166). It can be positive or negative (a positive or a negative correlation).
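To make the interpretation of the two coefficients concrete, here is a small Python sketch that plugs the estimates from the coefficient table into the fitted equation. The function name and the income values are chosen for illustration only; the units of hhinc and sqm are not stated in the output and are left open here.

```python
# Estimates taken from the coefficient table (rows _cons and hhinc)
B0 = 55.76675    # _cons
B1 = 0.0165935   # hhinc


def predicted_sqm(hhinc):
    """Fitted value of sqm for a given household income."""
    return B0 + B1 * hhinc


# At zero income the prediction equals the constant
print(predicted_sqm(0))
# One additional unit of income changes the prediction by B1
print(predicted_sqm(2001) - predicted_sqm(2000))
# 1000 additional units of income change it by 1000 * B1
print(predicted_sqm(1000) - predicted_sqm(0))
```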

• In addition, the output gives us the standard error, the t-value and the p-value.

• The standard error is a measure of the precision of the parameter estimate.

• The t-value is the coefficient divided by the standard error. As a rule of thumb, an absolute t-value of about 2.0 (1.96 in large samples) indicates that the coefficient is significantly different from zero at the 5% level, and an absolute t-value of about 2.6 (2.58 in large samples) that it is significantly different from zero at the 1% level.
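The t-values and the confidence interval in the coefficient table can be checked by hand. The Python sketch below does this with the numbers from the output. As a simplifying assumption it uses the normal distribution for the critical value, whereas STATA uses the t distribution; with more than 3000 observations the difference is negligible.

```python
from statistics import NormalDist


def t_value(coef, std_err):
    """t-value: coefficient divided by its standard error."""
    return coef / std_err


def confidence_interval(coef, std_err, level=0.95):
    """Approximate confidence interval using a normal critical value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return coef - z * std_err, coef + z * std_err


t_hhinc = t_value(0.0165935, 0.0006298)
t_cons = t_value(55.76675, 1.38561)
ci_hhinc = confidence_interval(0.0165935, 0.0006298)

print(round(t_hhinc, 2), round(t_cons, 2))
print(ci_hhinc)
```

Both t-values are far above 2.6, so both coefficients are significantly different from zero at the 1% level.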

• The p-value gives the exact significance level at which the null hypothesis (that the true parameter is zero) can be rejected. A p-value < 0.05 means that the coefficient is significantly different from zero at the 5% level.

• Reporting: you report the coefficient and either the standard error or the t-statistic, and indicate the significance level by stars behind the coefficient. E.g. *** denotes a significance level of 1%, ** of 5%, * of 10%.
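The reporting convention can be sketched as a small Python helper. It is only an illustration: the function names are hypothetical, and the p-value is computed from the normal distribution as a large-sample approximation (an assumption; STATA uses the t distribution with the residual degrees of freedom).

```python
from statistics import NormalDist


def p_value_two_sided(t):
    """Approximate two-sided p-value for a t-value (normal approximation)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def stars(p):
    """Significance stars: *** for 1%, ** for 5%, * for 10%."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


def report(coef, std_err):
    """Coefficient with stars, standard error in parentheses."""
    p = p_value_two_sided(coef / std_err)
    return f"{coef:.4f}{stars(p)} ({std_err:.4f})"


# hhinc row of the coefficient table
print(report(0.0165935, 0.0006298))
```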

fit of the model

A useful measure of the fit of our regression is R-squared (R2). It indicates how well our model explains the observed ("real") values of y.
R2 = 1 -> perfect fit: our model explains every single value
R2 = 0 -> no fit: our model is rather useless

      Source |       SS       df       MS              Number of obs =     800
-------------+------------------------------           F(  1,   798) =   12.40
       Model |  722.988807     1  722.988807           Prob > F      =  0.0005
    Residual |  46516.8286   798  58.2917651           R-squared     =  0.0153
-------------+------------------------------           Adj R-squared =  0.0141
       Total |  47239.8174   799  59.1236763           Root MSE      =  7.6349

• The adjusted R-squared corrects the R-squared for the number of explanatory variables in the model, which makes it a slightly better measure than the R-squared. Report either the R-squared or the adjusted R-squared.

• The other measures are usually not reported.
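Both fit measures can be recomputed from the sums of squares in the output with 800 observations. The Python sketch below uses the usual textbook formulas; the function names are chosen for illustration.

```python
def r_squared(model_ss, total_ss):
    """Share of the total variation that is explained by the model."""
    return model_ss / total_ss


def adjusted_r_squared(r2, n_obs, n_regressors):
    """R-squared corrected for the number of explanatory variables."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Model SS, Total SS and Number of obs from the output; one regressor
r2 = r_squared(722.988807, 47239.8174)
adj_r2 = adjusted_r_squared(r2, 800, 1)

print(round(r2, 4), round(adj_r2, 4))
```

With an R-squared of about 0.015 this model explains only around 1.5% of the variation in the dependent variable.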

Analysis of the variance of the model

This part of the output shows how the total variation (sum of squares, SS) of the dependent variable splits into the part explained by the model and the residual part.
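The relationships between the entries of this block can be verified with a few lines of Python. The numbers are taken from the output with 800 observations; the variable names are chosen for illustration.

```python
from math import sqrt

# Sums of squares (SS) and degrees of freedom (df) from the output
model_ss, model_df = 722.988807, 1
resid_ss, resid_df = 46516.8286, 798

# The total variation is the sum of the explained and the residual variation
total_ss = model_ss + resid_ss

# Mean squares (MS) are sums of squares divided by their degrees of freedom
model_ms = model_ss / model_df
resid_ms = resid_ss / resid_df

# The F statistic compares explained and residual mean squares
f_stat = model_ms / resid_ms

# Root MSE is the square root of the residual mean square
root_mse = sqrt(resid_ms)

print(total_ss, resid_ms, round(f_stat, 2), round(root_mse, 4))
```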
