2. The Linear Regression Model


  • 2. The Linear Regression Model

    Joshua Sherman Applied Econometrics

    040693 University of Vienna

  • Regression model

    We use regression models to answer the following types of questions:

    If one variable changes in a certain way (PUNISHMENT), by how much will another variable (CRIME) change?

    Given the value of one variable, can we predict the corresponding value of another?

  • Simple linear regression model

    We begin by discussing the simple linear regression model, in which there is only one explanatory variable on the right hand side of the regression equation:

    $E(Y) = \beta_0 + \beta_1 X$

    The unknown parameters $\beta_0$ and $\beta_1$ are the intercept and slope of the regression function. We refer to them as population parameters.

    Suppose Y is sales of umbrellas and X is the amount of rainfall in centimeters in a given city. Then the slope coefficient $\beta_1$ represents the change in the average number of umbrella sales given a 1 cm change in rainfall. The intercept coefficient $\beta_0$ represents the average number of umbrella sales in a city with no rainfall.
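
    For instance, if the parameter values were (hypothetically) $\beta_0 = 20$ and $\beta_1 = 5$, then $E(Y \mid X = 10) = 20 + 5 \times 10 = 70$: a city with 10 cm of rainfall sells 70 umbrellas on average, and each additional centimeter of rainfall raises average sales by 5 umbrellas. These numbers are purely illustrative, not from the lecture.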

  • Simple linear regression model

    The number of umbrella sales for all cities in which annual rainfall is a given amount (for example, 10 cm) will be scattered around the mean.

    A probability density function (pdf) will depict how these values are scattered around the mean.

    The mean is just one descriptor of a distribution. Another important descriptor is the variance.

    The variance is defined as the average of the squared differences between the values of a distribution and the mean. It is essentially a measure of the extent to which values of a distribution are spread out. Mathematically:

    $\mathrm{var}(Y) = E\big[(Y - E(Y))^2\big]$
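
    As a quick illustration of this definition, here is a minimal Python sketch (with made-up umbrella-sales values) that computes the variance as the average squared deviation from the mean:

```python
# Minimal sketch: variance as the average of squared deviations from the mean.
# The sales figures are made-up illustrative values.
sales = [55, 60, 70, 80, 85]

mean = sum(sales) / len(sales)                                # E(Y)
variance = sum((y - mean) ** 2 for y in sales) / len(sales)   # average squared deviation

print(mean, variance)  # 70.0 130.0
```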

  • Probability density function for Y

    [Figure: probability density functions of Y at X = 10 and X = 25, centered on the regression line $E(Y|X) = \beta_0 + \beta_1 X$]

    This regression function shows the average number of umbrellas sold at different levels of rainfall, in centimeters.

    The conditional variance of Y is $\mathrm{var}(Y|X) = \sigma^2$ for all values of X.

  • Probability density function for Y

    On the previous slide, the constant variance assumption implies that at each level of rainfall X we are equally uncertain about how far values of Y will be from their average value, $E(Y|X) = \beta_0 + \beta_1 X$. Data satisfying this condition are considered to be homoskedastic. If this assumption is not satisfied, the data are considered to be heteroskedastic.

  • The error term

    An observation on Y can be decomposed into two parts:

    Systematic component:

    $E(Y) = \beta_0 + \beta_1 X$

    Random component:

    $e = Y - E(Y) = Y - \beta_0 - \beta_1 X$

    Rearranging we obtain the simple linear regression model:

    $Y = \beta_0 + \beta_1 X + e$

  • The error term

    Why do we introduce an error term?

    Unavailability of data
    Randomness in human behavior
    Net influence of a large number of small and independent causes

    As e is the random component, we know that $E(e) = 0$ because:

    $E(e) = E(Y) - \beta_0 - \beta_1 X = 0$

    We also know that the variances of Y and e are identical and equal to $\sigma^2$, because they only differ by a constant. Thus the pdfs for Y and e are identical in all respects except for their location.

  • The error term initial assumptions

    Several assumptions are required in order to run the simple linear regression model. Thus far we have assumed that:

    1. $Y = \beta_0 + \beta_1 X + e$
    2. $E(e) = 0$, which is equivalent to stating that $E(Y) = \beta_0 + \beta_1 X$
    3. $\mathrm{var}(e) = \mathrm{var}(Y) = \sigma^2$ (homoskedasticity)

    Later we will see why these assumptions (and others) are important for our purposes

  • The population vs. the sample

    In practice, the econometrician will possess a sample of Y values corresponding to some fixed X values rather than data from the entire population of values. Therefore the econometrician will never truly know the values of $\beta_0$ and $\beta_1$.

    However, we may estimate these parameters. We will denote these estimators as $\hat\beta_0$ and $\hat\beta_1$.

  • Ordinary least squares

    So how shall we find $\hat\beta_0$ and $\hat\beta_1$? We need a method or rule for how to estimate the population parameters using sample data.

    The most widely used rule is the method of least squares, or ordinary least squares (OLS). According to this principle, a line is fitted to the data that renders the sum of the squares of the vertical distances from each data point to the line as small as possible.

  • Ordinary least squares

    Therefore the fitted line may be written as:

    $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$

    The vertical distances from the fitted line to each point are the least squares residuals, $\hat e_i$. They are given by:

    $\hat e_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$

  • Ordinary least squares

    Mathematically, we want to find $\hat\beta_0$ and $\hat\beta_1$ such that the sum of the squared vertical distances from the data points to the line is minimized:

    $\min_{\hat\beta_0,\,\hat\beta_1} \sum_i \hat e_i^2 = \sum_i (Y_i - \hat Y_i)^2 = \sum_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2$

    If you do not recall how to find the solution for $\hat\beta_0$ and $\hat\beta_1$ using partial derivatives, the steps may be found in the course text.
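
    As a brief sketch of those steps (the standard first-order conditions, not reproduced from the course text), setting the partial derivatives of the sum of squared residuals with respect to $\hat\beta_0$ and $\hat\beta_1$ equal to zero gives the two normal equations:

    $-2 \sum_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0$

    $-2 \sum_i X_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0$

    Solving these two equations simultaneously for $\hat\beta_0$ and $\hat\beta_1$ yields the estimators on the next slide.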

  • The least squares estimators

    Upon solving this minimization problem we find that:

    $\hat\beta_1 = \dfrac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2}$

    $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$

    where $\bar Y = \sum_i Y_i / N$ and $\bar X = \sum_i X_i / N$ are the sample means of the observations on Y and X.
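
    A minimal Python sketch of these two formulas, using hypothetical rainfall (cm) and umbrella-sales numbers purely for illustration:

```python
# Minimal sketch: OLS slope and intercept for a simple regression of Y on X.
# The data are hypothetical rainfall (cm) and umbrella-sales figures.
X = [5, 10, 15, 20, 25]
Y = [45, 70, 90, 120, 140]

N = len(X)
x_bar = sum(X) / N
y_bar = sum(Y) / N

# slope: sum of cross deviations over sum of squared X deviations
beta1_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
            sum((x - x_bar) ** 2 for x in X)
# intercept: Y-bar minus slope times X-bar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # 21.0 4.8 for these numbers
```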

  • OLS and the true parameter values

    So how are the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ related to $\beta_0$ and $\beta_1$?

    If assumptions 1 and 2 from earlier hold, then $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$ (proof provided in the text).

    That is, if we were able to take repeated samples, the expected value of the estimators $\hat\beta_0$ and $\hat\beta_1$ would equal the true parameter values $\beta_0$ and $\beta_1$.

    When the expected value of any estimator of a parameter equals the true parameter value, then that estimator is unbiased.

    Later we will explore how violation of certain assumptions will cause estimators to be biased

  • OLS and the true parameter values

    So the idea behind OLS is that if certain assumptions hold, the expected value of the estimators $\hat\beta_0$ and $\hat\beta_1$ will equal the true parameter values $\beta_0$ and $\beta_1$.
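
    A small simulation can illustrate this repeated-samples idea. The sketch below uses arbitrarily chosen true parameters and errors satisfying assumptions 1 and 2; averaged over many samples, the OLS estimates come out close to the true values:

```python
# Monte Carlo sketch: across repeated samples, the OLS estimates average out to
# the true parameters (which are chosen arbitrarily here for illustration).
import random

random.seed(1)
beta0_true, beta1_true, sigma = 2.0, 0.5, 1.0
X = [float(x) for x in range(1, 21)]   # fixed X values
N = len(X)

def ols(X, Y):
    x_bar, y_bar = sum(X) / N, sum(Y) / N
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
         sum((x - x_bar) ** 2 for x in X)
    return y_bar - b1 * x_bar, b1

estimates = [ols(X, [beta0_true + beta1_true * x + random.gauss(0, sigma) for x in X])
             for _ in range(5000)]

print(sum(b0 for b0, _ in estimates) / len(estimates))  # close to 2.0
print(sum(b1 for _, b1 in estimates) / len(estimates))  # close to 0.5
```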

  • Coefficient of determination

    We are interested in a measure that will indicate how well our sample regression line fits the data. Let us define $y_i = Y_i - \bar Y$, the deviation of a variable from its mean. Using sample data we note that $\hat y_i = \hat Y_i - \bar Y$. Then:

    $y_i = \hat y_i + \hat e_i$

    In other words, the amount by which the data deviate from the mean can be broken into an explained portion ($\hat y_i$) and an unexplained portion, $\hat e_i$.

  • Coefficient of determination

    Using $y_i = \hat y_i + \hat e_i$, we may square both sides and sum over all N observations (the cross-product term $\sum_i \hat y_i \hat e_i$ equals zero) to obtain:

    $\sum_i (Y_i - \bar Y)^2 = \sum_i (\hat Y_i - \bar Y)^2 + \sum_i \hat e_i^2$

    We may then define the coefficient of determination $R^2$ as the ratio of explained variation to total variation:

    $R^2 = \dfrac{\sum_i (\hat Y_i - \bar Y)^2}{\sum_i (Y_i - \bar Y)^2} = 1 - \dfrac{\sum_i \hat e_i^2}{\sum_i (Y_i - \bar Y)^2}$

  • Coefficient of determination

    Therefore we have:

    $\sum_i (Y_i - \bar Y)^2$: Total sum of squares (TSS). A measure of total variation in Y about the mean.

    $\sum_i (\hat Y_i - \bar Y)^2$: Explained sum of squares (ESS). The part of total variation in Y about the mean that is explained by the sample regression.

    $\sum_i \hat e_i^2$: Residual sum of squares (RSS). The part of total variation in Y about the mean that is not explained by the sample regression.
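
    Continuing with the hypothetical data from the earlier OLS sketch, $R^2$ can be computed directly from these three sums of squares:

```python
# Sketch: coefficient of determination from TSS, ESS, and RSS,
# using the same hypothetical rainfall/sales data as before.
X = [5, 10, 15, 20, 25]
Y = [45, 70, 90, 120, 140]
N = len(X)
x_bar, y_bar = sum(X) / N, sum(Y) / N

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar
Y_hat = [b0 + b1 * x for x in X]

TSS = sum((y - y_bar) ** 2 for y in Y)                 # total sum of squares
ESS = sum((yh - y_bar) ** 2 for yh in Y_hat)           # explained sum of squares
RSS = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))    # residual sum of squares

print(ESS / TSS, 1 - RSS / TSS)  # the two expressions for R^2 agree
```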

  • Coefficient of determination

    It can also be shown that:

    $R^2 = \dfrac{\sum_i (\hat Y_i - \bar Y)^2}{\sum_i (Y_i - \bar Y)^2} = \hat\beta_1^2 \, \dfrac{\sum_i (X_i - \bar X)^2}{\sum_i (Y_i - \bar Y)^2} = r_{XY}^2 = \dfrac{\left[\sum_i (X_i - \bar X)(Y_i - \bar Y)\right]^2}{\left(\sum_i X_i^2 - N\bar X^2\right)\left(\sum_i Y_i^2 - N\bar Y^2\right)}$

    Its limits are $0 \le R^2 \le 1$. If $\hat Y_i = Y_i$ for each i, then $R^2 = 1$.

    How would the regression line appear graphically if 2=0? What is the intuition?

  • Coefficient of determination

    One should remain level-headed upon finding the $R^2$:

    It would not be surprising to find an $R^2$ near 1 when working with particular types of time series data that trend smoothly over time.

    It would not be surprising to find a relatively low $R^2$ when working with microeconomic data involving consumer behavior. Variations in individual behavior may be difficult to fully explain.

    There are several other measures that are important when evaluating a model:

    Signs and magnitudes of the estimates
    Precision of the estimates
    The model's predictive value

  • What makes a good estimator?

    Unbiasedness

    Earlier we stated that an estimator is unbiased if its mean is equal to the true value of the parameter being estimated

    Efficiency

    The smaller the variance of an estimator, the better the chance that the estimate is close to the actual value of the parameter, which is unknown.

  • What makes a good estimator?

    Restricting attention to estimators that are linear functions of the observations on the dependent variable makes the search for the unbiased estimator with the smallest variance manageable.

    An estimator that is linear, unbiased, and that has minimum variance among all linear unbiased estimators is called the best linear unbiased estimator (BLUE).

  • Assumptions when running OLS

    We require several assumptions in order for the OLS estimators to be BLUE:

    1. $Y = \beta_0 + \beta_1 X + e$

    2. $E(e) = 0$. It is important that the factors not explicitly included in the model, and therefore incorporated into $e$, do not systematically affect the average value of Y. That is, the positive values of $e$ cancel out the negative values so that their average effect on Y is zero.

    3. $\mathrm{var}(e) = \mathrm{var}(Y) = \sigma^2$. This is the assumption of homoskedasticity. Otherwise, our estimators will not have minimum variance.

  • Assumptions when running OLS

    4. $\mathrm{cov}(e_i, e_j) = \mathrm{cov}(Y_i, Y_j) = 0$ for $i \ne j$. That is, the covariance between any pair of random errors is zero. Otherwise, our estimators will not have minimum variance.

    5. The variable X is not random and must take at least two different values. Without this condition, we cannot run OLS. Quite simply, if there is no variation in the X variable, then we will not be able to explain variation in the Y variable.

    6. The values of $e$ are normally distributed about their mean, and therefore Y is normally distributed (this is necessary for hypothesis testing, which we will discuss in a later lecture):

    $e \sim N(0, \sigma^2)$

  • Variance

    While the econometrician can never be certain that the estimates obtained are equal or close to the true parameters of the model (as the true parameters are unknowable), finding a coefficient with relatively small variance will certainly give him or her more confidence that the estimate is good

    That is, given two different distributions of $\hat\beta_1$ with the same mean, we prefer the distribution with smaller variance

    Variance size will be shown to be crucial when testing hypotheses

  • Variance

    Given our previous definition of variance, if our assumptions (1-5) hold, it can be shown that the variances of $\hat\beta_0$ and $\hat\beta_1$ are:

    $\mathrm{var}(\hat\beta_0) = \sigma^2 \, \dfrac{\sum_i X_i^2}{N \sum_i (X_i - \bar X)^2}$

    $\mathrm{var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_i (X_i - \bar X)^2}$

    How does the extent to which X is spread out relate to these variances?

  • Variance

    In addition, we may be interested in the variance of the random error term

    The variance of the random error $e_i$ is:

    $\mathrm{var}(e_i) = \sigma^2 = E\big[(e_i - E(e_i))^2\big] = E(e_i^2)$, since $E(e_i) = 0$.

    Of course, the random errors are unobservable. So how shall we proceed?

  • Variance

    Recall that:

    $\hat e_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$

    We may therefore replace $e_i$ with $\hat e_i$:

    $\hat\sigma^2 = \dfrac{\sum_i \hat e_i^2}{N}$

    However, we must modify this formula slightly based on the number of regression parameters K (what is the intuition?). When dealing with only $\hat\beta_0$ and $\hat\beta_1$, K = 2. Therefore the formula that we use to ensure an unbiased estimator is:

    $\hat\sigma^2 = \dfrac{\sum_i \hat e_i^2}{N - K} = \dfrac{\sum_i \hat e_i^2}{N - 2}$

  • Variance

    Now that we have found $\hat\sigma^2$, an unbiased estimator of $\sigma^2$, we may write:

    $\widehat{\mathrm{var}}(\hat\beta_0) = \hat\sigma^2 \, \dfrac{\sum_i X_i^2}{N \sum_i (X_i - \bar X)^2}$

    $\widehat{\mathrm{var}}(\hat\beta_1) = \dfrac{\hat\sigma^2}{\sum_i (X_i - \bar X)^2}$

    The square roots of the estimated variances are the standard errors of $\hat\beta_0$ and $\hat\beta_1$.
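
    A short Python sketch of these formulas, again with the hypothetical rainfall/sales data and K = 2:

```python
# Sketch: unbiased estimate of sigma^2 and the standard errors of the
# OLS estimators (hypothetical data, K = 2 regression parameters).
X = [5, 10, 15, 20, 25]
Y = [45, 70, 90, 120, 140]
N, K = len(X), 2
x_bar, y_bar = sum(X) / N, sum(Y) / N

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]

sigma2_hat = sum(e ** 2 for e in residuals) / (N - K)     # unbiased estimator of sigma^2
Sxx = sum((x - x_bar) ** 2 for x in X)

var_b0 = sigma2_hat * sum(x ** 2 for x in X) / (N * Sxx)  # estimated var(beta0_hat)
var_b1 = sigma2_hat / Sxx                                 # estimated var(beta1_hat)

print(sigma2_hat, var_b0 ** 0.5, var_b1 ** 0.5)  # sigma^2 hat and the two standard errors
```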

  • Covariance

    Earlier in the lecture we defined

    $\mathrm{var}(Y) = E\big[(Y - E(Y))(Y - E(Y))\big] = E(Y^2) - [E(Y)]^2$

    By extension we may define the covariance between two random variables X and Y as:

    $\mathrm{cov}(X, Y) = \sigma_{XY} = E\big[(X - E(X))(Y - E(Y))\big] = E(XY) - E(X)E(Y)$

    Positive covariance: When X is above (below) its mean, Y is likely to be above (below) its mean, and vice versa.

    Negative covariance: When X is above (below) its mean, Y is likely to be below (above) its mean, and vice versa

  • Coefficient of correlation

    However, interpreting $\sigma_{XY}$ is difficult because its magnitude may arbitrarily increase or decrease depending on the units of measurement. We may therefore scale the covariance by the standard deviations of the variables and define the coefficient of correlation as:

    $\rho_{XY} = \dfrac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)}\sqrt{\mathrm{var}(Y)}} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$

    Its limits are $-1 \le \rho_{XY} \le 1$, where $|\rho_{XY}| = 1$ indicates a perfect linear relationship between X and Y.
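
    As an illustration, the sample analogues of these population quantities can be computed for the same hypothetical rainfall/sales data used above:

```python
# Sketch: sample covariance and sample correlation between X and Y
# (sample analogues of the population definitions above; hypothetical data).
X = [5, 10, 15, 20, 25]
Y = [45, 70, 90, 120, 140]
N = len(X)
x_bar, y_bar = sum(X) / N, sum(Y) / N

cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / N
sd_x = (sum((x - x_bar) ** 2 for x in X) / N) ** 0.5
sd_y = (sum((y - y_bar) ** 2 for y in Y) / N) ** 0.5

r_xy = cov_xy / (sd_x * sd_y)   # unit-free, between -1 and 1
print(cov_xy, r_xy)
```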

  • Covariance

    Covariance between $\hat\beta_0$ and $\hat\beta_1$ is also a measure of the association between the two estimators:

    $\mathrm{cov}(\hat\beta_0, \hat\beta_1) = E\big[(\hat\beta_0 - E(\hat\beta_0))(\hat\beta_1 - E(\hat\beta_1))\big]$

    It can then be shown that:

    $\mathrm{cov}(\hat\beta_0, \hat\beta_1) = -\bar X \, \dfrac{\sigma^2}{\sum_i (X_i - \bar X)^2}$

    Now that we have explored the theoretical background required to appreciate OLS, let's start working with an actual data set.