Basiskurs Finance 1. Regression Analysis


Financial Econometrics


  • Basiskurs Finance

    1. Regression Analysis

  • Please bring your laptop to the tutorials to follow along during the Excel exercises.

    People:

    Lecture: Dr. Nikolas Breitkopf ([email protected])
    Tutorial: Janis Bauer ([email protected])

    Grading:

    60-minute exam. Exam date: 02.06.2014, 18:30-19:30 (please register for the exam via the LSF).


    Course Overview

    Date                 Topic
    Tue. 08.04., 12-16   Regression Analysis Lecture
    Tue. 15.04., 12-16   Regression Analysis Tutorial
    Tue. 22.04.          No class (Easter holiday)
    Tue. 29.04., 12-16   Event Studies Lecture
    Tue. 06.05., 12-16   Event Studies Tutorial
    Fri. 09.05., 14-18   Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models) Lecture
    Tue. 13.05., 12-16   Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models) Tutorial

  • Core questions

    - What is an estimator?
    - What are the properties of estimators?
    - Which problems result from violations of the OLS assumptions?

    Agenda

    - Motivation
    - Ordinary Least Squares (OLS)
    - Effects of violations of the assumptions:
      - Heteroscedasticity
      - Correlation of the regressors with the error term (endogeneity)
    - Fixed Effects Panel Estimation


    Content

  • Basic literature

    - Barreto, H. and Howland, F. M.; Introductory Econometrics; Cambridge; latest edition

    Additional literature

    - Johnston, J. and DiNardo, J.; Econometric Methods; McGraw-Hill; latest edition
    - Greene, W.; Econometric Analysis; Prentice Hall; latest edition


    Literature

  • What are regressions used for in the field of finance?

    - Estimation of the beta of a stock
    - Asset-pricing tests
    - Determinants of capital structure
    - Event studies
    - Determination of trends
    - Forecasting

    Definition: A regression estimates the linear relationship between independent variables (x) and the dependent variable (y). The true relationship in the population is inferred from a sample.


    Motivation
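To make the first use case concrete, here is a minimal sketch (not from the slides) that estimates a stock's beta by regressing stock returns on market returns. It assumes numpy is available; all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 250 daily market returns and stock returns with true beta = 1.2
market = rng.normal(0.0005, 0.01, 250)
stock = 0.0002 + 1.2 * market + rng.normal(0, 0.008, 250)

# OLS via the normal equations: b = (X'X)^{-1} X'y
X = np.column_stack([np.ones_like(market), market])
b = np.linalg.solve(X.T @ X, X.T @ stock)
print(f"alpha = {b[0]:.5f}, beta = {b[1]:.3f}")  # beta should be close to 1.2
```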

  • Dependent variable: Bid/Ask-Spread (t-statistics in parentheses)

                                       Stocks with         Stocks without
                                       Listed Options      Listed Options
    Constant                           0.60*** (196.48)    4.23*** (1,015.57)
    Naked Short Sale Ban (Dummy)       0.33*** (5.94)      1.40*** (12.24)
    Covered Short Sale Ban (Dummy)     0.67*** (9.66)      2.14*** (25.95)
    Disclosure Requirement (Dummy)    -0.20*** (-3.42)    -0.72*** (-6.54)
    Stock-level Fixed Effects          Yes                 Yes
    #Obs                               427,164             4,716,000
    #Stocks                            1,306               15,185


    Example: Prohibition of naked short sales of stocks

    Source: Beber/Pagano 2010, WP

  • The population is the true data-generating process that determines the relationship between the variables of interest. Usually, one cannot observe the full population.

    Statistical inference is the process of learning about the population from a random sample.

    Population variables are assumed to be random variables, i.e. there is no deterministic relationship between variables.

    Consequently, any statistic calculated from the sample is a random variable as well.


    Inference from a Random Sample

  • Assume the population consists of a normally distributed random variable X ~ N(μ = 200, σ = 20).

    Experiment:

    - Draw 8 random samples from the population, each with 10 observations.
    - Calculate the mean of each sample.
    - Calculate the mean of the experiments' means and its standard deviation.
    - What can you learn about the true random variable?

    The sample mean is the best estimator of the population mean. Since the population variable X is a random variable, the sample mean is a random variable as well.

    To learn about the population, you have to know the distribution of the estimator (here: the mean) in repeated samples.

    The standard error describes the uncertainty (standard deviation) of the estimator.


    Illustrative Example: Inference on the Average of Random Variable

  • Observation       E1      E2      E3      E4      E5      E6      E7      E8
    1               176.27  185.11  223.63  207.93  181.65  254.48  197.73  233.67
    2               185.75  182.41  222.41  214.81  202.57  189.30  195.00  164.47
    3               200.09  215.10  212.60  187.94  236.40  217.17  234.46  212.61
    4               201.17  203.69  193.54  182.40  188.33  202.42  199.96  218.67
    5               232.02  197.46  214.99  208.36  196.82  216.74  199.91  166.11
    6               202.67  225.78  219.02  165.95  189.38  168.10  210.96  221.77
    7               210.55  236.46  246.32  191.26  219.76  202.98  198.10  227.80
    8               203.00  202.90  218.72  218.34  228.23  203.15  210.51  194.88
    9               166.32  179.36  219.29  197.47  204.82  194.76  184.78  173.09
    10              221.19  162.50  209.35  194.19  201.30  199.27  151.77  199.35
    Average         199.90  199.08  217.99  196.86  204.93  204.84  198.32  201.24
    Std. Deviation   19.73   22.62   13.20   16.06   17.94   22.34   21.03   25.89

    Mean (Averages): 202.89
    Standard Error (Experiments): 6.75
    Standard Error (theoretical): 6.32


    Illustrative Example (continued)

    Standard Error of the Mean: $\sigma/\sqrt{n}$
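The experiment can be replicated with a short script (a sketch, assuming numpy; the random draws will not match the table exactly):

```python
import numpy as np

rng = np.random.default_rng(42)

# 8 experiments, each a sample of n = 10 draws from N(mu=200, sigma=20)
samples = rng.normal(200, 20, size=(8, 10))
means = samples.mean(axis=1)                 # one sample mean per experiment

print("sample means:          ", np.round(means, 2))
print("mean of means:         ", round(means.mean(), 2))
print("std. error (experiments):", round(means.std(ddof=1), 2))
print("std. error (theoretical):", round(20 / np.sqrt(10), 2))  # sigma/sqrt(n)
```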

  • Estimation: An estimator is a method to determine unknown parameters of a population with the help of a random sample from this population.


    Estimator Properties (I)

    [Figure: population with its real mean, a sample of n = 10 elements, and the estimated population mean]

  • Desirable properties

    Unbiasedness
    - The expected value of the estimator is the true parameter.

    Efficiency
    - The sampling variance of an unbiased estimator is the smallest among all unbiased estimators.
    - Example: OLS is the best linear unbiased estimator (BLUE).

    Consistency
    - An estimator (even a biased one) is consistent if it converges asymptotically to the true parameter.


    Estimator Properties (II)

    Unbiasedness: $E(b) = \beta$.

    Example: the sample variance that divides by $n-1$ is an unbiased estimator of $\sigma^2$:

    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad E(s^2) = \sigma^2$$

    Dividing by $n$ instead gives a biased but consistent estimator:

    $$\tilde{s}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad E(\tilde{s}^2) = \frac{n-1}{n}\sigma^2 \neq \sigma^2, \qquad \lim_{n \to \infty} E(\tilde{s}^2) = \sigma^2$$
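A short simulation (an illustrative sketch, assuming numpy) contrasting the unbiased estimator $s^2$ with the biased-but-consistent $\tilde{s}^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 20.0 ** 2  # true variance is 400

for n in (10, 1000):
    # 5,000 replications of samples of size n from N(200, 20)
    x = rng.normal(200, 20, size=(5_000, n))
    ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    print(f"n={n}: mean of s^2 = {(ss / (n - 1)).mean():.1f} (unbiased), "
          f"mean of s~^2 = {(ss / n).mean():.1f} (biased, -> {sigma2:.0f} as n grows)")
```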

  • OLS: estimation of the best straight line describing the relationship between x and y.

    Approach: minimization of the squared errors (see the equations below).

    Properties:
    - Minimization of the residual sum of squares (RSS).
    - The straight line crosses the point $(\bar{x}, \bar{y})$.


    OLS in the case of two-dimensionality

    $$y_i = a + b x_i + e_i$$

    $$RSS = \sum_i e_i^2 = \sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - a - b x_i)^2 \;\to\; \min_{a,b}$$

    [Figure: scatter plot of y on x with the fitted line, showing intercept a, slope b, and residuals e1, e2]
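A minimal sketch of the two-variable case (assuming numpy; data simulated for illustration), verifying that the fitted line passes through $(\bar{x}, \bar{y})$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 50)

# Closed-form two-variable OLS: b = Cov(x, y) / Var(x), a = ybar - b * xbar
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()

print(f"a = {a:.3f}, b = {b:.3f}")
# The fitted line passes through the point of means:
print(np.isclose(a + b * x.mean(), y.mean()))   # True
```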

  • OLS in matrix notation. The population model, the sample residuals, the optimization problem, and its first-order conditions (FOC) are given below.

    u describes the error term in the (unobserved) population, e the error term of the sample.


    OLS in the case of multidimensionality

    Population model:

    $$y = X\beta + u, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{1,2} & \cdots & x_{1,k} \\ 1 & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n,2} & \cdots & x_{n,k} \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$$

    Vector of the sample residuals:

    $$e = y - Xb$$

    Optimization problem:

    $$\min_b \; e'e = \min_b \;\left(y'y - 2b'X'y + b'X'Xb\right)$$

    First-order conditions:

    $$\frac{\partial RSS}{\partial b} = -2X'y + 2X'Xb = 0 \;\Rightarrow\; (X'X)\,b = X'y \;\Rightarrow\; b = (X'X)^{-1}X'y, \quad b \text{ a } (k \times 1) \text{ vector}$$
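The FOC solution translates directly into code. A sketch assuming numpy; `np.linalg.solve` replaces the explicit inverse for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -0.25])           # illustrative true parameters
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)        # solves (X'X) b = X'y
e = y - X @ b                                # sample residuals
print(np.round(b, 3))                        # close to beta
print(np.round(X.T @ e, 10))                 # X'e = 0 by construction (FOC)
```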

  • Transpose of a matrix, product of two matrices, identity matrix I, and properties of the inverse of a (square) matrix:


    Linear Algebra Basics

    Transpose:

    $$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\Rightarrow\; A' = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

    Product:

    $$AB = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \end{pmatrix}$$

    Identity matrix:

    $$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad AI = IA = A$$

    Inverse:

    $$A^{-1}A = AA^{-1} = I$$


    Matrix Multiplication: Typical Cases

    Shapes: $(n \times k)(k \times m) = (n \times m)$

    Example   Shape      Interpretation
    e'e       (1 x 1)    Sum of squared elements of e (inner product)
    Xβ        (n x 1)    Linear combination of the columns of X
    X'X       (k x k)    Co-variation of the columns of X (second-moment matrix)
    uu'       (n x n)    Product of all combinations of the elements of u (outer product)
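These shapes can be checked directly (a small sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5, 3
X = rng.normal(size=(n, k))
e = rng.normal(size=(n, 1))
u = rng.normal(size=(n, 1))

print((e.T @ e).shape)   # (1, 1): inner product, sum of squared elements
print((X.T @ X).shape)   # (k, k): second-moment matrix
print((u @ u.T).shape)   # (n, n): outer product
print(np.allclose(e.T @ e, (e ** 2).sum()))  # True
```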

  • OLS estimates a linear function that passes through the means of X and y. To integrate the intercept into the estimation equation, a vector consisting of ones has to be added to the matrix X.


    What is the intercept?

    Worked example with only a constant (a vector of ones):

    $$y = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad X = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

    $$X'X = 3, \qquad (X'X)^{-1} = \tfrac{1}{3}, \qquad X'y = 1 + 2 + 3 = 6$$

    $$b_{OLS} = (X'X)^{-1}X'y = \tfrac{6}{3} = 2 = \bar{y}$$

    Regressing y on a constant alone therefore yields the sample mean of y.

  • X has full rank k. Solving the OLS problem is only possible if the matrix (X'X) is invertible, i.e. it has to be positive definite (all eigenvalues > 0).

    X has to consist of linearly independent columns (full rank).

    A violation of this assumption is also called perfect collinearity and usually results from a wrong specification of the problem.

    - Wrong specification of dummy variables: if a variable consisting of c attributes is separated into c dummy variables, then X no longer possesses full column rank.

    - Example: gender is separated into female (1 if female, otherwise 0) and male (1 if male, otherwise 0):


    Central OLS assumptions (I)

    $$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

    The second and third columns (female, male) sum to the first (the constant), so the columns are linearly dependent.
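The dummy-variable trap can be verified numerically (a sketch, assuming numpy): with a constant plus both gender dummies, X does not have full column rank and X'X is singular.

```python
import numpy as np

# Constant, female dummy, male dummy: female + male = constant
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

print(np.linalg.matrix_rank(X))     # 2 < 3: not full column rank
print(np.linalg.det(X.T @ X))       # ~ 0: X'X is not invertible
# Fix: drop one dummy (the omitted category becomes the baseline)
X_ok = X[:, :2]
print(np.linalg.matrix_rank(X_ok))  # 2: full column rank
```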

  • E(X'u) = 0: Regressors are not correlated with the error terms.

    Homoscedasticity: error terms are iid $(0, \sigma^2)$.

    - E(u_i) = 0: The expected value of the error term is zero.
    - Var(u_i) = σ²: The variance of the error terms is constant.
    - E(u_i u_j) = 0 for i ≠ j: Individual errors are independent; no autocorrelation.


    Central OLS assumptions (II)

    Note:

    $$Cov(a, b) = E[(a - E[a])(b - E[b])] = E[ab] - E[a]E[b]$$

  • Unbiasedness: E(b) = β.

    The regressors are uncorrelated with the residuals: Cov(e, X) = 0.

    Note: E(X'e) = 0 is not to be confused with the assumption E(X'u) = 0. OLS is calculated such that E(X'e) = 0; nevertheless, E(e'e) can be a biased and inconsistent estimate of E(u'u).


    Properties of OLS

    Unbiasedness:

    $$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$

    $$E(b) = \beta + (X'X)^{-1}X'E(u) = \beta + 0 = \beta$$

    Orthogonality of regressors and residuals:

    $$X'Xb = X'y = X'(Xb + e) = X'Xb + X'e \;\Rightarrow\; X'e = 0 \;\Rightarrow\; Cov(e, X) = 0$$

  • Variance-covariance matrix of the error terms (see below).

    Inference: $\sigma^2$ is estimated from the sample residuals via $e'e/(n-k)$, the (squared) standard error of the regression. The variance of the OLS coefficients is derived below.


    OLS in the multivariate case (II)

    Variance-covariance matrix of the error terms:

    $$E(uu') = E\begin{pmatrix} u_1^2 & u_1u_2 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & \cdots & u_2u_n \\ \vdots & & \ddots & \vdots \\ u_nu_1 & u_nu_2 & \cdots & u_n^2 \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I$$

    Variance of the OLS coefficients:

    $$\operatorname{var}(b) = E[(b - \beta)(b - \beta)'] \qquad \text{(because } E(b) = \beta\text{)}$$

    $$= E[(X'X)^{-1}X'uu'X(X'X)^{-1}] = (X'X)^{-1}X'\,E(uu')\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$

    estimated as $\widehat{\operatorname{var}}(b) = s^2 (X'X)^{-1}$.

  • The standard errors of the coefficients are the square roots of the diagonal of var(b). They can be used to calculate the t-statistic of an estimate: t = b/se(b).

    A joint hypothesis test can be conducted as below, where R describes a (q x k) matrix and r a vector of length q.

    The test statistic is given below.

    Example: H0: β_i = 0 for i = 1..4.


    Hypothesis testing

    $$H_0: R\beta = r, \qquad \text{with } (Rb - r) \sim N\!\left(0,\; \sigma^2 R (X'X)^{-1} R'\right)$$

    Test statistic:

    $$F = \frac{(Rb - r)'\,[R(X'X)^{-1}R']^{-1}\,(Rb - r)\,/\,q}{e'e\,/\,(n-k)} \;\sim\; F(q,\, n-k)$$

    Example (intercept unrestricted):

    $$R = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad q = 4$$
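A sketch of this F-test (assuming numpy and scipy are available; the data-generating values are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 100, 5                      # intercept + 4 slopes
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([1.0, 0.3, 0.0, 0.0, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - k)

# H0: beta_1 = ... = beta_4 = 0, i.e. R = [0 | I_4], r = 0, q = 4
R = np.column_stack([np.zeros(4), np.eye(4)])
r = np.zeros(4)
q = 4
diff = R @ b - r
F = diff @ np.linalg.solve(R @ np.linalg.inv(X.T @ X) @ R.T, diff) / q / s2
print(f"F = {F:.2f}, p = {stats.f.sf(F, q, n - k):.4f}")
```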

  • True model: y = α + βx + u with α = 1, β = 1 (see below). 100 simulations of random samples of x, y:

    - x ~ N(0,1), sampled once (fixed regressors).
    - Error terms u are sampled randomly from N(0,1).
    - For each sample, conduct an OLS estimation.


    Example: Estimates as random variables

    $$y = \alpha + \beta x + u, \qquad \alpha = 1, \; \beta = 1$$

    Simulation results:

    E(a) = 0.9921    StdDev(a) = 0.1083
    E(b) = 0.9993    StdDev(b) = 0.1015
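The simulation can be reproduced along the following lines (a sketch assuming numpy; the seed is arbitrary, so results will be close to, but not identical to, the numbers above):

```python
import numpy as np

rng = np.random.default_rng(6)
n, S = 100, 100
x = rng.normal(0, 1, n)                 # fixed regressors, drawn once
X = np.column_stack([np.ones(n), x])

coefs = np.empty((S, 2))
for s in range(S):
    u = rng.normal(0, 1, n)             # fresh error draw per sample
    y = 1.0 + 1.0 * x + u               # true model: alpha = beta = 1
    coefs[s] = np.linalg.solve(X.T @ X, X.T @ y)

print("E(a), StdDev(a):", np.round([coefs[:, 0].mean(), coefs[:, 0].std(ddof=1)], 4))
print("E(b), StdDev(b):", np.round([coefs[:, 1].mean(), coefs[:, 1].std(ddof=1)], 4))
```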

  • Properties: the variance-covariance matrix of the coefficients is derived below, where Σ is the covariance matrix of the error terms.

    OLS makes the assumption Σ = σ²I.

    If the OLS assumptions are violated, the standard significance statements are wrong.


    Properties of the Variance-Covariance-Matrix under OLS assumptions

    $$\operatorname{var}(b) = E[(b - \beta)(b - \beta)'] = E[(X'X)^{-1}X'uu'X(X'X)^{-1}] = (X'X)^{-1}X'\,\Sigma\,X(X'X)^{-1}$$

    Under the OLS assumption $\Sigma = \sigma^2 I$:

    $$\operatorname{var}(b) = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1} \qquad (\text{as } A^{-1}A = I)$$

    $$\Sigma = \sigma^2 I = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$

  • OLS assumption: the error term u_i possesses a constant variance for all observations i (homoscedasticity).

    Heteroscedasticity: the error variance differs across observations.


    Heteroscedasticity

    Example: $\sigma_i^2 = f(x_i) = x_i^2$

    $$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

  • Monte Carlo simulation. Assume the true model is the one given below.

    Create 1,000 samples, each with a sample size of N = 100 observations.

    A regression of y on x is run for each of the 1,000 data sets, and the resulting intercept and slope are recorded.

    The standard errors of OLS are biased here: the estimated standard error of the intercept is too large; the estimated standard error of the slope is too small.

    Note: the coefficient estimates of OLS are still unbiased even in the presence of heteroscedasticity. However, unbiased standard errors are essential for inference.


    Results of a simulation study with heteroscedastic error terms

    N = 100                                   α = 1       β = 1
    Coefficients (OLS)                        1.0091      0.99751
    OLS s.e. (avg. / incorrect)               0.60031     0.19875
    OLS s.e. (sim. distribution / correct)    0.38269     0.21949
    White s.e.                                0.37341     0.21262

    $$y_i = \alpha + \beta x_i + u_i \quad \text{with } \alpha = \beta = 1 \text{ and } u_i \sim N(0,\, x_i^2)$$
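A sketch of such a simulation study (assuming numpy, with x ~ N(0,1) as an illustrative choice, so the exact numbers in the table above will not be reproduced). It compares the average conventional OLS standard errors with the empirical standard deviation of the estimates across samples:

```python
import numpy as np

rng = np.random.default_rng(7)
n, S = 100, 1000
x = rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

coefs, ols_se = np.empty((S, 2)), np.empty((S, 2))
for s in range(S):
    u = rng.normal(0, np.abs(x))        # heteroscedastic: sd_i = |x_i|
    y = 1.0 + 1.0 * x + u
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - 2)
    coefs[s] = b
    ols_se[s] = np.sqrt(s2 * np.diag(XtX_inv))

# Direction of the distortion depends on the design; here the slope s.e. is understated.
print("avg. OLS s.e. (naive):   ", np.round(ols_se.mean(axis=0), 3))
print("sim. std. dev. (correct):", np.round(coefs.std(axis=0, ddof=1), 3))
```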

  • The White correction determines an adapted covariance matrix from the sample residuals to correct for heteroscedasticity.

    The covariance matrix under heteroscedasticity is given below.

    Σ has n parameters, so it cannot be estimated from n observations.

    The coefficient estimators of OLS are unbiased, so the residuals e are unbiased estimates of u.

    The White matrix is asymptotically unbiased under any type of heteroscedasticity, and only k parameters have to be estimated.


    White Correction of the Variance-Covariance Matrix

    $$\operatorname{var}(b) = \underbrace{(X'X)^{-1}X'}_{k \times n}\;\underbrace{\Sigma}_{n \times n}\;\underbrace{X(X'X)^{-1}}_{n \times k}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

    White estimator:

    $$S_0 = \sum_{i=1}^{n} e_i^2\, x_i x_i' \quad (k \times k), \qquad \widehat{\operatorname{var}}(b) = (X'X)^{-1}\, S_0\, (X'X)^{-1}$$
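A sketch of the White ("sandwich") estimator in code (assuming numpy; `X` and the residual vector `e` follow the notation above):

```python
import numpy as np

def white_se(X, e):
    """Heteroscedasticity-robust (White) standard errors.

    var(b) = (X'X)^{-1} S0 (X'X)^{-1},  with  S0 = sum_i e_i^2 * x_i x_i'
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    S0 = (X * (e ** 2)[:, None]).T @ X      # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ S0 @ XtX_inv
    return np.sqrt(np.diag(cov))
```

With the heteroscedastic simulation above, `white_se(X, y - X @ b)` yields standard errors close to the simulated (correct) standard deviation of the estimates.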

  • OLS assumes that the regressors X and the error term u are uncorrelated: E(X'u) = 0.

    A violation of this assumption results particularly if
    - an X-variable is measured with error,
    - an X-variable is endogenous with y,
    - an X-variable that is relevant in the population is omitted from the regression.

    Simulation experiment: see below.

    OLS is extremely biased. Solution: instrumental variables regression.


    Correlation of the regressors with the error term (Endogeneity)

    True model (the slope of x is zero, but x is correlated with the error):

    $$y = \beta_0 + 0 \cdot x + u, \qquad x = \gamma u + \nu, \qquad u, \nu \sim N(0, 1)$$

    S = 1000                     γ = 1       γ = 0.5     γ = 0.1
    Parameter (OLS, true β = 0)  0.5075      0.40766     0.10106
    Avg. s.e.                    0.050628    0.081588    0.10124
    Std. Dev.                    0.035988    0.067452    0.10102
    1%tile                       0.42086     0.24095    -0.14092
    99%tile                      0.59003     0.55394     0.32374


    Simulation results with E(X'u) ≠ 0

    Case: γ = 0.5
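A sketch of the endogeneity experiment (assuming numpy; γ labels the strength of the x-u dependence, as in the reconstruction above). With true β = 0, the OLS slope converges to γ/(1+γ²) instead of 0:

```python
import numpy as np

rng = np.random.default_rng(8)
n, S = 100, 1000

for gamma in (1.0, 0.5, 0.1):
    est = np.empty(S)
    for s in range(S):
        u = rng.normal(0, 1, n)
        x = gamma * u + rng.normal(0, 1, n)   # x correlated with the error
        y = 0.0 * x + u                       # true slope beta = 0
        X = np.column_stack([np.ones(n), x])
        est[s] = np.linalg.solve(X.T @ X, X.T @ y)[1]
    print(f"gamma={gamma}: mean OLS slope = {est.mean():.3f} "
          f"(plim = {gamma / (1 + gamma**2):.3f}, true beta = 0)")
```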

  • Panel data consists of cross-section and time-series data: N individuals, repeatedly observed at T points in time.

    Simple OLS would pool all N*T observations, assuming independence. Obviously, with economic individuals (like firms, stocks, countries, etc.) in the cross-section,

    - repeated observations of the same individual will be more similar than observations between individuals,
    - OLS will then be inconsistent and biased.

    Solutions:
    - Estimate a system of equations, one for each individual.
    - Estimate a system of equations with restrictions requiring some homogeneity (e.g. same slope, different intercepts).


    Panel Estimation

    [Figure: y plotted against x for three individuals I1, I2, I3, each with its own level]

    $$\text{Pooled OLS:} \quad y = Xb + e$$

  • Assumption: individual differences are captured by differences in the constant term (intercept). This amounts to including one dummy variable per individual (see below).

    - y1: vector of observations of the dependent variable of individual 1.
    - X1: explanatory variables of individual 1.
    - i is a vector of ones with length corresponding to y1.

    This is just a classical regression! Basically, the individual time series are demeaned and then estimated by OLS.

    Properties:
    - Computationally intensive for large N.
    - Significance of the fixed effects: F-test for the joint significance of the dummies.
    - The model is robust against misspecification: every time-invariant explanatory variable is captured by the dummies (e.g. legal form of firms, industry affiliation, etc.).
    - The individual effects can be correlated with the disturbances.
    - Specific time-invariant variables can only be estimated / included as interaction terms with other regressors.
    - The fixed effects themselves are biased estimates.


    Fixed Effects Model (FE)

    $$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\beta + \begin{pmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} + \varepsilon$$

    i.e. $y = X\beta + [d_1\; d_2\; \cdots\; d_n]\,\alpha + \varepsilon$, where each dummy vector $d_j$ equals the vector of ones $i$ for the observations of individual $j$ and zero otherwise.
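A sketch of the demeaning (within) transformation that fixed effects estimation amounts to (assuming numpy; the data-generating values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 50, 10                              # individuals, time periods
ids = np.repeat(np.arange(N), T)

alpha = rng.normal(0, 2, N)                # individual fixed effects
x = rng.normal(0, 1, N * T) + alpha[ids]   # regressor correlated with the effects
y = alpha[ids] + 1.5 * x + rng.normal(0, 1, N * T)

def demean(v):
    """Subtract each individual's time-series mean from its observations."""
    means = np.bincount(ids, v) / T
    return v - means[ids]

# Demean x and y within each individual, then run OLS without an intercept
xd, yd = demean(x), demean(y)
b_fe = (xd @ yd) / (xd @ xd)
b_pooled = np.polyfit(x, y, 1)[0]          # pooled OLS for comparison
print(f"FE (within): {b_fe:.3f}, pooled OLS: {b_pooled:.3f}  (true beta = 1.5)")
```

Because x is correlated with the individual effects here, pooled OLS is visibly biased while the within estimator recovers the true slope.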
