

INTRODUCTION TO VECTOR AND MATRIX DIFFERENTIATION

Econometrics 2

Heino Bohn Nielsen
September 21, 2005

This note expands on appendix A.7 in Verbeek (2004) on matrix differentiation. We first present the conventions for derivatives of scalar and vector functions; then we present the derivatives of a number of special functions particularly useful in econometrics; and, finally, we apply the ideas to derive the ordinary least squares (OLS) estimator in the linear regression model. We should emphasize that this note is cursory reading; the rules for specific functions needed in this course are indicated with a $(*)$.

1 Conventions for Scalar Functions

Let $\theta = (\theta_1, \ldots, \theta_k)'$ be a $k \times 1$ vector and let $f(\theta) = f(\theta_1, \ldots, \theta_k)$ be a real-valued function that depends on $\theta$, i.e. $f(\theta): \mathbb{R}^k \mapsto \mathbb{R}$ maps the vector $\theta$ into a single number, $f(\theta)$. Then the derivative of $f(\theta)$ with respect to $\theta$ is defined as

$$
\frac{\partial f(\theta)}{\partial \theta} =
\begin{pmatrix}
\frac{\partial f(\theta)}{\partial \theta_1} \\
\vdots \\
\frac{\partial f(\theta)}{\partial \theta_k}
\end{pmatrix}. \tag{1}
$$

This is a $k \times 1$ column vector with typical element $i$ given by the partial derivative $\frac{\partial f(\theta)}{\partial \theta_i}$.

Sometimes this vector is referred to as the gradient. It is useful to remember that the derivative of a scalar function with respect to a column vector gives a column vector as the result.¹

¹ We can note that Wooldridge (2003, p. 783) does not follow this convention, and lets $\partial f(\theta)/\partial \theta$ be a $1 \times k$ row vector.



Similarly, the derivative of a scalar function with respect to a row vector, $\theta'$, yields the $1 \times k$ row vector

$$
\frac{\partial f(\theta)}{\partial \theta'} =
\begin{pmatrix}
\frac{\partial f(\theta)}{\partial \theta_1} & \cdots & \frac{\partial f(\theta)}{\partial \theta_k}
\end{pmatrix}.
$$
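As a quick numerical illustration of these conventions (my addition, not part of the original note), the gradient in (1) can be approximated by central differences; the function $f(\theta) = \theta_1^2 + 3\theta_2$ and the evaluation point below are arbitrary choices.

    import numpy as np

    # Arbitrary example of a scalar function f(theta) = theta_1^2 + 3*theta_2.
    def f(theta):
        return theta[0] ** 2 + 3.0 * theta[1]

    def gradient(f, theta, h=1e-6):
        """Central-difference approximation of df/dtheta, stacked as a k x 1 column vector."""
        k = theta.size
        grad = np.zeros((k, 1))
        for i in range(k):
            e = np.zeros(k)
            e[i] = h
            grad[i, 0] = (f(theta + e) - f(theta - e)) / (2.0 * h)
        return grad

    theta = np.array([1.0, 2.0])
    print(gradient(f, theta))        # approximately [[2.], [3.]], i.e. (2*theta_1, 3)'
    print(gradient(f, theta).T)      # the 1 x k row-vector variant, df/dtheta'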

2 Conventions for Vector Functions

Now let

$$
g(\theta) =
\begin{pmatrix}
g_1(\theta) \\
\vdots \\
g_n(\theta)
\end{pmatrix}
$$

be a vector function depending on $\theta = (\theta_1, \ldots, \theta_k)'$, i.e. $g(\theta): \mathbb{R}^k \mapsto \mathbb{R}^n$ maps the $k \times 1$ vector $\theta$ into an $n \times 1$ vector, where $g_i(\theta) = g_i(\theta_1, \ldots, \theta_k)$, $i = 1, 2, \ldots, n$, is a real-valued function.

Since $g(\theta)$ is a column vector it is natural to consider the derivatives with respect to a row vector, $\theta'$, i.e.

$$
\frac{\partial g(\theta)}{\partial \theta'} =
\begin{pmatrix}
\frac{\partial g_1(\theta)}{\partial \theta_1} & \cdots & \frac{\partial g_1(\theta)}{\partial \theta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_n(\theta)}{\partial \theta_1} & \cdots & \frac{\partial g_n(\theta)}{\partial \theta_k}
\end{pmatrix}, \tag{2}
$$

where each row, $i = 1, 2, \ldots, n$, contains the derivative of the scalar function $g_i(\theta)$ with respect to the elements in $\theta$. The result is therefore an $n \times k$ matrix of derivatives with typical element $(i, j)$ given by $\frac{\partial g_i(\theta)}{\partial \theta_j}$. If the vector function is defined as a row vector, it is natural to take the derivative with respect to the column vector, $\theta$. We can note that it holds in general that

$$
\frac{\partial \left( g(\theta)' \right)}{\partial \theta} =
\left( \frac{\partial g(\theta)}{\partial \theta'} \right)', \tag{3}
$$

which in the case above is a $k \times n$ matrix.
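Conventions (2) and (3) can also be checked numerically; the sketch below is not in the source and uses an arbitrary function $g: \mathbb{R}^2 \mapsto \mathbb{R}^3$ with central differences.

    import numpy as np

    # Arbitrary example of a vector function g : R^2 -> R^3.
    def g(theta):
        t1, t2 = theta
        return np.array([t1 * t2, t1 ** 2, np.sin(t2)])

    def jacobian(g, theta, h=1e-6):
        """Central-difference dg/dtheta': an n x k matrix, one column per element of theta."""
        k = theta.size
        n = g(theta).size
        J = np.zeros((n, k))
        for j in range(k):
            e = np.zeros(k)
            e[j] = h
            J[:, j] = (g(theta + e) - g(theta - e)) / (2.0 * h)
        return J

    theta = np.array([1.0, 0.5])
    J = jacobian(g, theta)
    print(J)        # the n x k matrix in (2)
    print(J.T)      # its transpose, the k x n derivative of g(theta)' as in (3)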

Applying the conventions in (1) and (2) we can define the Hessian matrix of second derivatives of a scalar function $f(\theta)$ as

$$
\frac{\partial^2 f(\theta)}{\partial \theta \, \partial \theta'} =
\frac{\partial}{\partial \theta'} \left( \frac{\partial f(\theta)}{\partial \theta} \right) =
\begin{pmatrix}
\frac{\partial^2 f(\theta)}{\partial \theta_1 \partial \theta_1} & \cdots & \frac{\partial^2 f(\theta)}{\partial \theta_1 \partial \theta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f(\theta)}{\partial \theta_k \partial \theta_1} & \cdots & \frac{\partial^2 f(\theta)}{\partial \theta_k \partial \theta_k}
\end{pmatrix},
$$

which is a $k \times k$ matrix with typical element $(i, j)$ given by the second derivative $\frac{\partial^2 f(\theta)}{\partial \theta_i \partial \theta_j}$. Note that it does not matter if we first take the derivative with respect to the column or the row.
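As an added sketch (not from the note), the Hessian of an arbitrary smooth function can be approximated by central differences; its symmetry illustrates that the order of differentiation does not matter.

    import numpy as np

    def f(theta):
        # Arbitrary smooth scalar function of theta = (theta_1, theta_2)'.
        return theta[0] ** 2 * theta[1] + np.exp(theta[1])

    def hessian(f, theta, h=1e-5):
        """Central-difference k x k matrix of second derivatives d^2 f / dtheta_i dtheta_j."""
        k = theta.size
        H = np.zeros((k, k))
        for i in range(k):
            for j in range(k):
                ei, ej = np.zeros(k), np.zeros(k)
                ei[i], ej[j] = h, h
                H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                           - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h ** 2)
        return H

    H = hessian(f, np.array([1.0, 2.0]))
    print(H)                                # approx. [[2*theta_2, 2*theta_1], [2*theta_1, exp(theta_2)]]
    print(np.allclose(H, H.T, atol=1e-4))   # True: the Hessian is symmetric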



3 Some Special Functions

First, let $c$ be a $k \times 1$ vector and let $\theta$ be a $k \times 1$ vector of parameters. Next define the scalar function $f(\theta) = c'\theta$, which maps the $k$ parameters into a single number. It holds that

$$
\frac{\partial (c'\theta)}{\partial \theta} = c. \tag{$*$}
$$

To see this, we can write the function as

$$
f(\theta) = c'\theta = c_1\theta_1 + c_2\theta_2 + \cdots + c_k\theta_k.
$$

Taking the derivative with respect to $\theta$ yields

$$
\frac{\partial f(\theta)}{\partial \theta} =
\begin{pmatrix}
\frac{\partial (c_1\theta_1 + c_2\theta_2 + \cdots + c_k\theta_k)}{\partial \theta_1} \\
\vdots \\
\frac{\partial (c_1\theta_1 + c_2\theta_2 + \cdots + c_k\theta_k)}{\partial \theta_k}
\end{pmatrix} =
\begin{pmatrix}
c_1 \\
\vdots \\
c_k
\end{pmatrix} = c,
$$

which is a $k \times 1$ vector as expected. Also note that since $\theta'c = c'\theta$, it holds that

$$
\frac{\partial (\theta'c)}{\partial \theta} = c. \tag{$*$}
$$
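A brief numerical check of the rule $\partial(c'\theta)/\partial\theta = c$ (my addition; the vectors $c$ and $\theta$ below are arbitrary):

    import numpy as np

    c = np.array([1.0, -2.0, 0.5])       # arbitrary k x 1 vector
    theta = np.array([0.3, 1.2, -0.7])   # arbitrary evaluation point

    def f(t):
        return c @ t                     # f(theta) = c'theta

    h = 1e-6
    grad = np.array([(f(theta + h * np.eye(3)[i]) - f(theta - h * np.eye(3)[i])) / (2 * h)
                     for i in range(3)])
    print(np.allclose(grad, c))          # True: the gradient is c, independent of theta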

Now, let $A$ be an $n \times k$ matrix and let $\theta$ be a $k \times 1$ vector of parameters. Furthermore define the vector function $g(\theta) = A\theta$, which maps the $k$ parameters into $n$ function values. $g(\theta)$ is an $n \times 1$ vector and the derivative with respect to $\theta'$ is an $n \times k$ matrix given by

$$
\frac{\partial (A\theta)}{\partial \theta'} = A. \tag{$*$}
$$

To see this, write the function as

$$
g(\theta) = A\theta =
\begin{pmatrix}
A_{11}\theta_1 + A_{12}\theta_2 + \cdots + A_{1k}\theta_k \\
\vdots \\
A_{n1}\theta_1 + A_{n2}\theta_2 + \cdots + A_{nk}\theta_k
\end{pmatrix},
$$

and find the derivative

$$
\frac{\partial g(\theta)}{\partial \theta'} =
\begin{pmatrix}
\frac{\partial (A_{11}\theta_1 + \cdots + A_{1k}\theta_k)}{\partial \theta_1} & \cdots & \frac{\partial (A_{11}\theta_1 + \cdots + A_{1k}\theta_k)}{\partial \theta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial (A_{n1}\theta_1 + \cdots + A_{nk}\theta_k)}{\partial \theta_1} & \cdots & \frac{\partial (A_{n1}\theta_1 + \cdots + A_{nk}\theta_k)}{\partial \theta_k}
\end{pmatrix} =
\begin{pmatrix}
A_{11} & \cdots & A_{1k} \\
\vdots & \ddots & \vdots \\
A_{n1} & \cdots & A_{nk}
\end{pmatrix} = A.
$$

Similarly, if we consider the transposed function, $g(\theta)' = \theta'A'$, which is a $1 \times n$ row vector, we can find the $k \times n$ matrix of derivatives as

$$
\frac{\partial (\theta'A')}{\partial \theta} = A'. \tag{$*$}
$$

This is just an application of the result in (3).
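A similar sketch (not in the original) for the rule $\partial(A\theta)/\partial\theta' = A$, with $A$ and $\theta$ drawn arbitrarily:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 3))          # arbitrary n x k matrix
    theta = rng.normal(size=3)           # arbitrary k x 1 evaluation point

    def g(t):
        return A @ t                     # g(theta) = A theta

    h = 1e-6
    J = np.column_stack([(g(theta + h * np.eye(3)[j]) - g(theta - h * np.eye(3)[j])) / (2 * h)
                         for j in range(3)])
    print(np.allclose(J, A))             # True: the n x k Jacobian equals A
    print(np.allclose(J.T, A.T))         # and the derivative of theta'A' is A', as in (3)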



Now consider a quadratic function $f(\theta) = \theta'V\theta$ for some $k \times k$ matrix $V$. This function maps the $k$ parameters into a single number. Here we find the derivatives as the $k \times 1$ column vector

$$
\frac{\partial (\theta'V\theta)}{\partial \theta} = (V + V')\theta, \tag{$*$}
$$

or the row variant

$$
\frac{\partial (\theta'V\theta)}{\partial \theta'} = \theta'(V + V'). \tag{$*$}
$$

If $V$ is symmetric this reduces to $2V\theta$ and $2\theta'V$, respectively. To see how this works, consider the simple case $k = 3$ and write the function as

$$
\theta'V\theta =
\begin{pmatrix} \theta_1 & \theta_2 & \theta_3 \end{pmatrix}
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix}
\begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix}
$$

$$
= V_{11}\theta_1^2 + V_{22}\theta_2^2 + V_{33}\theta_3^2 + (V_{12} + V_{21})\theta_1\theta_2 + (V_{13} + V_{31})\theta_1\theta_3 + (V_{23} + V_{32})\theta_2\theta_3.
$$

Taking the derivative with respect to $\theta$, we get

$$
\frac{\partial (\theta'V\theta)}{\partial \theta} =
\begin{pmatrix}
\frac{\partial (\theta'V\theta)}{\partial \theta_1} \\
\frac{\partial (\theta'V\theta)}{\partial \theta_2} \\
\frac{\partial (\theta'V\theta)}{\partial \theta_3}
\end{pmatrix} =
\begin{pmatrix}
2V_{11}\theta_1 + (V_{12} + V_{21})\theta_2 + (V_{13} + V_{31})\theta_3 \\
2V_{22}\theta_2 + (V_{12} + V_{21})\theta_1 + (V_{23} + V_{32})\theta_3 \\
2V_{33}\theta_3 + (V_{13} + V_{31})\theta_1 + (V_{23} + V_{32})\theta_2
\end{pmatrix}
$$

$$
=
\begin{pmatrix}
2V_{11} & V_{12} + V_{21} & V_{13} + V_{31} \\
V_{12} + V_{21} & 2V_{22} & V_{23} + V_{32} \\
V_{13} + V_{31} & V_{23} + V_{32} & 2V_{33}
\end{pmatrix}
\begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix}
$$

$$
=
\left[
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix} +
\begin{pmatrix}
V_{11} & V_{21} & V_{31} \\
V_{12} & V_{22} & V_{32} \\
V_{13} & V_{23} & V_{33}
\end{pmatrix}
\right]
\begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix}
= (V + V')\theta.
$$
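The same kind of numerical check (again my addition, with a deliberately non-symmetric $V$) confirms the quadratic-form rule:

    import numpy as np

    V = np.array([[1.0, 2.0, 0.0],
                  [0.5, 3.0, 1.0],
                  [4.0, -1.0, 2.0]])     # arbitrary non-symmetric 3 x 3 matrix
    theta = np.array([0.2, -1.0, 0.7])

    def f(t):
        return t @ V @ t                 # f(theta) = theta' V theta

    h = 1e-6
    grad = np.array([(f(theta + h * np.eye(3)[i]) - f(theta - h * np.eye(3)[i])) / (2 * h)
                     for i in range(3)])
    print(np.allclose(grad, (V + V.T) @ theta))   # True: matches (V + V')theta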

4 The Linear Regression Model

To illustrate the use of matrix differentiation consider the linear regression model in matrix notation,

$$
Y = X\beta + \epsilon,
$$

where $Y$ is a $T \times 1$ vector of stacked left-hand-side variables, $X$ is a $T \times k$ matrix of explanatory variables, $\beta$ is a $k \times 1$ vector of parameters to be estimated, and $\epsilon$ is a $T \times 1$ vector of error terms. Here $k$ is the number of explanatory variables and $T$ is the number of observations.



One way to motivate the ordinary least squares (OLS) principle is to choose the estimator, $\widehat{\beta}_{OLS}$, of $\beta$ as the value that minimizes the sum of squared residuals, i.e.

$$
\widehat{\beta}_{OLS} = \arg\min_{b} \sum_{t=1}^{T} \epsilon_t(b)^2 = \arg\min_{b} \, \epsilon(b)'\epsilon(b),
$$

where $\epsilon(b) = Y - Xb$ denotes the vector of residuals for a candidate value $b$. Looking at the function to be minimized, we find that

$$
\epsilon(b)'\epsilon(b) = \left( Y - Xb \right)' \left( Y - Xb \right) = \left( Y' - b'X' \right) \left( Y - Xb \right)
$$

$$
= Y'Y - Y'Xb - b'X'Y + b'X'Xb = Y'Y - 2Y'Xb + b'X'Xb,
$$

where the last line uses the fact that $Y'Xb$ and $b'X'Y$ are identical scalar variables.

Note that $\epsilon(b)'\epsilon(b)$ is a scalar function, and taking the first derivative with respect to $b$ yields the $k \times 1$ vector

$$
\frac{\partial \left( \epsilon(b)'\epsilon(b) \right)}{\partial b} =
\frac{\partial \left( Y'Y - 2Y'Xb + b'X'Xb \right)}{\partial b} = -2X'Y + 2X'Xb.
$$

Solving the $k$ equations, $\partial \left( \epsilon(b)'\epsilon(b) \right) / \partial b = 0$, yields the OLS estimator

$$
\widehat{\beta}_{OLS} = \left( X'X \right)^{-1} X'Y,
$$

provided that $X'X$ is non-singular.

To make sure that $\widehat{\beta}_{OLS}$ is a minimum of $\epsilon(b)'\epsilon(b)$ and not a maximum, we should formally take the second derivative and make sure that it is positive definite. The $k \times k$ Hessian matrix of second derivatives is given by

$$
\frac{\partial^2 \left( \epsilon(b)'\epsilon(b) \right)}{\partial b \, \partial b'} =
\frac{\partial \left( -2X'Y + 2X'Xb \right)}{\partial b'} = 2X'X,
$$

which is a positive definite matrix by construction, given that $X'X$ is non-singular.

References

[1] Verbeek, Marno (2004): A Guide to Modern Econometrics, Second edition, John Wiley and Sons.

[2] Wooldridge, Jeffrey M. (2003): Introductory Econometrics: A Modern Approach, 2nd edition, South-Western College Publishing.
