The Generalised Method of Moments


Post on 25-Feb-2016








  • The Generalised Method of Moments

    Ibrahim Stevens
    Joint HKIMR/CCBS Workshop: Advanced Modelling for Monetary Policy in the Asia-Pacific Region
    May 2004

  • GMM

    Why use GMM? Nonlinear estimation, structural estimation, robust estimation.

    Models estimated using GMM: many, including rational expectations models, Euler equations and non-Gaussian distributed models.

  • The Method of Moments

    Simple moment conditions: population and sample.
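The equations on this slide did not survive extraction. As a hedged illustration only (the notation $x_t$, $\mu$, $T$ is assumed here, not taken from the slides), the simplest population moment condition and its sample counterpart for a mean could be written as:

```latex
\text{Population: } E[x_t - \mu] = 0
\qquad
\text{Sample: } \frac{1}{T}\sum_{t=1}^{T}\bigl(x_t - \hat{\mu}\bigr) = 0
\;\;\Longrightarrow\;\;
\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} x_t
```

The method of moments estimator replaces the expectation with a sample average and solves for the parameter.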

  • The Method of Moments

    OLS as an MM estimator

    Moment conditions:

    MM estimator:
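A minimal sketch of OLS viewed as a method of moments estimator, assuming the textbook orthogonality condition E[x_t(y_t - x_t'β)] = 0 (the slide's own equations are missing, so the notation and the simulated data below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # constant and one regressor
beta_true = np.array([1.0, 2.0])                        # illustrative values only
y = X @ beta_true + rng.normal(size=T)

# Sample moment condition: (1/T) X'(y - X b) = 0  =>  b = (X'X)^{-1} X'y
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)

# At the MM estimate the sample moments are zero up to rounding error
sample_moments = X.T @ (y - X @ beta_mm) / T
```

With as many moment conditions as coefficients, the sample moments can be set exactly to zero, which is why no weighting matrix is needed here.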

  • Slightly more Generalised MM

    IV is an MM estimator

    Moment condition:

    MM estimator:
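A hedged sketch of the just-identified IV estimator as a method of moments estimator, assuming the standard condition E[z_t(y_t - x_t'β)] = 0 with as many instruments as coefficients. The data-generating process and all variable names are illustrative assumptions, not the author's example:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
z = rng.normal(size=T)               # instrument
v = rng.normal(size=T)
u = 0.8 * v + rng.normal(size=T)     # error correlated with the regressor through v
x = z + v                            # endogenous regressor
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])  # same number of columns as X

# Sample moment condition: (1/T) Z'(y - X b) = 0  =>  b = (Z'X)^{-1} Z'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
iv_moments = Z.T @ (y - X @ beta_iv) / T

# OLS for comparison: biased here because x is correlated with u
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```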

  • Slightly more Generalised MM

    In the previous IV estimator we considered the case where the number of instruments equals the number of coefficients we want to estimate, so that Z has as many columns as there are coefficients.

    What happens if the number of instruments is greater than the number of coefficients? Essentially, the number of equations is then greater than the number of coefficients to be estimated: the model is over-identified.

  • IV with more constraints than equations

    Maintain the moment condition as before.

    Variance of the moment condition is:

    Minimise weighted distance:

  • IV with more constraints than equations

    Why do we do a minimisation exercise? Because we have more equations than unknowns, so in general no choice of coefficients sets every sample moment exactly to zero.

    How do we determine the values of the coefficients? The solution is to minimise the previous expression, that is, to pick the coefficients that bring the sample moments as close as possible to satisfying the orthogonality condition.

  • IV with more constraints than equations

    First order conditions:

    MM estimator (looks like an IV estimator with more instruments than parameters to estimate):
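The equations for the over-identified IV case are missing from the transcript. The following is a reconstruction under stated assumptions (homoskedastic errors with variance $\sigma^2$, and the notation $u = y - X\beta$), not necessarily the expressions on the original slides:

```latex
\begin{aligned}
&\text{Moment condition:} && E[Z'u] = 0, \qquad u = y - X\beta \\
&\text{Variance of moment condition:} && \operatorname{Var}(Z'u) = \sigma^2 Z'Z \\
&\text{Weighted distance:} && \min_{\beta}\; (y - X\beta)'Z\,(\sigma^2 Z'Z)^{-1}\,Z'(y - X\beta) \\
&\text{First order conditions:} && X'Z\,(Z'Z)^{-1}Z'(y - X\hat{\beta}) = 0 \\
&\text{MM estimator:} && \hat{\beta} = \bigl[X'Z\,(Z'Z)^{-1}Z'X\bigr]^{-1} X'Z\,(Z'Z)^{-1}Z'y
\end{aligned}
```

Under these assumptions the estimator coincides with two-stage least squares, and it collapses to $(Z'X)^{-1}Z'y$ when Z has as many columns as X.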

  • Moment conditions in estimation

    Model may be nonlinear. Euler equations often imply models in levels, not logs (consumption, output, other first order conditions). Both ad hoc and structural models may be nonlinear in the parameters of interest (systems).

    Models may have unknown disturbance structure. Rational expectations. May not be interested in related parameters.

  • A generalised problem

    Let any (nonlinear) moment condition be:

    Sample counterpart:


  • A generalised problem

    If we have more instruments (n) than coefficients (p) we choose to minimise:

    What should the matrix W look like?
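The expressions for the generalised problem are missing from the transcript. A hedged sketch in assumed notation ($f$ is an $n \times 1$ vector of moment functions, $\theta$ the $p \times 1$ parameter vector, and $W$ the covariance matrix of the sample moments, consistent with the later remark that $W^{-1}$ is the inverse covariance matrix):

```latex
\begin{aligned}
&\text{Moment condition:} && E\bigl[f(x_t, \theta)\bigr] = 0 \\
&\text{Sample counterpart:} && g_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} f(x_t, \theta) \\
&\text{Criterion } (n > p)\text{:} && \hat{\theta} = \arg\min_{\theta}\; g_T(\theta)'\, W^{-1}\, g_T(\theta)
\end{aligned}
```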

  • A generalised problem

    It turns out that any symmetric positive definite matrix W yields consistent estimates of the parameters

    However, an arbitrary choice does not in general yield efficient ones

    Hansen (1982) derives the necessary (not sufficient) condition to obtain asymptotically efficient estimates of the coefficients

  • Choice of W (efficiency)

    Appropriate weight matrices (Hansen, 1982):

    Intuition: W^{-1} denotes the inverse of the covariance matrix of the sample moments. This matrix is chosen because it means that less weight is placed on the more imprecise moments

  • Implementation

    Implementation is generally undertaken in a two-step procedure:

    Any symmetric positive definite matrix yields consistent estimates of the parameters, so exploit this. Using any symmetric positive definite matrix, back out estimates of the parameters in the model. An arbitrary matrix such as the identity matrix is normally used to obtain the first consistent estimator.

    Using these parameters, construct the weighting matrix W and then undertake the minimisation problem.

    This process can be iterated, at some computational cost.
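The two-step procedure above can be sketched for a linear model with more instruments than coefficients, where the minimiser has a closed form. This is an illustration under assumptions: the helper names `linear_gmm` and `moment_cov`, the simulated data, and the White-type covariance estimate are all choices made for this sketch, not the author's code.

```python
import numpy as np

def linear_gmm(y, X, Z, S):
    """Minimise g(b)' S^{-1} g(b) with g(b) = Z'(y - X b)/T (closed form, linear model)."""
    A = X.T @ Z @ np.linalg.inv(S)
    return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

def moment_cov(resid, Z):
    """White-type estimate of the moment covariance: (1/T) sum u_t^2 z_t z_t'."""
    Zu = Z * resid[:, None]
    return Zu.T @ Zu / len(resid)

rng = np.random.default_rng(2)
T = 2000
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])        # n = 4 instruments
v = rng.normal(size=T)
x = Z[:, 1] + 0.5 * Z[:, 2] + 0.5 * Z[:, 3] + v                   # endogenous regressor
u = (0.8 * v + rng.normal(size=T)) * (1 + 0.5 * np.abs(Z[:, 1]))  # heteroskedastic error
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(T), x])                              # p = 2 coefficients

# Step 1: identity weighting gives a consistent but inefficient estimate
b1 = linear_gmm(y, X, Z, np.eye(Z.shape[1]))

# Step 2: estimate the moment covariance from step-1 residuals and re-estimate
S = moment_cov(y - X @ b1, Z)
b2 = linear_gmm(y, X, Z, S)
```

Iterating would repeat step 2 with residuals from `b2` until the estimates stop changing.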

  • Instrument validity and W

    The minimised value of the criterion can be used to test the validity of the instruments. EViews gives you the wrong Hansen J-statistic (the test of overidentification).

    Multiply by the number of observations to get the correct J. This is distributed as a chi-squared with n-p degrees of freedom. If a sub-optimal weighting matrix is used, Hansen's J-test does not apply; see Cochrane (1996). We can also test a subset of the orthogonality conditions.
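A hedged sketch of the J-statistic calculation described above: the minimised criterion is multiplied by the number of observations and compared with a chi-squared distribution with n-p degrees of freedom. The simulated data, the helper names and the White-type weighting matrix are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T, p = 1000, 2
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])  # n = 4 valid instruments
v = rng.normal(size=T)
x = Z[:, 1:] @ np.array([1.0, 0.5, 0.5]) + v
y = 1.0 + 2.0 * x + 0.8 * v + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])

def gmm(S):
    """Linear GMM estimate for a given moment covariance S."""
    A = X.T @ Z @ np.linalg.inv(S)
    return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

def moment_cov(b):
    """White-type covariance of the moments z_t * u_t at coefficients b."""
    Zu = Z * (y - X @ b)[:, None]
    return Zu.T @ Zu / T

b1 = gmm(np.eye(Z.shape[1]))   # first step, identity weighting
S = moment_cov(b1)             # estimated optimal weighting
b2 = gmm(S)                    # second step

gbar = Z.T @ (y - X @ b2) / T
criterion = gbar @ np.linalg.solve(S, gbar)  # minimised objective, no T factor
J = T * criterion                            # Hansen J-statistic
dof = Z.shape[1] - p                         # n - p degrees of freedom
```

The statistic `J` would then be compared with a chi-squared critical value for `dof` degrees of freedom; large values cast doubt on the validity of the instruments.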

  • Covariance estimators

    Choosing the right weighting matrix is important for GMM estimation

    There have been many econometric papers written on this subject

    Estimation results can be sensitive to the choice of weighting matrix

  • Covariance estimators

    So far we have not considered the possibility that heteroskedasticity and autocorrelation are part of your model

    How can we account for this? We need to modify the covariance matrix

  • Covariance estimators

    Write our covariance matrix of empirical moments as:

    where M_q is the q-th row of the T×n matrix of sample moments

  • Covariance estimators

    Define the autocovariances:

    Express W in terms of the above expressions:
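The expressions on these slides were lost in extraction. A hedged reconstruction in sample form, assuming $M_q$ is the $1 \times n$ row of sample moments for observation $q$ (the exact scaling and the use of expectations on the original slides are unknown):

```latex
\begin{aligned}
W &= \frac{1}{T}\sum_{q=1}^{T}\sum_{r=1}^{T} M_q' M_r \\
\Gamma_j &= \frac{1}{T}\sum_{q=j+1}^{T} M_q' M_{q-j}, \qquad j = 0, 1, \dots, T-1 \\
W &= \Gamma_0 + \sum_{j=1}^{T-1}\bigl(\Gamma_j + \Gamma_j'\bigr)
\end{aligned}
```

The last line follows by grouping the double sum according to the lag $j = q - r$: positive lags give $\Gamma_j$ and negative lags give its transpose.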

  • Covariance estimators

    If there is no serial correlation, the expressions for j ≠ 0 are all equal to zero (since the autocovariances will be zero):

    Note that this looks like a White (1980) heteroskedasticity-consistent estimator

  • Covariance estimators

    If this looks like a White (1980) heteroskedasticity-consistent estimator, implementation should be straightforward!

    Example (remembering White): take the standard heteroskedastic version of the linear model

  • Covariance estimators

    The appropriate problem and weighting matrix are

    The weighting matrix can be consistently estimated by using any consistent estimator of the model's parameters and replacing the expected value of the squared residuals with the actual squared residuals

    (NB. The only difference here is that we are generalising the problem by allowing instruments, i.e. Zs)

  • Covariance estimators

    The problem is that with autocorrelation it is not possible to replace the expected values of the squared residuals by the actual values from the first estimation. It would lead to an inconsistent estimate of the autocovariance matrix of order j.

    The problem with this approach is that, asymptotically, the number of estimated autocovariances grows at the same rate as the sample size. Thus, whilst unbiased, W is not consistent in the mean squared error sense.

  • Covariance estimators

    Thus we require a class of estimators that circumvents these problems. A class of estimators that prevents the number of autocovariances from growing with the sample size is

    Parzen termed the w's the lag window. These estimators correspond to a class of kernel (spectral density) estimators (evaluated at frequency zero)

  • Covariance estimators

    The key is to choose the sequence of w's such that the sequence of weights approaches unity rapidly enough to obtain asymptotic unbiasedness but slowly enough to ensure that the variance converges to zero.

    The type of weights you will find in EViews corresponds to a particular class of lag windows termed scale parameter windows. The lag window is expressed as

  • Covariance estimators

    HAC matrix estimation:

    k(j/b_T) is a kernel and b_T is the bandwidth. Intuition: b_T stretches or contracts the distribution; it acts as a scaling parameter. k(z) is referred to as the lag window generator

  • Covariance estimators

    HAC matrix estimation:

    When the value of the kernel is zero for z > 1, b_T is called a lag truncation parameter (autocovariances corresponding to lags greater than b_T are given zero weight). The scalar b_T is often referred to as the bandwidth parameter

  • Covariance estimators

    EViews provides two kernels: Quadratic and Bartlett. It provides three options for the bandwidth parameter b_T

    (See the manual for the specific functional forms and a good discussion!)

  • Covariance estimators

    For instance, Newey and West (1987) suggest using a Bartlett kernel:

    Guarantees positive definiteness (which is something that we desire since we would like a positive variance)
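A minimal sketch of a Newey and West style HAC estimator with Bartlett weights w_j = 1 - j/(q+1), where q is the lag truncation. The function name `newey_west`, the choice not to demean the moments, and the simulated AR(1) moment series are assumptions made for illustration.

```python
import numpy as np

def newey_west(M, q):
    """HAC covariance of the T x n moment matrix M with Bartlett weights."""
    T = M.shape[0]
    S = M.T @ M / T                        # Gamma_0, the White-type term
    for j in range(1, q + 1):
        gamma_j = M[j:].T @ M[:-j] / T     # Gamma_j = (1/T) sum_t M_t' M_{t-j}
        S += (1.0 - j / (q + 1)) * (gamma_j + gamma_j.T)
    return S

# Illustrative serially correlated moments: two AR(1) series with coefficient 0.5
rng = np.random.default_rng(4)
T = 500
e = rng.normal(size=(T, 2))
M = np.zeros((T, 2))
for t in range(1, T):
    M[t] = 0.5 * M[t - 1] + e[t]

S_white = newey_west(M, 0)   # no lags: reduces to the White-type estimator
S_hac = newey_west(M, 4)     # four lags with declining Bartlett weights
```

With positively autocorrelated moments the HAC estimate has larger diagonal elements than the White-type estimate, reflecting the extra long-run variance.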

  • Alternative covariance estimators

    Andrews (1991)

    Quadratic spectral estimator:
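The kernel formula on this slide is missing. As a hedged illustration, the quadratic spectral kernel is commonly written as k(z) = 25/(12π²z²) [sin(6πz/5)/(6πz/5) - cos(6πz/5)], shown here next to the Bartlett kernel for comparison. The function names are assumptions of this sketch.

```python
import math

def quadratic_spectral(z):
    """Quadratic spectral kernel; equals 1 at z = 0 and never truncates."""
    if z == 0.0:
        return 1.0
    x = 6.0 * math.pi * z / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * z ** 2) * (math.sin(x) / x - math.cos(x))

def bartlett(z):
    """Bartlett kernel; zero weight for |z| >= 1."""
    return max(0.0, 1.0 - abs(z))

# Weights given to lags j = 0..5 with an illustrative bandwidth of 3
bandwidth = 3.0
qs_weights = [quadratic_spectral(j / bandwidth) for j in range(6)]
bartlett_weights = [bartlett(j / bandwidth) for j in range(6)]
```

Unlike the Bartlett kernel, the quadratic spectral kernel gives non-zero (sometimes negative) weight to every lag, so the bandwidth acts as a scale rather than a truncation point.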


  • Pre-whitening

    Andrews and Monahan (1992)

    Fit a VAR to the moment residuals:

    where:

    This is known as a pre-whitened estimate. It can be applied to any kernel
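The pre-whitening equations are missing from the transcript. A hedged sketch, assuming a VAR(1) and writing $M_t'$ for the $n \times 1$ vector of moments at time $t$ (the lag order and notation on the original slide are unknown):

```latex
\begin{aligned}
M_t' &= A\, M_{t-1}' + e_t \\
\hat{W} &= (I - \hat{A})^{-1}\; \hat{W}^{*}\; (I - \hat{A}')^{-1}
\end{aligned}
```

Here $\hat{W}^{*}$ denotes a kernel (HAC) estimate computed from the VAR residuals $\hat{e}_t$, and the outer factors "recolour" it back to the scale of the original moments.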

  • Linear models

    Estimate by IV (consistent but inefficient):

    Use estimates to construct estimate of W:

    Can iterate on estimates of W

  • Nonlinear models

    Estimate by nonlinear IV. May solve by standard iterative nonlinear IV.

    Estimate the covariance matrix.

    Minimise J using non-linear optimisation.

    Iterate on the covariance matrix (optional).

    EViews uses the Berndt-Hall-Hall-Hausman or Marquardt algorithms (see manual for pros and cons)

  • Useful facts

    Covariance matrix estimators must be positive definite. Asymptotically, it has been shown that the quadratic spectral window is best

    But in small samples Newey and West (1994) show little difference between the Quadratic and their estimator (based on Bartlett)

  • Useful facts

    Choice of the bandwidth parameter is more important than the choice of the kernel. The variable bandwidths of Newey and West and of Andrews are state of the art.

    HAC estimators suffer from poor small sample performance, thus test statistics (e.g. the t-test) may not be reliable: t-stats appear to reject a true null far more often than their nominal size.

    Adjustments to the matrix W may be made, but these depend on whether there is autocorrelation and/or heteroskedasticity

  • Useful facts

    Numerical optimisation: a common problem is not having a global maximum/minimum, e.g. problems of local maxima/minima or flat functions.

    Without a global minimum, GMM estimation does not yield consistent and efficient estimates.

    Convexity of the criterion function is important: it guarantees that any local minimum is a global minimum

  • Useful facts

    For non-convex problems you must use different methods.

    A multi-start algorithm is popular: start a local optimisation algorithm from initial values of the parameters to converge to a local minimum, and then repeat the process a number of times with different sta
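A minimal sketch of the multi-start idea, using a hypothetical one-parameter criterion with two local minima and plain gradient descent as the local optimiser. The criterion function, step size and number of starts are illustrative assumptions, not a GMM objective from the slides.

```python
import random

def criterion(theta):
    """Hypothetical non-convex criterion with a local and a global minimum."""
    return (theta ** 2 - 1.0) ** 2 + 0.3 * theta

def gradient(theta):
    """Analytical derivative of the criterion."""
    return 4.0 * theta * (theta ** 2 - 1.0) + 0.3

def local_minimise(theta, step=0.01, iters=5000):
    """Plain gradient descent: converges to the local minimum of its basin."""
    for _ in range(iters):
        theta -= step * gradient(theta)
    return theta

random.seed(0)
starts = [random.uniform(-2.0, 2.0) for _ in range(20)]  # different initial values
candidates = [local_minimise(s) for s in starts]          # one local minimum per start
best = min(candidates, key=criterion)                     # keep the lowest criterion value
```

Starts on the right-hand side converge to the inferior local minimum near 0.96, while starts on the left find the global minimum near -1.04; keeping the best candidate guards against reporting the former.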

