lecture notes on time series

Upload: tasneem-raihan

Post on 27-Feb-2018


  • 7/25/2019 Lecture Notes on Time Series

    1/165

Lectures in Modern Economic Time Series Analysis, 2nd ed. ©

Bo Sjö, Linköping, Sweden
email: [email protected]

October 30, 2011


    CONTENTS

1 Introduction
  1.1 Outline of this Book/Text/Course/Workshop
  1.2 Why Econometrics?
  1.3 Junk Science and Junk Econometrics

2 Introduction to Econometric Time Series
  2.1 Programs
  2.2 Different types of time series
  2.3 Repetition - Your First Courses in Statistics and Econometrics

I Basic Statistics

3 Time Series Modeling - An Overview
  3.1 Statistical Models
  3.2 Random Variables
  3.3 Moments of random variables
  3.4 Popular Distributions in Econometrics
  3.5 Analysing the Distribution
  3.6 Multidimensional Random Variables
  3.7 Marginal and Conditional Densities
  3.8 The Linear Regression Model - A General Description

4 The Method of Maximum Likelihood
  4.1 MLE for a Univariate Process
  4.2 MLE for a Linear Combination of Variables

5 The Classical Tests - Wald, LM and LR Tests

II Time Series Modeling

6 Random Walks, White Noise and All That
  6.1 Different types of processes
  6.2 White Noise
  6.3 The Log Normal Distribution
  6.4 The ARIMA Model
  6.5 The Random Walk Model
  6.6 Martingale Processes
  6.7 Markov Processes
  6.8 Brownian Motions
  6.9 Brownian motions and the sum of white noise
    6.9.1 The geometric Brownian motion
    6.9.2 A more formal definition

7 Introduction to Time Series Modeling
  7.1 Descriptive Tools for Time Series
    7.1.1 Weak and Strong Stationarity
    7.1.2 Weak Stationarity, Covariance Stationary and Ergodic Processes
    7.1.3 Strong Stationarity
    7.1.4 Finding the Optimal Lag Length and Information Criteria
    7.1.5 The Lag Operator
    7.1.6 Generating Functions
    7.1.7 The Difference Operator
    7.1.8 Filters
    7.1.9 Dynamics and Stability
    7.1.10 Fractional Integration
    7.1.11 Building an ARIMA Model: The Box-Jenkins Approach
    7.1.12 Is the ARMA model identified?
  7.2 Theoretical Properties of Time Series Models
    7.2.1 The Principle of Duality
    7.2.2 Wold's decomposition theorem
  7.3 Additional Topics
    7.3.1 Seasonality
    7.3.2 Non-stationarity
  7.4 Aggregation
  7.5 Overview of Single Equation Dynamic Models

8 Multipliers and Long-run Solutions of Dynamic Models

9 Vector Autoregressive Models
    9.0.1 How to estimate a VAR?
    9.0.2 Impulse responses in a VAR with non-stationary variables and cointegration
  9.1 BVAR, TVAR etc.

III Granger Non-causality Tests

10 Introduction to Exogeneity and Multicollinearity
  10.1 Exogeneity
    10.1.1 Weak Exogeneity
    10.1.2 Strong Exogeneity
    10.1.3 Super Exogeneity
  10.2 Multicollinearity and understanding of multiple regression

11 Univariate Tests of the Order of Integration
    11.0.1 The DF-test
    11.0.2 The ADF-test
    11.0.3 The Phillips-Perron test
    11.0.4 The LMSP-test
    11.0.5 The KPSS-test
    11.0.6 The G(p, q) test
  11.1 The Alternative Hypothesis in I(1) Tests
  11.2 Fractional Integration

12 Non-Stationarity and Co-integration
    12.0.1 The Spurious Regression Problem
    12.0.2 Integrated Variables and Co-integration
    12.0.3 Approaches to Testing for Co-integration

13 Integrated Variables and Common Trends

14 A Deeper Look at Johansen's Test

15 The Estimation of Dynamic Models
  15.1 Deterministic Explanatory Variables
  15.2 The Deterministic Trend Model
  15.3 Stochastic Explanatory Variables
  15.4 Lagged Dependent Variables
  15.5 Lagged Dependent Variables and Autocorrelation
  15.6 The Problems of Dependence and the Initial Observation
  15.7 Estimation with Integrated Variables

16 Encompassing

17 ARCH Models
    17.0.1 Practical Modelling Tips
  17.1 Some ARCH Theory
  17.2 Some Different Types of ARCH and GARCH Models
  17.3 The Estimation of ARCH Models

18 Econometrics and Rational Expectations
    18.0.1 Rational vs. other Types of Expectations
    18.0.2 Typical Errors in the Modeling of Expectations
    18.0.3 Modeling Rational Expectations
    18.0.4 Testing Rational Expectations

19 A Research Strategy

20 References
  20.1 Appendix 1
  20.2 Appendix III Operators
    20.2.1 The Expectations Operator
    20.2.2 The Variance Operator
    20.2.3 The Covariance Operator
    20.2.4 The Sum Operator
    20.2.5 The Plim Operator
    20.2.6 The Lag and the Difference Operators

    Abstract


    1. INTRODUCTION

"He who controls the past controls the future." George Orwell in "1984".

Please respect that this is work in progress. It has never been my intention to write a commercial book, or a perfect textbook in time series econometrics. It is simply a collection of lectures in a popular form that can serve as a complement to ordinary textbooks and articles used in education. The parts dealing with tests for unit roots (order of integration) and cointegration are not well developed. These topics have a memo of their own, "A Guide to Testing for Unit Roots and Cointegration".

When I started to put these lecture notes together some years ago I decided on the title "Lectures in Modern Time Series Econometrics", because I thought that the contents were a bit "modern" compared to the standard econometric textbook. During the fall of 2010, as I started to update the notes, I thought that it was time to remove the word "modern" from the title. A quick look in Damodar Gujarati's textbook "Basic Econometrics" from 2009 convinced me to keep the word "modern" in the title. Gujarati's text on time series hasn't changed since the 1970s, even though time series econometrics has changed completely since the 70s. Thus, under these circumstances I see no reason to change the title, at least not yet.

There are four ways in which one can do time series econometrics. The first is to use the approach of the 1970s: view your time series model just like any linear regression, and impose a number of ad hoc restrictions that will hide all the problems you find. This is not a good approach. It is only found in old textbooks and never in today's research; you might only see it used in very low-ranked scientific journals. Second, you can use theory to derive a time series model, and interesting parameters, which you then estimate with appropriate estimators. Examples of this are to derive utility functions, assume that agents have rational expectations, etc. This is a proper research strategy. However, it typically takes good data, and you need to be original in your approach, but you can get published in good journals. The third approach is simply to do a statistical description of the data series, in the form of a vector autoregressive system, or the reduced form of the vector error correction model. This system can be used for forecasting, for analysing relationships among data series, and for investigating the effects of unforeseen shocks such as drastic changes in energy prices, money supply, etc. The fourth way is to go beyond the vector autoregressive system and try to estimate structural parameters in the form of elasticities and policy intervention parameters. If you forget about the first method, the choice depends on the problem at hand and how you choose to formulate it. This book aims at telling you how to use methods three and four. The basic thinking is that your data is the real world; theories are abstractions that we use to understand the real world. In applied econometric time series you should always strive to build well-defined statistical models, that is, models that are consistent with the data chosen. There is a complex statistical theory behind all this, which I will try to popularize in this book. I do not see this book as a substitute for an ordinary textbook. It is simply a complement.


    1.1 Outline of this Book/Text/Course/Workshop

This book is intended for people who have done a basic course in statistics and econometrics, either at the undergraduate or at the graduate level. If you did an undergraduate course, I assume that you did it well. Econometrics is the type of course where every lecture, and every textbook chapter, leads to the next level. The best way to learn econometrics is to be active: read several books and work on your own with econometric software. No teacher can teach you how to run a software package; that is something you have to learn on your own by practicing. There are some very good software packages out there. The outline differs between the graduate and Ph.D. levels mainly in the theoretical parts. At the Ph.D. level, there is more stress on theoretical backgrounds.

1) I will begin by talking about why econometrics is different from statistics, and why econometric time series is different from the econometrics you meet in many basic textbooks.

2) I will repeat, very briefly, basic statistics and linear regression, and stress what you should know in terms of testing and modeling dynamic models. For most students that will imply going back and doing some quick repetition.

3) Introduction to statistical theory, including maximum likelihood, random variables, density functions and stochastic processes.

4) Basic time series properties and processes.

5) Using and understanding ARFIMA and VAR modelling techniques.

6) Testing for non-stationarity in the form of stochastic trends, i.e. tests for unit roots.

7) The spurious regression problem.

8) Testing and understanding cointegration.

9) Testing for Granger non-causality.

10) The theory of reduction, exogeneity, and building dynamic models and systems.

11) Modelling time-varying variances: ARCH and GARCH models.

12) The implications and consequences of rational expectations for econometric modelling.

13) Non-linearities.

14) Additional topics.

For most of these topics I have developed more or less self-instructing exercises.

    1.2 Why Econometrics?

Why is there a subject called econometrics? Why study econometrics instead of statistics? Why not let the statisticians teach statistics, and in particular time series techniques? These are common questions, raised during seminars and in private, by students, statisticians and economists. The answer is that each scientific area tends to create its own special methodological problems, often heavily interrelated with theoretical issues. These problems, and the ways of solving them, are important in a particular area of science but not necessarily in others. Economics is a typical example, where the formulation of the economic and the statistical problem is deeply interrelated from the beginning.

In everyday life we are forced to make decisions based on limited information. Most of our decisions deal with an uncertain, stochastic future. We all base our decisions on some view of the economy, where we assume that certain events are linked to each other in more or less complex ways. Economists call this a model of the economy. We can describe the economy and the behavior of individuals in terms of multivariate stochastic processes. Decisions based on stochastic sequences play a central role in economics and in finance. Stochastic processes are the basis for our understanding of the behavior of economic agents and of how their behavior determines the future path of the economy. Most econometric textbooks deal with stochastic time series as a special application of the linear regression technique. Though this approach is acceptable for an introductory course in econometrics, it is unsatisfactory for students with a deeper interest in economics and finance. To understand the empirical and theoretical work in these areas, it is necessary to understand some of the basic philosophy behind stochastic time series.

This work is a work in progress. It is based on my lectures on Modern Economic Time Series Analysis at the Department of Economics, first at the University of Gothenburg and later at the University of Skövde and Linköping University in Sweden. The material is not ready for widespread distribution. This work, most likely, contains lots of errors; some are known by the author, and some are not yet detected. The different sections do not necessarily follow in a logical order. Therefore, I invite anyone who has opinions about this work to share them with me.

The first part of this work provides a repetition of some basic statistical concepts which are necessary for understanding modern economic time series analysis. The motive for repeating these concepts is that they play a larger role in econometrics than many contemporary textbooks in econometrics indicate. Econometrics did not change much from the first edition of Johnston in the 60s until the revised version of Kmenta in the mid 80s. However, the critique against the use of econometrics delivered by Sims, Lucas, Leamer, Hendry and others, in combination with new insights into the behavior of non-stationary time series and the rapid development of computer technology, has revolutionized econometric modeling and resulted in an explosion of knowledge. The demands of writing a decent thesis, or a scientific paper, based on econometric methods have risen far beyond what one can learn in an introductory course in econometrics.

    1.3 Junk Science and Junk Econometrics

In the media you often hear about this and that being proved by scientific research. In the late 1990s newspapers reported that someone had proved that genetically modified (GM) food could be dangerous. The news spread quickly, and according to the story the original article had been stopped from being published by scientists with suspicious motives. Various lobby groups immediately jumped up: GM food was dangerous, should be banned, and more money should go into this line of research. What had happened was the following. A researcher claimed to have shown that GM food was bad for health. He presented these results to a number of media people, who distributed them. (Remember the fuss about cold fusion.) The results were presented in a paper sent to a scientific journal for publication. The journal, however, did not publish the article. It was dismissed because the results were not based on a sound scientific method. The researcher had fed rats with potatoes. One group of rats got GM potatoes, the other group got normal non-GM potatoes. The rats that got GM potatoes seemed to develop cancer more often than the control group. The statistical difference between the groups was not big, but sufficiently big for those wanting to confirm their a priori belief that GM food is bad. A somewhat embarrassing detail, never reported in the media, is that rats in general do not like potatoes. As a consequence, both groups of rats in this study were suffering from starvation, which severely affected the test. It was not possible to determine whether the difference between the two groups was caused by starvation or by GM food. Once the researcher conditioned on the effects of starvation, the difference became insignificant. This is an example of junk science: bad science getting a lot of media exposure because the results fit the interests of lobby groups and can be used to scare people.

The lesson for econometricians is obvious: if you come up with good results you get rewarded; bad results, on the other hand, can quickly be forgotten. The GM food example is extreme for econometric work. Econometric research seldom gets such media coverage, though there are examples, such as claims that Sweden's economic growth is lower than that of other similar countries, or the assumed dynamic effects of a reduction of marginal taxes. There are significant results that depend on one single outlier. Once the outlier is removed, the significance is gone, and the whole story behind that particular book is also gone.

In these lectures we will argue that the only way to avoid junk econometrics is careful and systematic construction and testing of models. Basically, this is the modern econometric time series approach. Why is this modern, and why stress the idea of testing? The answers are simply that careers have been built on running junk econometric equations, and most people are unfamiliar with scientific methods in general, and with the consequences of living in a world surrounded by random variables in particular.


2. INTRODUCTION TO ECONOMETRIC TIME SERIES

"Time is a great teacher, but unfortunately it kills all its pupils." Louis Hector Berlioz

A time series is simply data ordered by time. For an econometrician, a time series is usually data that is also generated over time, in such a way that time can be seen as a driving factor behind the data. Time series analysis simply comprises approaches that look for regularities in these data ordered by time.

In comparison with other academic fields, the modeling of economic time series is characterized by the following problems, which partly motivate why econometrics is a subject of its own:

The empirical sample sizes in economics are generally small, especially compared with many applications in physics or biology. Typical sample sizes range between 25 and 100 observations. In many areas anything below 500 observations is considered a small sample.

Economic time series are dependent in the sense that they are correlated with other economic time series. In economics, problems are almost never concerned with a univariate series. Consumption, as an example, is a function of income; at the same time, consumption also affects income, directly and through various other variables.

Economic time series are often dependent over time. Many series display high autocorrelation, as well as cross-autocorrelation with other variables over time.
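To make the autocorrelation point concrete, here is a small sketch, not part of the original notes, assuming Python with NumPy; the AR(1) series is invented for the example. The sample autocorrelation function of a persistent series decays slowly, while that of uncorrelated noise stays near zero.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_k = c_k / c_0 for lags 1..max_lag."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    c0 = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:-k]) / c0 for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
e = rng.standard_normal(500)            # white noise
ar1 = np.empty(500)                     # AR(1): y_t = 0.8 y_{t-1} + e_t
ar1[0] = e[0]
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + e[t]

print(sample_acf(ar1, 3))   # decays slowly from roughly 0.8
print(sample_acf(e, 3))     # all close to zero
```

The same decay pattern is what the Box-Jenkins identification tools discussed later read off the data.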

Economic time series are generally non-stationary. Their means and variances change over time, implying that estimated parameters might follow unknown distributions instead of standard tabulated distributions like the normal distribution. Non-stationarity arises from productivity growth and price inflation. Non-stationary economic series appear to be integrated, driven by stochastic trends, perhaps as a result of stochastic changes in total factor productivity. Integrated variables, and in particular the need to model them, are not that common outside economics. In some situations, therefore, inference in econometrics becomes quite complicated and requires the development of new statistical techniques for handling stochastic trends. The concepts of cointegration and common trends, and the recently developed asymptotic theory for integrated variables, are examples of this.
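The stochastic-trend behavior can be illustrated with a short simulation (illustrative Python/NumPy, not part of the original text): for a pure random walk the variance across realizations grows linearly with time, so no fixed tabulated distribution describes the series.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_paths = 200, 2000
shocks = rng.standard_normal((n_paths, T))
walks = shocks.cumsum(axis=1)   # y_t = y_{t-1} + e_t: an integrated, I(1) series

# Across simulated realizations, Var(y_t) = t * Var(e): it grows without bound.
print(walks[:, 9].var())        # roughly 10
print(walks[:, 199].var())      # roughly 200
```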

Economic time series cannot be assumed to be drawn from samples in the way assumed in classical statistics. The classical approach is to start from a population from which a sample is drawn. Since the sampling process can be controlled, the variables that make up the sample can be seen as random variables. Hypotheses are then formulated and tested conditionally on the assumption that the random variables have a specific distribution. Economic time series are seldom random variables drawn from some underlying population in the classical statistical sense. Observations do not represent a random sample in the classical statistical sense, because the econometrician cannot control the sampling process. Variables like GDP, money, prices and dividends are given from history. To get a different sample we would have to re-run history, which of course is impossible. The way statistical theory deals with this situation is to reverse the approach taken in classical statistical analysis and build a model that describes the behavior of the observed data. A model which achieves this is called a well-defined statistical model; it can be understood as a parsimonious, time-invariant model with white noise residuals that makes sense from economic theory.
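One practical reading of "a parsimonious model with white noise residuals" can be sketched as follows (an illustrative Python/NumPy example, not the author's own procedure; the AR(1) data are simulated): fit a simple dynamic model by least squares and check that the residuals show no remaining autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 400
e = rng.standard_normal(T)
y = np.empty(T)
y[0] = e[0]
for t in range(1, T):               # true model: y_t = 0.7 y_{t-1} + e_t
    y[t] = 0.7 * y[t - 1] + e[t]

# OLS regression of y_t on a constant and y_{t-1}
X = np.column_stack([np.ones(T - 1), y[:-1]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
resid = y[1:] - X @ beta

# If the model is well specified, the residuals should be close to white noise.
r1 = np.corrcoef(resid[1:], resid[:-1])[0, 1]
print(beta[1])                      # slope estimate, near 0.7
print(r1)                           # lag-1 residual autocorrelation, near 0
```

A residual autocorrelation far from zero would signal that the model is not yet well defined in the sense used above.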

Finally, from the view of economics, the subject of statistics deals mainly with the estimation of, and inference about, covariances. The econometrician, however, must also give estimated parameters an economic interpretation. This problem cannot always be solved ex post, after a model has been estimated. When it comes to time series, economic theory is an integrated part of the modeling process. Given a well-defined statistical model, estimated parameters should represent the behavior of economic agents. Many econometric studies fail because researchers assume that their estimates can be given an economic interpretation without considering the statistical properties of the model, or the simple fact that there is in general not a one-to-one correspondence between observed variables and the concepts defined in economic theory.1

    2.1 Programs

Here is a list of statistical software that you should be familiar with; please google them (those recommended for time series are marked with *):

- *RATS and CATS in RATS, Regression Analysis of Time Series and Cointegrating Analysis of Time Series (www.estima.com)

- *PcGive - comes highly recommended. Included in the OxMetrics modules; see also Timberlake Consultants for more programs.

- *Gretl (Free GNU license, very good for students in econometrics)

- *JMulti (Free, for multivariate time series analysis; updated? The discussion forum is quite dead, www.jmulti.com)

- *EViews

- Gauss (good for simulation)

    - STATA (used by the World Bank, good for microeconometrics, panel data,OK on time series)

- LIMDEP (Mostly free with some editions of Greene's econometrics textbook?, you need to pay for duration models?)

- SAS - Statistical Analysis System (good for big data sets, but not time series; mainly medicine, "the calculus program for decision makers")

    - Shazam

And more; some are very special programs for this and that, ... but I don't find them worth mentioning in this context.

1 For a recent discussion about the controversies in econometrics, see The Economic Journal 1996.

    12 INTRODUCTION TO ECONOMETRIC TIME SERIES

  • 7/25/2019 Lecture Notes on Time Series

    13/165

There is a bunch of software that allows you to program your own models or use other people's modules:

- Matlab

- R (Free, GNU license, connects with Gretl)

- Ox

You should also know about C, C++, and LaTeX to be a good econometrician. Please google them.

For Data Envelopment Analysis (DEA) I recommend Tom Coelli's DEAP 2.1 or Paul W. Wilson's FEAR.

2.2 Different types of time series

Given the general definition of time series above, there are many types of time series. The focus in econometrics, macroeconomics and finance is on stochastic time series, typically in the time domain, which are non-stationary in levels but become what is called covariance stationary after differencing.
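The levels-versus-differences point can be sketched numerically (illustrative Python/NumPy with simulated data, not code from the notes): differencing an integrated series recovers its stationary increments.

```python
import numpy as np

rng = np.random.default_rng(3)
e = rng.standard_normal(1000)        # stationary white-noise increments
level = e.cumsum()                   # integrated series: non-stationary in levels
diff = np.diff(level)                # first difference

# Differencing undoes the accumulation: diff equals the original increments.
print(np.allclose(diff, e[1:]))      # True
print(diff[:500].var(), diff[500:].var())   # both near 1: stable variance
```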

In a broad perspective, time series analysis typically aims at making time series more understandable by decomposing them into different parts. The aim of this introduction is to give a general overview of the subject. A time series is any sequence ordered by time. The sequence can be either deterministic or stochastic. The primary interest in economics is in stochastic time series, where the sequence of observations is made up of the outcomes of random variables. A sequence of stochastic variables ordered by time is called a stochastic time series process.

The random variables that make up the process can either be discrete random variables, taking on a given set of integer values, or continuous random variables, taking on any real number. While discrete random variables are possible, they are not that common in economic time series research.

Another dimension in modeling time series is to consider processes in discrete time or in continuous time. The principal difference is that stochastic variables in continuous time can take different values at any time. In a discrete time process, the variables are observed at fixed intervals of time (t), and they do not change between these observation points. Truly discrete time variables are not common in finance and economics: there are few, if any, variables that remain fixed between their points of observation. The distinction between continuous time and discrete time is not a matter of measurability alone. A common mistake is to be confused by the fact that economic variables are measured at discrete time intervals. The money stock is generally measured and recorded as an end-of-month value. This way of measuring the stock of money does not imply that it remains unchanged between the observation intervals; instead it changes whenever the money market is open. The same holds for variables like production and consumption. These activities take place 24 hours a day, during the whole year. They are measured as the flow of income and consumption over a period, typically a quarter, representing the integral sum of these activities.
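The flow-as-integral idea can be illustrated with a toy computation (Python/NumPy; the daily series and quarter length are invented for the example): a quarterly flow observation is simply the sum of the unobserved higher-frequency activity.

```python
import numpy as np

rng = np.random.default_rng(4)
days_per_quarter = 90               # a stylized quarter
daily_flow = rng.gamma(2.0, 1.0, size=8 * days_per_quarter)  # e.g. daily consumption

# What we observe: eight quarterly flows, each the discrete integral of daily activity.
quarterly = daily_flow.reshape(8, days_per_quarter).sum(axis=1)
print(quarterly.shape)              # (8,)
```

Nothing of the total is lost by aggregation, but the within-quarter dynamics are unobservable in the quarterly series.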

Usually, a discrete time variable is written with a time subscript (x_t), while continuous time variables are written as x(t). The continuous time approach has a number of benefits, but the cost and quality of the empirical results seldom motivate it. It is better to use discrete time approaches


as an approximation to the underlying continuous time system. The cost of this simplification is small compared with the complexity of continuous time analysis. This should not be understood as a rejection of all continuous time approaches. Continuous time is good for analyzing a number of well-defined problems like aggregation over time and across individuals. In the end it should lead to a better understanding of adjustment speeds, stability conditions and interactions among economic time series; see Sjö (1990, 1995).2

In addition, stochastic time series can be analysed in the time domain or in the frequency domain. In the time domain the data is analysed ordered in given time periods such as days, weeks, years, etc. The frequency approach decomposes time series into frequencies by using trigonometric functions such as sines and cosines. Spectral analysis is an example of analysis in the frequency domain, used to identify regularities such as seasonal factors, trends and systematic lags in adjustment. The main advantage of analysing time series in the frequency domain is that it is relatively easy to handle continuous time processes and observations recorded as aggregations over time, such as consumption.

In economics and finance, however, we are typically faced with given observations at given frequencies, and we seek to study the behavior of agents operating in real time. Under these circumstances the time domain is the most interesting road ahead, because it has a direct intuitive appeal to both economists and policy makers.
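A minimal frequency-domain sketch (illustrative Python/NumPy; the monthly seasonal series is invented for the example): the periodogram of a seasonal series peaks at the seasonal frequency, which is how spectral analysis picks out such regularities.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 240                                     # e.g. 20 years of monthly data
t = np.arange(T)
y = np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(T)  # 12-month seasonal + noise

# Periodogram: squared magnitude of the Fourier transform of the demeaned series.
spec = np.abs(np.fft.rfft(y - y.mean())) ** 2
freqs = np.fft.rfftfreq(T)                  # in cycles per observation
peak_freq = freqs[spec.argmax()]
print(1 / peak_freq)                        # dominant period: 12.0 months
```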


    Our interest is usually in analysing discrete time stochastic processes in thetime domain.

A time series process is generally indicated with brackets, like {y_t}. In some situations it is necessary to be more precise about the length of the process. Writing {y_t}_1^∞ indicates that the process starts at period one and continues infinitely. The process consists of random variables, because we can view each element in {y_t} as a random variable. Let the process go from the integer value 1 up to T. If necessary, to be exact, the first variable in the process can be written y_{t1}, the second y_{t2}, and so on up until y_{tT}. The distribution function of the process can then be written as F(y_{t1}, y_{t2}, ..., y_{tT}).

2 We can also mention the different types of series that are used: stocks, flows and price variables. Stocks are variables that can be observed at a point in time, like the money stock or inventories. Flows are variables that can only be observed over some period, like consumption or GDP. In this context price variables include prices, interest rates and similar variables which can be observed in a market at a given point in time. Combining these variables into a multivariate process and constructing econometric models from observed variables in discrete time produces further problems, and in general they are quite difficult to solve without using continuous-time methods. Usually, careful discrete-time models will reduce the problems to a large extent.


In some situations it is necessary to start from the very beginning. A time series is data ordered by time. A stochastic time series is a set of random variables ordered by time. Let Ỹ_{i,t} represent the stochastic variable Ỹ_i given at time t. Observations on this random variable are often indicated as y_{i,t}. In general terms a stochastic time series is a series of random variables ordered by time. A series starting at time t = 1 and ending at time t = T, consisting of T different random variables, is written as {Ỹ_{1,1}, Ỹ_{2,2}, ..., Ỹ_{T,T}}. Of course, assuming that the series is built up by individual random variables, each with its own independent probability distribution, is a complex thought. But nothing in our definition of a stochastic time series rules out that the data are made up of completely different random variables. Sometimes, to understand and find solutions to practical problems, it will be necessary to go all the way back to the most basic assumptions.

Suppose we are given a time series consisting of yearly observations of interest rates, {6.6, 7.5, 5.9, 5.4, 5.5, 4.5, 4.3, 4.8}. The first question to ask is: is this a stochastic series, in the sense that these numbers were generated by one stochastic process, or perhaps by several different stochastic processes? Further questions would be whether the process or processes are best represented as continuous or discrete, and whether the observations are independent or dependent. Quite often we will assume that the series is generated by the same identical stochastic process in discrete time. Based on these assumptions the modelling process tries to find systematic historical patterns and cross-correlations with other variables in the data.

All time series methods aim at decomposing the series into separate parts in some way. The standard approach in time series analysis is to decompose as

y_t = T_{t,d} + S_{t,d} + C_{t,d} + I_t,

where T_{t,d} and S_{t,d} represent (deterministic) trend and seasonal components, C_{t,d} is a deterministic cyclical component and I_t is a process representing irregular factors.3 For time series econometrics this definition is limited, since the econometrician is highly interested in the irregular component. As an alternative, let {y_t} be a stochastic time series process, which is composed as

y_t = systematic components + unsystematic components
    = T_d + T_s + S_d + S_s + y*_t + e_t,    (2.1)

where the systematic components include deterministic trends T_d, stochastic trends T_s, deterministic seasonals S_d, stochastic seasonals S_s, a stationary process (or the short-run dynamics) y*_t, and finally a white noise innovation term e_t. The modeling problem can be described as the problem of identifying the systematic components such that the residual becomes a white noise process. For all series, remember that any inference is potentially wrong if not all components have been modeled correctly. This is so regardless of whether we model a simple univariate series with time series techniques, a reduced system, or a structural model. Inference is only valid for a correctly specified model.
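As a concrete illustration of the decomposition in (2.1), the sketch below simulates a series with a deterministic trend, a deterministic seasonal and a white noise term, and then recovers the systematic components by least squares. All numbers and the choice of trigonometric seasonal regressors are made up for the illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 120                      # ten years of monthly observations (hypothetical)
t = np.arange(T)

# Simulate y_t = deterministic trend + deterministic seasonal + white noise e_t
trend = 0.05 * t
seasonal = 2.0 * np.sin(2 * np.pi * t / 12)
e = rng.normal(0.0, 0.5, T)
y = trend + seasonal + e

# Recover the systematic components by regressing y on a constant, a linear
# trend and a pair of seasonal (trigonometric) regressors
X = np.column_stack([np.ones(T), t,
                     np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta

# If the systematic part is specified correctly, the residual should behave
# like white noise: mean zero and no systematic pattern left over
```

With the systematic components removed, standard diagnostics (autocorrelation tests, normality tests) can then be applied to the residual.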

2.3 Repetition - Your First Courses in Statistics and Econometrics

    1. To be completed...

3 For simplicity we assume a linear process. An alternative is to assume that the components are multiplicative, x_t = T_{t,d} · S_{t,d} · C_{t,d} · I_t.


In your first course in statistics you learned how to use descriptive statistics: the mean and the variance. Next you learned to calculate the mean and variance from a sample that represents the whole underlying population. For the mean and the variance to work as a description of the underlying population it is necessary to construct the sample in such a way that the difference between the sample mean and the true population mean is non-systematic, meaning that the difference between the sample mean and the population mean is unpredictable. This means that your estimated sample mean is a random variable with known characteristics.

The most important thing is to construct a sampling mechanism so that the mean calculated from the sample has the characteristics you want: the estimated mean should be unbiased, efficient and consistent. You also learn about random variables, probabilities, distribution functions and frequency distributions.

Your first course in econometrics
"A theory should be as simple as possible, but not simpler" (Albert Einstein)

To be completed...
Random variables, OLS, minimize the sum of squares, assumptions 1-5 (6), understanding, multiple regression, multicollinearity, properties of the OLS estimator
Matrix algebra
Tests and solutions for heteroscedasticity (cross-section) and autocorrelation (time series)
If you took a good course you should have learned the three golden rules (test, test, test) and learned about the properties of the OLS estimator.
Generalized least squares (GLS)
System estimation: demand and supply models
Further extensions: panel data, Tobit, Heckit, discrete choice, probit/logit, duration
Time series: distributed lag models, partial adjustment models, error correction models, lag structure, stationarity vs. non-stationarity, co-integration
What you need to know ...
What you probably do not know but should know.

OLS

Ordinary least squares is a common estimation method. Suppose there are two series {y_t, x_t} related by

y_t = α + β x_t + ε_t.

Minimize the sum of squares over the sample t = 1, 2, ..., T,

S = Σ_{t=1}^T ε_t² = Σ_{t=1}^T (y_t − α − β x_t)².

Take the derivative of S with respect to α and β, set the expressions to zero, and solve for α̂ and β̂:

∂S/∂α = 0 and ∂S/∂β = 0,

which give

β̂ = Σ_{t=1}^T (x_t − x̄)(y_t − ȳ) / Σ_{t=1}^T (x_t − x̄)²  and  α̂ = ȳ − β̂ x̄.

The total sum of squares then decomposes as

TSS = ESS + RSS,
1 = ESS/TSS + RSS/TSS,
R² = 1 − RSS/TSS = ESS/TSS.

Basic assumptions:
1) E(ε_t) = 0 for all t


2) E(ε_t²) = σ² for all t
3) E(ε_t ε_{t−k}) = 0 for all k ≠ 0
4) E(X_t ε_t) = 0
5) E(X'X) ≠ 0 (X'X nonsingular)
6) ε_t ~ NID(0, σ²)

Discuss these properties.
Properties: Gauss-Markov, BLUE.
Deviations: misspecification (adding an extra variable, omitting a relevant variable), multicollinearity, the errors-in-variables problem, homoscedasticity vs. heteroscedasticity, autocorrelation.
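The OLS formulas above can be checked numerically. A minimal sketch with simulated data; the true values α = 1 and β = 2, the error standard deviation and the sample size are all arbitrary choices for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x = rng.normal(0.0, 1.0, T)
eps = rng.normal(0.0, 0.5, T)
y = 1.0 + 2.0 * x + eps          # true alpha = 1, beta = 2 (made up)

# Closed-form OLS estimates from the first-order conditions
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Decomposition of the total sum of squares and R^2
resid = y - alpha_hat - beta_hat * x
TSS = np.sum((y - y.mean()) ** 2)
RSS = np.sum(resid ** 2)
ESS = TSS - RSS
R2 = 1 - RSS / TSS               # equals ESS / TSS
```

With 200 observations the estimates should land close to the true parameter values, and R² equals ESS/TSS by construction.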


    Part I

    Basic Statistics



3. TIME SERIES MODELING - AN OVERVIEW

Economists are generally interested in a small part of what is normally included in the subject Time Series Analysis. Various techniques, such as filtering, smoothing and interpolation, developed for deterministic time series are of relatively minor interest to economists. Time series econometrics is more focused on the stochastic part of time series. The following is a brief overview of time series modeling from an econometric perspective. It is not a textbook in mathematical statistics, nor is the ambition to be extremely rigorous in the presentation of statistical concepts. The aim is rather to be a guide for the not yet so informed economist who wants to know more about the statistical concepts behind time series econometrics.

When approaching time series econometrics the statistical vocabulary quickly increases and can become overwhelming. These first two chapters seek to make it possible for people without deeper knowledge of mathematical statistics to read and follow the econometric and financial time series literature.

A time series is simply a set of observations ordered by time. Time series techniques seek to decompose this ordered series into different components, which in turn can be used to generate forecasts, learn about the dynamics of the series, and learn how it relates to other series. There are a number of dimensions and decisions to keep account of when approaching this subject.

First, the series, or the process, can be univariate or multivariate, depending on the problem at hand. Second, the series can be stochastic or purely deterministic. In the former case a stochastic random process is generating the observations. Third, given that the series is stochastic, with perhaps deterministic components, it can be modeled in the time domain or in the frequency domain. Modeling in the frequency domain implies describing the series in terms of cosine functions of different wavelengths. This is a useful approach for solving some problems, but not a general approach for economic time series modeling. Fourth, the data generating process and the statistical model can be constructed in continuous or discrete time. Continuous-time econometrics is good for some problems but not all; in general it leads to more complex models. A discrete-time approach builds on the assumption that the observed data are unchanged between the intervals of observation. This is a convenient approximation that makes modeling easier, but it comes at a cost in the form of aggregation biases. In the general case, however, this cost is low compared with the costs of general misspecification. A special chapter deals with the discussion of discrete versus continuous time modeling.

The typical economic time series is a discrete stochastic process modeled in the time domain. Time series can be modelled by smoothing and filter techniques. For economists these techniques are generally uninteresting, though we will briefly come back to the concept of filters.

The simplest way to model an economic time series is to use autoregressive techniques, or ARIMA techniques in the general case. Most economic time series, however, are better modeled as part of a multivariate stochastic process. Economic theory suggests systems of economic variables, leading to single-equation transfer functions and systems of equations in a VAR model.

These techniques are descriptive; they do not identify structural, or deep, parameters like elasticities, marginal propensities to consume, etc. To estimate more


specific economic models, we turn to techniques such as VECM, SVAR, and structural VECM.

What is outlined above is quite different from the typical basic econometrics textbook approach, which starts with OLS and in practice ends with GLS as the solution to all problems. Here we will develop methods which first describe the statistical properties of the (joint) series at hand, and then allow the researcher to answer economic questions in such a way that the conclusions are statistically and economically valid. To get there we have to start with some basic statistics.

    3.1 Statistical Models

A general definition of statistical time series analysis is that it finds a mathematical model that links observed variables with the stochastic mechanism that generated the data. This sounds abstract, but the purpose of this abstraction is to understand the analytical tools of time series statistics. The practical problem is the following: we have some stochastic observations over time. We know that these observations have been generated by a process, but we do not know what this process looks like. Statistical time series analysis is about developing the tools needed to mimic the unknown data generating process (DGP).

We can formulate some general features of the model. First, it should be a well-defined statistical model in the sense that the assumptions behind the model should be valid for the data chosen. Later we will define more exactly what this implies for an econometric model. For the time being, we can say that the single most important criterion for a model is that the residuals should be a white noise process. Second, the parameters of the model should be stable over time. Third, the model should be simple, or parsimonious, meaning that its functional form should be simple. Fourth, the model should be parameterized in such a way that it is possible to give the parameters a clear interpretation and identify them with events in the real world. Finally, the model should be able to explain rival models describing the dependent variable(s).

The way to build a well-defined statistical model is to investigate the underlying assumptions of the model in a systematic way. It can easily be shown that t-values, R², and Durbin-Watson values are not sufficient for determining the fit of a model. In later chapters we will introduce a systematic test procedure.

The final aim of econometric modelling is to learn about economic behavior. To some extent this always implies using some a priori knowledge in the form of theoretical relationships. Economists, in general, have extremely strong a priori beliefs about the size and sign of certain parameters. This way of thinking has led to much confusion, because a priori beliefs can be driven too far. Econometrics is basically about measuring correlations. It is a common misunderstanding among non-econometricians that correlations can be too high or too low, or be deemed right or wrong. Measured correlations are the outcome of the data used, only. Anyone who thinks of an estimated correlation as wrong must also explain what went wrong in the estimation process, which requires knowledge of econometrics and the real world.


    3.2 Random Variables

The basic reason for dealing with stochastic models rather than deterministic models is that we are faced with random variables. A popular definition of a random variable goes like this: a random variable is a variable that can take on more than one value.1 For every possible value that a random variable can take on, there is a number between zero and one that describes the probability that the random variable will take on this value. In the following a random variable is indicated with a tilde, as in X̃.

In statistical terms, a random variable is associated with the outcome of a statistical experiment. All possible outcomes of such an experiment can be called the sample space. If S is a sample space with a probability measure, and if X̃ is a real-valued function defined over S, then X̃ is called a random variable.

There are two types of random variables: discrete random variables, which only take on a specific number of real values, and (absolutely) continuous random variables, which can take on any value between −∞ and +∞. It is also possible to examine discontinuous random variables, but we will limit ourselves to the first two types.

If the discrete random variable X̃ can take on k values (x_1, ..., x_k), the probability of observing the value x_j can be stated as

P(x_j) = p_j.    (3.1)

Since probabilities of discrete random variables are additive, the probability of observing one of the k possible outcomes is equal to 1.0, or, using the notation just introduced,

P(x_1, x_2, ..., or x_k) = p_1 + p_2 + ... + p_k = 1.    (3.2)

A discrete random variable is described by its probability function, F(x_i), which specifies the probability with which X̃ takes on a certain value. (The term cumulative distribution is used synonymously with probability function.)

In time series econometrics we are in most applications dealing with continuous random variables. Unlike discrete variables, it is not possible to associate a specific observation with a certain probability, since these variables can take on an infinite range of numbers. The probability that a continuous random variable will take on a certain value is always zero. Because it is continuous we cannot distinguish between 1.01 and 1.0101, etc. This does not mean that the variable does not take on specific values. The outcome of the experiment, or the observation, is of course always a given number.

Thus, for a continuous random variable, statements of the probability of an observation must be made in terms of the probability that the random variable X̃ is less than or equal to some specific value. We express this with the distribution function F(x) of the random variable X̃ as follows,

F(x) = P(X̃ ≤ x)  for −∞ < x < ∞,    (3.3)

which states the probability of X̃ taking a value less than or equal to x.

The continuous analogue of the probability function is called the density function f(x), which we get by differentiating the distribution function with respect to the observations (x),

dF(x)/dx = f(x).    (3.4)

1 Random variables (RV's) are also called stochastic variables, chance variables, or variates.


The fundamental theorem of integral calculus gives us the following expression for the probability that X̃ takes on a value less than or equal to x,

F(x) = ∫_{−∞}^{x} f(u) du.    (3.5)

It follows that for any two constants a and b, with a < b, the probability that X̃ takes on a value in the interval from a to b is given by

F(b) − F(a) = ∫_{−∞}^{b} f(u) du − ∫_{−∞}^{a} f(u) du    (3.6)
            = ∫_{a}^{b} f(u) du.    (3.7)

The term density function is used in a way that is analogous to density in physics. Think of a rod of variable density, measured by the function f(x). To obtain the weight of some given length of this rod, we would have to integrate its density function over that particular part in which we are interested.

Random variables are described by their density function and/or by their moments: the mean, the variance, etc. Given the density function, the moments can be determined exactly. In statistical work, we must first estimate the moments; from the moments we can learn about the density function. For instance, we can test if the assumption of an underlying normal density function is consistent with the observed data.

A random variable can be predicted; in other words, it is possible to form an expectation of its outcome based on its density function. Appendix III deals with the expectations operator and other operators related to random variables.

    3.3 Moments of random variables

Random variables are characterized by their probability density functions (pdf's) or their moments. In the previous section we introduced pdf's. Moments refers to measurements such as the mean, the variance, skewness, etc. If we knew the exact density function of a random variable then we would also know the moments. In applied work, we will typically first calculate the moments from a sample, and from the moments figure out the density function of the variables. The term moment originates from physics and the moment of a pendulum. For our purposes it can be thought of as a general term which includes the definition of concepts like the mean and the variance, without referring to any specific distribution. Starting with the first moment, the mathematical expectation of a discrete random variable is given by

E(X̃) = Σ x f(x),    (3.8)

where E is the expectation operator and f(x) is the value of the probability function of X̃ at x. Thus, E(X̃) represents the mean of the discrete random variable X̃, or, in other words, the first moment of the random variable. For a continuous random variable X̃, the mathematical expectation is

E(X̃) = ∫_{−∞}^{∞} x f(x) dx,    (3.9)


where f(x) is the value of its probability density at x. The first moment can also be referred to as the location of the random variable. Location is a more generic concept than the first moment or the mean.

The term moment is used in situations where we are interested in the expected value of a function of a random variable, rather than the expectation of the specific variable itself. Say that we are interested in Ỹ, whose values are related to X̃ by the equation y = g(x). The expectation of Ỹ is equal to the expectation of g(x), since E(Ỹ) = E[g(X̃)]. In the continuous case this leads to

E(Ỹ) = E[g(X̃)] = ∫_{−∞}^{∞} g(x) f(x) dx.    (3.10)

Like density, the term moment, or moment about the origin, has its explanation in physics. (In physics the length of a lever arm is measured as the distance from the origin. Or, if we refer to the example with the rod above, the first moment around the mean would correspond to the horizontal center of gravity of the rod.) Reasoning from intuition, the mean can be seen as the midpoint of the limits of the density. The midpoint can be scaled in such a way that it becomes the origin of the x-axis.

The term moments of a random variable is a more general way of talking about the mean and variance of a variable. Setting g(x) equal to x^r, we get the r:th moment around the origin,

μ'_r = E(X̃^r) = Σ x^r f(x)    (3.11)

when X̃ is a discrete variable. In the continuous case we get

μ'_r = E(X̃^r) = ∫_{−∞}^{∞} x^r f(x) dx.    (3.12)

The first moment is nothing else than the mean, or the expected value of X̃. The second central moment is the variance. Higher moments give additional information about the distribution and density functions of random variables.

Now, defining g(X̃) = (X̃ − μ'_1)^r, we get what is called the r:th moment about the mean of the distribution of the random variable X̃. For r = 0, 1, 2, 3, ... we get for a discrete variable

μ_r = E[(X̃ − μ'_1)^r] = Σ (x − μ'_1)^r f(x),    (3.13)

and when X̃ is continuous

μ_r = E[(X̃ − μ'_1)^r] = ∫_{−∞}^{∞} (x − μ'_1)^r f(x) dx.    (3.14)

The second moment about the mean, also called the second central moment, is nothing else than the variance of X̃:

var(X̃) = ∫_{−∞}^{∞} [x − E(X̃)]² f(x) dx    (3.15)
        = ∫_{−∞}^{∞} x² f(x) dx − [E(X̃)]²    (3.16)
        = E(X̃²) − [E(X̃)]²,    (3.17)

where f(x) is the value of the probability density function of the random variable X̃ at x. A more generic expression for the variance is dispersion. We can say that


the second moment, or the variance, is a measure of dispersion, in the same way as the mean is a measure of location.

The third moment, r = 3, measures asymmetry around the mean, referred to as skewness. The normal distribution is symmetric around the mean: the likelihood of observing a value above the mean is the same as that of observing a value below it. For a right-skewed distribution, the likelihood of observing a value higher than the mean is greater than that of observing a lower value. For a left-skewed distribution, the likelihood of observing a value below the mean is greater than that of observing a value above it.

The fourth moment, referred to as kurtosis, measures the thickness of the tails of the distribution. A distribution with thicker tails than the normal is characterized by a higher likelihood of extreme events compared with the normal distribution. Higher moments give further information about the skewness, tails and peak of the distribution. The fifth, seventh, and higher odd moments give more information about the skewness. Even moments above four give further information about the thickness of the tails and the peak.
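The variance identity (3.15)-(3.17) and the moment interpretations above can be illustrated by simulation; a sketch with arbitrarily chosen parameters (mean 3, standard deviation 2, sample size 100,000):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3.0, 2.0, 100_000)     # normal sample, mean 3, sd 2 (made up)

mean = x.mean()
m = x - mean                          # deviations from the sample mean

# Two algebraically equivalent ways of computing the variance, eq. (3.15)-(3.17)
var_direct = np.mean(m ** 2)                     # E[(X - EX)^2]
var_shortcut = np.mean(x ** 2) - mean ** 2       # E(X^2) - [E(X)]^2

# Standardised third and fourth moments: for a normal sample, skewness should
# be near 0 and kurtosis near 3
skewness = np.mean(m ** 3) / var_direct ** 1.5
kurtosis = np.mean(m ** 4) / var_direct ** 2
```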

    3.4 Popular Distributions in Econometrics

In time series econometrics and financial economics there is a small set of distributions that one has to know. The following is a list of common distributions:

Normal distribution: N(μ, σ²)
Log-normal distribution: LogN(μ, σ²)
Student t distribution: St(μ, σ², ν)
Cauchy distribution: Ca(μ, σ²)
Gamma distribution: Ga(ν, μ, σ²)
Chi-square distribution: χ²(ν)
F distribution: F(d_1, d_2)
Poisson distribution: Pois(λ)
Uniform distribution: U(a, b)

The pdf of a normal distribution is written as

f(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}.

The normal distribution is characterized by the following: the distribution is symmetric around its mean, and it is fully described by two moments, the mean and the variance, N(μ, σ²). The normal distribution can be standardised to have a mean of zero and a variance of unity (take z = (x − E(x))/σ), and the result is consequently called a standardised normal distribution, N(0, 1).

In addition, it follows that the first four moments, the mean, the variance, the skewness and the kurtosis, are E(X̃) = μ, Var(X̃) = σ², Sk(X̃) = 0, and Ku(X̃) = 3. There are random variables that are not normal by themselves but become normal if they are logged. The typical examples are stock prices and various macroeconomic variables. Let S_t be a stock price. The dollar return over a given interval, R_t = S_t − S_{t−1}, is not likely to be normally distributed, due to the simple fact that the stock price is rising over time, partly because investors demand a return on their investment but mostly because of inflation. However, if you take the log of the stock price and calculate the per cent return (approximately),


r_t = ln S_t − ln S_{t−1}, this variable is much more likely to have a normal distribution (or a distribution that can be approximated with a normal distribution). Thus, if you have taken logs of variables in your econometric models, you have already worked with log-normal variables. Knowledge about log-normal distributions is necessary if you want to model, or better understand, the movements of actual stock prices and dollar returns.
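A small sketch of the log-return transformation; the price series is invented for the illustration:

```python
import numpy as np

# Hypothetical daily closing prices of a stock
S = np.array([100.0, 101.5, 99.8, 102.3, 103.1])

# Approximate per cent returns: r_t = ln(S_t) - ln(S_{t-1})
r = np.diff(np.log(S))

# A convenient property of log returns: they aggregate by summation, so the
# sum of the daily returns equals the log return over the whole holding period
holding_period = np.log(S[-1] / S[0])
```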

The Student t distribution is similar to the normal distribution: it is symmetric around the mean and has a variance, but it has thicker tails than the normal distribution. The Student t distribution is described by St(μ, σ², ν), where μ refers to the mean and σ² refers to the variance. The parameter ν is called the degrees of freedom of the Student t distribution and refers to the thickness of the tails. A random variable that follows a Student t distribution converges to a normal random variable as the degrees of freedom go to infinity.

The Cauchy distribution is related to the normal distribution and the Student t distribution. Like the normal it is symmetric and described by a location and a scale parameter, but it has fatter tails and is therefore better suited for modelling random variables which take on relatively more extreme values than the normal. The setback for empirical work is that its moments are not defined, meaning that it is difficult to use empirical moments to test for a Cauchy distribution against, say, the normal or the Student t distribution.

The gamma and the chi-square distributions are related to variances of normal

    random variables. If we have a set of normal random variablesn

    ~Y1;~Y2:::; ~Yv

    oand for a new variable as ~X ~Y21 + ~Y22 +::: + ~Y2v , then this new variable willhave a gamma distribution as ~X Ga( ; ; 2):A special case of the gammadistribution is when we have = 0 and 2 = 1, the distribution is then calleda chi-square distribution 2() with degrees of freedom. Thus, take the squareof an estimated regression parameter and divide it with it variance and you get achi-square distributed test for signicance of the estimated , (=) 2():
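The construction of a chi-square variable from squared standard normals can be checked by simulation; a sketch where the degrees of freedom and the number of replications are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(7)
nu = 5            # degrees of freedom (arbitrary choice)
n = 200_000       # Monte Carlo replications

# Sum nu squared standard normals: the result is chi-square(nu) distributed
Z = rng.normal(0.0, 1.0, size=(n, nu))
X = np.sum(Z ** 2, axis=1)

# A chi-square(nu) variable has mean nu and variance 2 * nu
sample_mean = X.mean()     # should be close to 5
sample_var = X.var()       # should be close to 10
```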

The F distribution comes about when you compare the ratio (or log difference) of two squared normal random variables. The Poisson distribution is used to model jumps in the data, usually in combination with a geometric Brownian motion (jump diffusion models). The typical example is stock prices that might move up or down drastically. The parameter λ measures the probability of a jump in the data.

    3.5 Analysing the Distribution

In practical work we need to know the empirical distribution of the variables we are working with in order to make any inference. All empirical distributions can be analysed with the help of their first four moments. Through the first four moments we get information first about the mean and the variance, and second about the skewness and kurtosis. The latter moments are often critical when we decide if a certain empirical distribution should be seen as normal, or at least approximately normal.

It is, of course, extremely convenient to work with the assumption of a normal distribution, since a normal distribution is described by its first two moments only. In finance, the expected return is given by the mean, and the risk of the asset is given by its variance. An approximation to the holding period return of an asset is the log difference of its price. In the case of a normal distribution there is no need to consider higher moments. Furthermore, linear combinations of


normal variates result in new normally distributed variables. In econometric work, building regression equations, the residual process is assumed to be a normally distributed independent white noise process, in order to allow for inference and testing.

It is by calculating the sample moments that we learn about the distribution of the series at hand. The most typical problem in empirical work is to investigate how well the distribution of a variable can be approximated with a normal distribution. If the normal distribution is rejected for the residuals in a regression, the typical conclusion is that there is something important missing in the regression equation. The missing part is either an important explanatory variable, or the direct cause of an outlier.

To investigate the empirical distribution we need to calculate the sample moments of the variable. The sample mean of {x_t} = {x_1, x_2, ..., x_T} can be estimated as x̄ = (1/T) Σ_{t=1}^T x_t. Higher moments can be estimated with the formula m_r = (1/T) Σ_{t=1}^T (x_t − x̄)^r.2

If a series is normally distributed, X̃_t ~ N(μ_x, σ_x²), subtracting the mean and dividing by the standard deviation leads to a standardised normal variable, distributed as X̃ ~ N(0, 1). For a standardised normal variable the third and fourth moments equal 0 and 3, respectively. The standardised third moment is known as skewness, given as √b_1 = m_3/m_2^{3/2}. A skewness with a negative value indicates a left-skewed distribution, compared with the normal. If the series is the return on an asset, it means that bad, negative surprises dominate over good, positive surprises. A positive value of skewness implies a right-skewed distribution. In terms of asset returns, good, positive surprises are more likely than bad, negative surprises.

The fourth moment, kurtosis, is calculated as b_2 = m_4/m_2². A value above 3 implies that the distribution generates more extreme values than the normal distribution: the distribution has fatter tails than the normal. Referring to asset returns, approximating the distribution with the normal would underestimate the risk associated with the asset.

    An asymptotic test, with a null of a normal distribution, is given by³

    JB = T [ (m_3²/m_2³)/6 + ((m_4/m_2²) − 3)²/24 ] + T [ 3m_1²/(2m_2) + m_1 m_3/m_2² ] ∼ χ²(2).

    This test is known as the Jarque-Bera (JB) test and is the most common test for normality in regression analysis. The null hypothesis is that the series is normally distributed. Let μ₁, μ₂, μ₃ and μ₄ represent the mean, the variance, the skewness and the kurtosis; the null of a normal distribution is rejected if the test statistic is significant. The fact that the test is only valid asymptotically means that we do not know the reason for a rejection in a limited sample. In a less than asymptotic sample, rejection of normality is often caused by outliers. If we think the most extreme value(s) in the sample are non-typical outliers, removing them from the calculation of the sample moments usually results in a non-significant JB test. Removing outliers is, however, ad hoc. It could be that these outliers are typical values of the true underlying distribution.

    2 For these moments to be meaningful, the series must be stationary. Also, we would like {x_t} to be an independent process. Finally, notice that the estimators of the higher moments suggested here are not necessarily efficient estimators.

    3 This test statistic is for a variable with a non-zero mean. If the variable is adjusted for its mean (say an estimated residual), the second term should be removed from the expression.
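The mean-adjusted version of the statistic (the form that remains once the mean terms mentioned in footnote 3 are dropped) can be sketched in a few lines of Python; the helper names are illustrative, not from the text:

```python
import numpy as np

def central_moment(x, r):
    """Sample central moment m_r = (1/T) * sum((x_t - xbar)**r)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** r)

def jarque_bera(x):
    """Mean-adjusted Jarque-Bera statistic:
    JB = T * (b1/6 + (b2 - 3)**2 / 24), asymptotically chi^2(2),
    with b1 = m3**2 / m2**3 (squared skewness) and b2 = m4 / m2**2 (kurtosis)."""
    T = len(x)
    m2, m3, m4 = (central_moment(x, r) for r in (2, 3, 4))
    b1 = m3 ** 2 / m2 ** 3
    b2 = m4 / m2 ** 2
    return T * (b1 / 6.0 + (b2 - 3.0) ** 2 / 24.0)

rng = np.random.default_rng(0)
print(jarque_bera(rng.normal(size=10_000)))       # small: normality not rejected
print(jarque_bera(rng.exponential(size=10_000)))  # large: normality rejected
```

For a skewed, fat-tailed series such as exponential draws the statistic is far above the χ²(2) critical values, while for normal draws it stays small.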


    3.6 Multidimensional Random Variables

    We will now generalize the work of the previous sections by considering a vector of n random variables,

    X̃ = (X̃_1, X̃_2, ..., X̃_n),    (3.18)

    whose elements are continuous random variables with density functions f(x_1), ..., f(x_n), and distribution functions F(x_1), ..., F(x_n). The joint distribution function is,

    F(x_1, x_2, ..., x_n) = ∫_{−∞}^{x_n} ⋯ ∫_{−∞}^{x_1} f(x_1, x_2, ..., x_n) dx_1 ⋯ dx_n,    (3.19)

    where f(x_1, x_2, ..., x_n) is the joint density function.

    If these random variables are independent, it is possible to write their joint density as the product of their univariate densities,

    f(x_1, x_2, ..., x_n) = f(x_1) f(x_2) ⋯ f(x_n).    (3.20)

    The r:th product moment is defined as,

    E(X̃_1^{r_1} X̃_2^{r_2} ⋯ X̃_n^{r_n})    (3.21)

    = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} x_1^{r_1} x_2^{r_2} ⋯ x_n^{r_n} f(x_1, x_2, ..., x_n) dx_1 dx_2 ⋯ dx_n,    (3.22)

    which, if the variables are independent, factorizes into the product

    E(X̃_1^{r_1}) E(X̃_2^{r_2}) ⋯ E(X̃_n^{r_n}).    (3.23)

    It follows from this result that the variance of a sum of independent random variables is simply the sum of the individual variances,

    var(X̃_1 + X̃_2 + ... + X̃_n) = var(X̃_1) + var(X̃_2) + ... + var(X̃_n).    (3.24)

    We can extend the discussion of covariance to linear combinations of random variables, say

    a′X̃ = a_1 X̃_1 + a_2 X̃_2 + ... + a_p X̃_p,    (3.25)

    which leads to,

    var(a′X̃) = Σ_{i=1}^p Σ_{j=1}^p a_i a_j σ_{ij}.    (3.26)

    These results hold for matrices as well. If we have Ỹ = AX̃ and Z̃ = BX̃, where Σ denotes the covariance matrix of X̃, we also have that,

    cov(Ỹ, Ỹ) = A Σ A′,    (3.27)

    cov(Z̃, Z̃) = B Σ B′,    (3.28)

    and

    cov(Ỹ, Z̃) = A Σ B′.    (3.29)
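These identities are easy to verify by simulation. The following sketch (the matrices Σ, A and B are chosen arbitrarily for illustration) compares the sample covariances of Y = AX and Z = BX with the theoretical A Σ A′ and B Σ B′:

```python
import numpy as np

# Illustrative matrices: Sigma is the covariance of X; A and B define Y = AX, Z = BX.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[1.0, -1.0]])

rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
Y = X @ A.T   # each row is A x_t
Z = X @ B.T   # each row is B x_t (a scalar linear combination)

# Sample covariances approach the theoretical A Sigma A' and B Sigma B':
print(np.cov(Y, rowvar=False))   # approx A @ Sigma @ A.T
print(Z.var())                   # approx B @ Sigma @ B.T, here a scalar
```

The first row of A is also the vector a = (1, 2) of equation 3.26, so the (1,1) element of the printed matrix illustrates var(a′X̃) = Σ_i Σ_j a_i a_j σ_{ij}.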


    3.7 Marginal and Conditional Densities

    Given a joint density function of n random variables, the joint density of a subsample of them is called the joint marginal density. We can also talk about joint marginal distribution functions. If we set n = 3 we get the joint density function f(x_1, x_2, x_3). Given the marginal density g(x_2, x_3), the conditional probability density function of the random variable X̃_1, given that the random variables X̃_2 and X̃_3 take on the values x_2 and x_3, is defined as,

    φ(x_1 | x_2, x_3) = f(x_1, x_2, x_3) / g(x_2, x_3),    (3.30)

    or

    f(x_1, x_2, x_3) = φ(x_1 | x_2, x_3) g(x_2, x_3).    (3.31)

    Of course we can define a conditional density for various combinations of X̃_1, X̃_2 and X̃_3, like p(x_1, x_3 | x_2) or g(x_3 | x_1, x_2). And, instead of three different variables, we can talk about the density function for one random variable, say Ỹ_t, for which we have a sample of T observations. If all observations are independent we get,

    f(y_1, y_2, ..., y_T) = f(y_1) f(y_2) ⋯ f(y_T).    (3.32)

    As before, we can also look at conditional densities, like

    f(y_t | y_1, y_2, ..., y_{t−1}),    (3.33)

    which in this case would mean that y_t, the observation at time t, depends on all earlier observations on Ỹ_t.

    It is seldom that we deal with independent variables when modeling economic time series. For example, a simple first order autoregressive model like y_t = ρy_{t−1} + ε_t implies dependence between the observations. The same holds for all time series models. Despite this shortcoming, density functions with independent random variables are still good tools for describing time series modelling, because the results based on independent variables carry over to dependent variables in almost every case.
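The conditional factorization is exactly how the likelihood of an AR(1) model is built in practice: conditioning on the first observation, the joint density factors into the product of the conditional normal densities f(y_t | y_{t−1}). A small sketch (parameter values are arbitrary, chosen for illustration):

```python
import numpy as np

# Simulate an AR(1) process y_t = rho*y_{t-1} + eps_t with NID(0, sigma2) errors.
rng = np.random.default_rng(2)
rho_true, sigma2, T = 0.7, 1.0, 5_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho_true * y[t - 1] + rng.normal(scale=np.sqrt(sigma2))

def cond_loglik(rho, y, sigma2=1.0):
    """Log likelihood built from the conditional densities f(y_t | y_{t-1}),
    conditioning on the first observation (e_t = y_t - rho*y_{t-1})."""
    e = y[1:] - rho * y[:-1]
    n = len(e)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum(e ** 2) / (2 * sigma2)

# The conditional likelihood peaks near the true rho:
grid = np.linspace(0.0, 0.99, 100)
print(grid[np.argmax([cond_loglik(r, y) for r in grid])])   # close to 0.7
```

The point is that each factor involves an independent normal error, even though the observations themselves are dependent.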

    3.8 The Linear Regression Model - A General Description

    In this section we look at the linear regression model starting from two random variables Ỹ and X̃. Two regressions can be formulated,

    y = α + βx + ε,    (3.34)

    and

    x = γ + δy + ν.    (3.35)

    Whether one chooses to condition y on x, or x on y, depends on the parameters of interest. In the following it is shown how these regression expressions are constructed from the correlation between x and y, and their first moments, by making use of the (bivariate) joint density function of x and y. (One can view this section as an exercise in using density functions.)


    Without explicitly stating what the density function looks like, we will assume that we know the joint density function for the two random variables Ỹ and X̃, and want to estimate a set of parameters, α and β. Hence we have the joint density,

    D(y, x; θ),    (3.36)

    where θ is a vector of parameters which describes the relation between Ỹ and X̃. To get the linear regression model above we have to condition on the outcome of X̃,

    D(y, x; θ) = D(y | x; λ) g(x),    (3.37)

    where λ represents the vector of parameters of interest, λ = [α, β]. This operation requires that the parameters of interest can be written as a function of the parameters in the joint distribution function, λ = f(θ).

    The expected mean of Ỹ for a given X̃ is,

    E(Ỹ | x; λ) = ∫ y D(y | x; λ) dy = α + βx,    (3.38)

    or, if we choose to condition on Ỹ instead,

    E(X̃ | y; λ) = ∫ x D(x | y; λ) dx = γ + δy.

    The parameters in 3.38 can be estimated by using means, variances and covariances of the variables; or, in other terms, by using some of the lower moments of the joint distribution of X̃ and Ỹ. Hence, the first step is to rewrite 3.38 in such a way that α and β are expressed in terms of the means of X̃ and Ỹ.

    Looking at the LHS of 3.38, it can be seen that multiplying the conditional density by the marginal density of X̃, g(x), leads to the joint density. Given the joint density we can choose to integrate out either x or y; in this case we choose to integrate over x. Thus we have, after multiplication,

    [∫ y D(y | x; λ) dy] g(x) = α g(x) + βx g(x).    (3.39)

    Integrating over x leads, on the LHS, to

    ∫ [∫ y D(y | x; λ) dy] g(x) dx = ∫∫ y D(y, x | θ) dy dx = ∫ y D(y | θ) dy = E(Ỹ | θ) = μ_y.    (3.40)

    Performing the same operations on the RHS leads to,

    α ∫ g(x) dx + β ∫ x g(x) dx = α + β E(X̃) = α + β μ_x.    (3.41)

    Putting the two sides together we get,

    μ_y = α + β μ_x.    (3.42)

    We now have one equation to solve for the two unknowns. Since we have used up the means, let us turn to the variances by multiplying both sides of 3.38 with x and performing the same operations again.


    Multiplication with x and g(x) leads to,

    [∫ x y D(y | x; λ) dy] g(x) = αx g(x) + βx² g(x).    (3.43)

    Integrating over x,

    ∫ [∫ x y D(y | x; λ) dy] g(x) dx = α ∫ x g(x) dx + β ∫ x² g(x) dx.    (3.44)

    The LHS leads to,

    ∫∫ x y D(y, x | θ) dy dx = E(X̃Ỹ),    (3.45)

    and the RHS,

    α ∫ x g(x) dx + β ∫ x² g(x) dx = α E(X̃) + β E(X̃²).    (3.46)

    Hence our second equation is,

    E(X̃Ỹ) = α E(X̃) + β E(X̃²).    (3.47)

    Remembering the rules for the expectations operator, E(X̃Ỹ) = μ_x μ_y + σ_xy and E(X̃²) = σ_x² + μ_x², makes it possible to solve for α and β in terms of means and variances. From the first equation we get, for α,

    α = μ_y − β μ_x.    (3.48)

    If we substitute this into 3.47, we get

    μ_x μ_y + σ_xy = (μ_y − β μ_x) μ_x + β(σ_x² + μ_x²) = μ_x μ_y − β μ_x² + β σ_x² + β μ_x²,    (3.49)

    which gives

    β = σ_xy / σ_x².    (3.50)

    Using these expressions in the linear regression line leads to,

    E(Ỹ | x; λ) = μ_y + (σ_xy/σ_x²)(x − μ_x) = α + βx,    (3.51)

    or, if we choose to condition on Ỹ instead,

    E(X̃ | y; λ) = μ_x + (σ_yx/σ_y²)(y − μ_y) = γ + δy.    (3.52)

    We can now make use of the correlation coefficient in the linear regression. The correlation coefficient between X̃ and Ỹ is defined as,

    ρ = σ_xy / (σ_x σ_y), or σ_xy = ρ σ_x σ_y.    (3.53)

    If we put this into the equations above we get,

    E(Ỹ | x; λ) = μ_y + ρ (σ_y/σ_x)(x − μ_x),    (3.54)


    E(X̃ | y; λ) = μ_x + ρ (σ_x/σ_y)(y − μ_y).    (3.55)

    So, if the two variables are independent, their covariance is zero, and the correlation is also zero. In that case the conditional mean of each variable does not depend on the mean and variance of the other variable. The final message is that a non-zero correlation between two normal random variables results in a linear relationship between them. With a multivariate model, with more than two random variables, things are more complex.
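The closing formulas can be checked numerically. The sketch below (simulated data; the parameter values 1.5 and 2.0 are illustrative) recovers α and β from sample moments, matching equations 3.48, 3.50 and 3.53:

```python
import numpy as np

# Bivariate sample with a known linear relation y = 1.5 + 2.0*x + noise.
rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 1.5 + 2.0 * x + rng.normal(size=100_000)

sxy = np.mean((x - x.mean()) * (y - y.mean()))   # sigma_xy
beta = sxy / x.var()                             # eq. 3.50: beta = sigma_xy / sigma_x^2
alpha = y.mean() - beta * x.mean()               # eq. 3.48: alpha = mu_y - beta*mu_x
rho = sxy / (x.std() * y.std())                  # correlation, eq. 3.53

# The reverse regression of x on y has slope rho * sigma_x / sigma_y,
# which equals sigma_yx / sigma_y^2 as in eq. 3.55.
delta = rho * x.std() / y.std()

print(alpha, beta, delta)
```

With this sample size, `alpha` and `beta` land very close to the values 1.5 and 2.0 used to generate the data.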


    4. THE METHOD OF MAXIMUM LIKELIHOOD

    There are two fundamental approaches to estimation in econometrics, the method of moments and the maximum likelihood method. The difference is that the moments estimator deals with estimation without a priori choosing a specific density function. The maximum likelihood estimator (MLE), on the other hand, requires that a specific density function is chosen from the beginning. Asymptotically there is no difference between the two approaches. The MLE is more general, and is the basis for all the various tests applied in practical modeling. In this section we will focus on MLE exclusively because of its central role.

    The principles of MLE were developed early, but for a long time it was considered mainly a theoretical device with limited practical use. The progress in computer capacity has changed this. Many presentations of the MLE are too complex for students below the advanced graduate level. The aim of this chapter is to change this. The principle of ML is not different from OLS. The way to learn MLE is to start with the simplest case, the estimation of the mean and the variance of a single normal random variable. In the next step, it is easy to show how the parameters of a simple linear regression model can be found, and tested, using the techniques of MLE. In the third step, we can analyse the parameters of any density function. Finally, it is often interesting to study the bivariate joint normal density function. This last exercise is good for understanding when certain variables can be treated as exogenous. The general idea is that after seeing how a single random variable can be replaced by a function of random variables, it becomes obvious how a multivariate non-linear system of variables can be estimated.

    Let us start with a single stochastic time series. The first moment, or the sample mean, of the random process X̃_t with the observations (x_1, x_2, ..., x_T) is found as x̄ = Σ_{t=1}^T x_t / T. By using this technique we simply calculate a number that we can use to describe one characteristic of the process X̃_t. In the same way we can calculate the second moment around the mean, etc. In the long run, and for a stationary variable, we can use the central limit theorem (CLT) to argue that the sample mean has a normal distribution, which allows us to test for significance, etc.

    4.1 MLE for a Univariate Process

    The MLE approach starts from a random variable X̃_t and a sample of T independent observations (x_1, x_2, ..., x_T). The joint density function is

    f(x_1, x_2, ..., x_T; θ) = f(x; θ) = ∏_{t=1}^T f(x_t; θ).    (4.1)

    To describe this process there are k parameters, θ = (θ_1, θ_2, ..., θ_k), so we write the density function as,

    f(x; θ),    (4.2)


    where (x; θ) indicates that it is the shape of the density, described by the parameters θ, that gives us the sample. If the density function describes a normal distribution, θ consists of two parameters, the mean and the variance.

    Now, suppose that we know the functional form of the density function. If we also have a sample of observations on X̃_t, we can ask which estimates of θ would be the most likely, given the functional form of the density and given the observations. Viewing the density in this way amounts to asking which values of θ maximize the value of the density function.

    Formulating the estimation problem in this way leads to a restatement of the density function in terms of a likelihood function,

    L(θ; x),    (4.3)

    where the parameters are seen as a function of the sample. It is often convenient to work with the log of the likelihood instead, leading to the log likelihood,

    log L(θ; x) = l(θ; x).    (4.4)

    What is left is to find the maximum of this function with respect to the parameters in θ. The maximum, if it exists, is found by solving the system of k simultaneous equations,

    ∂l(θ; x)/∂θ_i = 0,  i = 1, ..., k,    (4.5)

    for θ, which gives the maximum likelihood estimates θ̂, provided that D²l(θ; x) is a negative definite matrix. The vector of first derivatives is also known as the score, or the efficient score for θ, which can be written as,

    ∂l(θ; x)/∂θ = S(θ),    (4.6)

    such that the efficient score is zero at the maximum.

    The negative of the matrix of expected second order derivatives is known as the information matrix,

    −E[∂²l(θ; x)/∂θ∂θ′] = I(θ).    (4.7)

    The information matrix plays an important role in demonstrating that ML estimators asymptotically attain the Cramér-Rao lower bound, and in the derivation of the so-called classical test statistics associated with the ML estimator. It can be shown, under quite general conditions, that the variances of the estimated parameters θ̂ are given by the inverse of the information matrix,

    var(θ̂) = [I(θ)]⁻¹.    (4.8)

    So far we have not assigned any specific distribution to the density function. Let us assume a sample of T independent normal random variables {X̃_t}. The normal distribution is particularly easy to work with since it only requires two parameters to describe it. We want to estimate the first two moments, the mean μ and the variance σ², thus θ = (μ, σ²). The likelihood is,

    L(θ; x) = (2πσ²)^{−T/2} exp[ −(1/(2σ²)) Σ_{t=1}^T (x_t − μ)² ].    (4.9)

    Taking logs of this expression yields,

    l(θ; x) = −(T/2) log 2π − (T/2) log σ² − (1/(2σ²)) Σ_{t=1}^T (x_t − μ)².    (4.10)


    The partial derivatives with respect to μ and σ² are,

    ∂l/∂μ = (1/σ²) Σ_{t=1}^T (x_t − μ),    (4.11)

    and,

    ∂l/∂σ² = −T/(2σ²) + (1/(2σ⁴)) Σ_{t=1}^T (x_t − μ)².    (4.12)

    If these equations are set to zero, the result is,

    Σ_{t=1}^T x_t − Tμ = 0,    (4.13)

    Σ_{t=1}^T (x_t − μ)² − Tσ² = 0.    (4.14)

    If this system is solved for μ and σ², we get the estimates of the mean and the variance as¹

    x̄ = (1/T) Σ_{t=1}^T x_t,    (4.15)

    σ̂_x² = (1/T) Σ_{t=1}^T (x_t − x̄)² = (1/T) Σ_{t=1}^T x_t² − (1/T²) [Σ_{t=1}^T x_t]².    (4.16)
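A short numerical illustration of 4.10, 4.15 and 4.16 (simulated data; the function and variable names are illustrative). The closed-form estimates do maximize the log likelihood, which we can check by perturbing them:

```python
import numpy as np

def loglik(mu, sig2, x):
    """Normal log likelihood, equation 4.10."""
    T = len(x)
    return (-0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sig2)
            - np.sum((x - mu) ** 2) / (2 * sig2))

rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, scale=2.0, size=1_000)

mu_hat = x.mean()                        # eq. 4.15
sig2_hat = np.mean((x - mu_hat) ** 2)    # eq. 4.16 (divides by T, not T-1)

# Any perturbation of the estimates lowers the log likelihood:
base = loglik(mu_hat, sig2_hat, x)
assert all(loglik(mu_hat + d, sig2_hat, x) < base for d in (-0.1, 0.1))
assert all(loglik(mu_hat, sig2_hat + d, x) < base for d in (-0.1, 0.1))
print(mu_hat, sig2_hat)
```

The estimates land close to the values μ = 3 and σ² = 4 used to simulate the data.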

    Do these estimates of μ and σ² really represent the maximum of the likelihood function? To answer that question we have to look at the sign of the Hessian of the log likelihood function, the second order conditions, evaluated at the estimated values of the parameters in θ,

    D²l(θ; x) = | ∂²l/∂μ²       ∂²l/∂μ∂σ²   |
                | ∂²l/∂σ²∂μ     ∂²l/∂(σ²)²  |

              = | −T/σ²                  −(1/σ⁴) Σ(x_t − μ)               |
                | −(1/σ⁴) Σ(x_t − μ)     T/(2σ⁴) − (1/σ⁶) Σ(x_t − μ)²    |.    (4.17)

    If we substitute in the solutions for the estimates of μ and σ², and take expectations, we get,

    −E[D²l(θ̂; x)] = | T/σ̂_x²     0           |  =  I(θ̂).    (4.18)
                     | 0           T/(2σ̂_x⁴) |

    Since the variance σ̂_x² is always positive, the Hessian is negative definite, and the function has a maximum at x̄ and σ̂_x².

    It remains to investigate whether the estimates are unbiased. Therefore, replace the observations in the solutions for μ and σ̂_x² by the random variable X̃

    It remains to investigate whether the estimates are unbiased. Therefore, re-place the observations, in the solutions for and 2x, by the random variable ~X

    and take expectations. The expected value of the mean is,

    E(x̄) = (1/T) Σ_{t=1}^T E(X̃_t) = (1/T) Σ_{t=1}^T μ = μ,    (4.19)

    1 The solution is given by

    (1/T) Σ_{t=1}^T [x_t − μ̂]² = (1/T) Σ_{t=1}^T [x_t² + μ̂² − 2μ̂ x_t]
      = (1/T) Σ_{t=1}^T x_t² + (1/T) Σ_{t=1}^T μ̂² − 2μ̂ (1/T) Σ_{t=1}^T x_t
      = (1/T) Σ_{t=1}^T x_t² + (1/T) T μ̂² − 2μ̂ (1/T) Σ_{t=1}^T x_t
      = (1/T) Σ_{t=1}^T x_t² + (1/T²) [Σ_{t=1}^T x_t]² − (2/T²) (Σ_{t=1}^T x_t)(Σ_{t=1}^T x_t)
      = (1/T) Σ_{t=1}^T x_t² − (1/T²) [Σ_{t=1}^T x_t]².


    which proves that x̄ is an unbiased estimator of the mean. The calculations for the variance are a bit more complex, but the idea is the same. The expected variance is,

    E[σ̂_x²] = (1/T) E[ Σ_{t=1}^T X̃_t² − (1/T) (Σ_{t=1}^T X̃_t)² ]
      = (1/T) [ T E(X̃_t²) − (1/T) E( Σ_{t=1}^T Σ_{s=1}^T X̃_t X̃_s ) ]
      = (1/T) [ T E(X̃_t²) − E(X̃_t²) − (1/T) E( Σ_{t≠s} X̃_t X̃_s ) ]
      = (1/T) [ (T − 1) E(X̃_t²) − (1/T) T(T − 1) [E(X̃_t)]² ]
      = ((T − 1)/T) σ².    (4.20)

    Thus, σ̂² is not an unbiased estimator of σ². The bias, given by (T − 1)/T, goes to zero as T → ∞. This is a typical result from MLE: the mean is estimated without bias, but the variance estimator is biased. To get an unbiased estimate we need to correct the estimate in the following manner,

    s² = (T/(T − 1)) σ̂² = (T/(T − 1)) (1/T) [ Σ_{t=1}^T X̃_t² − (1/T) (Σ_{t=1}^T X̃_t)² ].    (4.21)

    The correction involves multiplying the estimated variance by T/(T − 1).
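The bias factor (T − 1)/T is easy to see in a small simulation (the parameter values are illustrative): averaging the ML variance estimate over many short samples converges to ((T − 1)/T)σ², and multiplying by T/(T − 1) removes the bias.

```python
import numpy as np

# Average the ML variance estimate over many small samples to expose the bias.
rng = np.random.default_rng(5)
T, sigma2, reps = 5, 1.0, 200_000

samples = rng.normal(scale=np.sqrt(sigma2), size=(reps, T))
sig2_ml = np.mean((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)

print(sig2_ml.mean())                    # approx (T-1)/T * sigma2 = 0.8
print((T / (T - 1)) * sig2_ml.mean())    # approx sigma2 = 1.0 after correction
```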

    4.2 MLE for a Linear Combination of Variables

    We have derived the maximum likelihood estimates for a single independent normal variable. How does this relate to a linear regression model? Earlier, when we discussed the moments of a variable, we showed how it was possible, as a general principle, to substitute a random variable with a function of the variable. The same reasoning applies here. Say that X̃ is a function of two other random variables Ỹ and Z̃. Assume the linear model

    y_t = βz_t + x_t,    (4.22)

    where Ỹ is a random variable with observations {y_t}, and z_t is, for the time being, assumed to be a deterministic variable. (This is not a necessary assumption.) Instead of using the symbol x_t for the observations on the random variable X̃, let us set x_t = ε_t, where ε_t ∼ NID(0, σ_ε²). Thus, we have formulated a linear regression model with a white noise residual. This linear equation can be rewritten as,

    ε_t = y_t − βz_t,    (4.23)

    where the RHS is the function to be substituted for the single normal variable x_t used in the MLE example above. The algebra gets a bit more complicated, but the principal steps are the same.² The unknown parameters in this case are β and σ_ε².

    2 As a consequence of the more complex algebra, the computer algorithms for estimating the parameters will also get more complex. For the ordinary econometrician there are plenty of software packages that cover most of the cases.


    The log likelihood function will now look like,

    l(β, σ_ε²; y, z) = −(T/2) log 2π − (T/2) log σ_ε² − (1/(2σ_ε²)) Σ_{t=1}^T (y_t − βz_t)².    (4.24)

    The last factor in this expression can be identified as the sum of squares function, S(β). In matrix form we have,

    S(β) = Σ_{t=1}^T (y_t − βz_t)² = (Y − Zβ)′(Y − Zβ),    (4.25)

    and

    l(β, σ_ε²; y, z) = −(T/2) log 2π − (T/2) log σ_ε² − (1/(2σ_ε²)) (Y − Zβ)′(Y − Zβ).    (4.26)

    Differentiation of S(β) with respect to β yields

    ∂S/∂β = −2Z′(Y − Zβ),    (4.27)

    which, if set to zero, solves to

    β̂ = (Z′Z)⁻¹ Z′Y.    (4.28)

    Notice that the ML estimator of the linear regression model is identical to the OLS estimator. The variance estimate is,

    σ̂_ε² = ε̂′ε̂ / T,    (4.29)

    which, in contrast to the OLS estimate, is biased.

    To obtain these estimates we did not have to make any direct assumptions about the distribution of y_t or z_t. The necessary and sufficient condition is that y_t conditional on z_t is normal, which means that y_t − βz_t = ε_t should follow a normal distribution. This is the reason why MLE is feasible even though y_t might be a dependent AR(p) process: in the AR(p) process the residual term is an independent normal random variable. The MLE is obtained by replacing the single independently distributed normal variable with the deviation of y_t from its conditional mean.
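A sketch of equations 4.28 and 4.29 with a single regressor (simulated data; the slope 0.5 and the variable names are illustrative), showing that the ML/OLS formula recovers β and that the variance estimate divides by T:

```python
import numpy as np

# ML estimate of beta in y_t = beta*z_t + eps_t is (Z'Z)^{-1} Z'Y, eq. 4.28.
rng = np.random.default_rng(6)
T = 10_000
z = rng.normal(size=T)
y = 0.5 * z + rng.normal(size=T)

beta_hat = (z @ y) / (z @ z)     # (Z'Z)^{-1} Z'Y with a single regressor
resid = y - beta_hat * z
sig2_ml = resid @ resid / T      # eq. 4.29: divides by T, hence biased

print(beta_hat, sig2_ml)         # approx 0.5 and 1.0 for this design
```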

    The above results can be extended to a vector of normal random variables. In this case we have a multivariate normal distribution, where the density is

    D(X) = D(X_1, X_2, ..., X_T).    (4.30)

    The random variables X̃ will have a mean vector μ and a covariance matrix Σ. The density function for the multivariate normal is,

    D(X) = [(2π)^{n/2} |Σ|^{1/2}]⁻¹ exp[ −(1/2)(X − μ)′ Σ⁻¹ (X − μ) ],    (4.31)

    which can be expressed in compact form as X_t ∼ N(