ols estimators econometrics
8/3/2019 OLS estimators Econometrics
ECON581: Lecture Notes
Econometrics II
by Jorge Rojas
Abstract
This is a summary containing the main ideas in the subject. This is not a summary of the lecture notes; it is a summary of ideas and basic concepts. The mathematical machinery is necessary, but the principles are much more important.
1 Linear Algebra

Properties of Transpose

1. $(A^T)^T = A$
2. $(A + B)^T = A^T + B^T$
3. $(AB)^T = B^T A^T$
4. $(cA)^T = cA^T \quad \forall c \in \mathbb{R}$
5. $\det(A^T) = \det(A)$
6. $a \cdot b = a^T b = \langle a, b \rangle$ (inner product)
7. This is important: if $A$ has only real entries, then $A^T A$ is a positive-semidefinite matrix.
8. $(A^T)^{-1} = (A^{-1})^T$
9. If $A$ is a square matrix, then its eigenvalues are equal to the eigenvalues of its transpose.

Notice that if $A \in M(n \times m)$, then $AA^T$ is always symmetric.
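Properties of the Inverse

1. $(A^{-1})^{-1} = A$
2. $(kA)^{-1} = \frac{1}{k} A^{-1} \quad \forall k \in \mathbb{R} \setminus \{0\}$
3. $(A^T)^{-1} = (A^{-1})^T$
4. $(AB)^{-1} = B^{-1} A^{-1}$
5. $\det(A^{-1}) = [\det(A)]^{-1}$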
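The transpose and inverse identities above are easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy is available; the matrices, the seed, and the names `checks` and `R` are illustrative choices and not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed so the check is reproducible
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)    # diagonal shift keeps the matrices well conditioned
B = rng.normal(size=(3, 3)) + 3 * np.eye(3)
k = 2.5
inv = np.linalg.inv

checks = {
    "(AB)^T = B^T A^T": np.allclose((A @ B).T, B.T @ A.T),
    "(A^T)^-1 = (A^-1)^T": np.allclose(inv(A.T), inv(A).T),
    "(kA)^-1 = A^-1 / k": np.allclose(inv(k * A), inv(A) / k),
    "(AB)^-1 = B^-1 A^-1": np.allclose(inv(A @ B), inv(B) @ inv(A)),
    "det(A^-1) = 1 / det(A)": np.isclose(np.linalg.det(inv(A)), 1 / np.linalg.det(A)),
}

# For a real rectangular matrix R, R^T R is positive semi-definite
# (non-negative eigenvalues) and R R^T is symmetric.
R = rng.normal(size=(5, 3))
gram_eigs = np.linalg.eigvalsh(R.T @ R)
checks["R^T R is PSD"] = bool(np.all(gram_eigs >= -1e-12))
checks["R R^T is symmetric"] = np.allclose(R @ R.T, (R @ R.T).T)

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

A numerical check on one random draw is not a proof, but it catches sign and ordering mistakes, such as writing $(AB)^{-1} = A^{-1}B^{-1}$.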
Without Equality in Opportunities, Freedom is the privilege of a few, and Oppression the reality of everyone else.
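Properties of the trace

1. Definition. $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$
2. $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$
3. $\mathrm{tr}(cA) = c\,\mathrm{tr}(A) \quad \forall c \in \mathbb{R}$
4. $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
5. Similarity invariant: $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(A)$
6. Invariant under cyclic permutations: $\mathrm{tr}(ABCD) = \mathrm{tr}(BCDA) = \mathrm{tr}(CDAB) = \mathrm{tr}(DABC)$
7. $\mathrm{tr}(X \otimes Y) = \mathrm{tr}(X)\,\mathrm{tr}(Y)$, where $\otimes$ is the tensor product, also known as the Kronecker product.
8. $\mathrm{tr}(XY) = \sum_{i,j} X_{ij} Y_{ji}$

The Kronecker product is defined for matrices $A \in M(m \times n)$ and $B \in M(p \times q)$ as follows:

$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}_{mp \times nq}$$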
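Properties of the Kronecker Product

1. $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$
2. If $A \in M(m \times m)$ and $B \in M(n \times n)$, then:
   $|A \otimes B| = |A|^n |B|^m$
   $(A \otimes B)^T = A^T \otimes B^T$
   $\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B)$
3. $(A \otimes B)(C \otimes D) = AC \otimes BD$

Careful! It does not distribute with respect to the usual multiplication.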
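The trace and Kronecker-product rules can be checked with `np.kron`. This is a sketch under assumed inputs: the matrix sizes (m = 2, n = 3), the seed, and the dictionary name `kron_checks` are hypothetical choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 2 * np.eye(2)   # m x m with m = 2
B = rng.normal(size=(3, 3)) + 2 * np.eye(3)   # n x n with n = 3
C = rng.normal(size=(2, 2))
D = rng.normal(size=(3, 3))
m, n = A.shape[0], B.shape[0]
det, inv = np.linalg.det, np.linalg.inv

K = np.kron(A, B)                              # block matrix [a_ij * B], size (mn) x (mn)

kron_checks = {
    "shape is mn x mn": K.shape == (m * n, m * n),
    "tr(kron(A,B)) = tr(A) tr(B)": np.isclose(np.trace(K), np.trace(A) * np.trace(B)),
    "|kron(A,B)| = |A|^n |B|^m": np.isclose(det(K), det(A) ** n * det(B) ** m),
    "kron(A,B)^T = kron(A^T,B^T)": np.allclose(K.T, np.kron(A.T, B.T)),
    "kron(A,B)^-1 = kron(A^-1,B^-1)": np.allclose(inv(K), np.kron(inv(A), inv(B))),
    "kron(A,B) kron(C,D) = kron(AC,BD)": np.allclose(K @ np.kron(C, D), np.kron(A @ C, B @ D)),
    "tr(AC) = tr(CA)": np.isclose(np.trace(A @ C), np.trace(C @ A)),
    "tr(AC) = sum_ij A_ij C_ji": np.isclose(np.trace(A @ C), np.sum(A * C.T)),
}

for name, ok in kron_checks.items():
    print(f"{name}: {ok}")
```

Note the exponents in the determinant rule: the determinant of A is raised to the dimension of B, and vice versa.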
Properties of Determinants (only defined for $A \in M(n \times n)$)

1. $\det(aA) = a^n \det(A) \quad \forall a \in \mathbb{R}$
2. $\det(-A) = (-1)^n \det(A)$
3. $\det(AB) = \det(A)\det(B)$
4. $\det(I_n) = 1$
5. $\det(A) = \frac{1}{\det(A^{-1})}$
6. $\det(BAB^{-1}) = \det(A)$ (similarity transformation)
7. $\det(A) = \det(A^T)$
8. $\det(\bar{A}) = \overline{\det(A)}$, where the bar represents the complex conjugate.
University of Washington Page 2
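Differentiation of Linear Transformations (matrices)

1. $\frac{\partial\, a^T x}{\partial x} = \frac{\partial\, x^T a}{\partial x} = a$
2. $\frac{\partial\, Ax}{\partial x} = \frac{\partial\, x^T A^T}{\partial x} = A^T$
3. $\frac{\partial\, x^T A}{\partial x} = A$
4. $\frac{\partial\, x^T A x}{\partial x} = (A + A^T)x$
5. $\frac{\partial\, a^T x x^T b}{\partial x} = ab^T x + ba^T x$

Differentiation of traces

1. $\frac{\partial\, \mathrm{tr}(AX)}{\partial X} = \frac{\partial\, \mathrm{tr}(XA)}{\partial X} = A^T$
2. $\frac{\partial\, \mathrm{tr}(AXB)}{\partial X} = \frac{\partial\, \mathrm{tr}(XBA)}{\partial X} = (BA)^T$
3. $\frac{\partial\, \mathrm{tr}(AXBX^TC)}{\partial X} = \frac{\partial\, \mathrm{tr}(XBX^TCA)}{\partial X} = (BX^TCA)^T + (CAXB)$
4. $\frac{\partial |X|}{\partial X} = \mathrm{cofactor}(X) = \det(X)\,(X^{-1})^T$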
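The scalar-valued rules can be verified against finite differences. The sketch below assumes NumPy; the helper `numerical_gradient`, the step size `h`, and the random inputs are assumptions made for the illustration. It checks only the rules whose function value is a scalar, since the layout of a vector-by-vector derivative depends on the convention adopted.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step.flat[i] = h
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
a, b, x = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
X = rng.normal(size=(4, 4))

# d(a'x)/dx = a
grad_linear = numerical_gradient(lambda v: a @ v, x)
# d(x'Ax)/dx = (A + A')x
grad_quadratic = numerical_gradient(lambda v: v @ A @ v, x)
# d(a'x x'b)/dx = a(b'x) + b(a'x)
grad_bilinear = numerical_gradient(lambda v: (a @ v) * (v @ b), x)
# d tr(AX)/dX = A'
grad_trace = numerical_gradient(lambda M: np.trace(A @ M), X)

print(np.allclose(grad_quadratic, (A + A.T) @ x, atol=1e-6))
```

The quadratic-form rule is the one used later to derive the OLS normal equations, so it is the most useful one to keep in mind.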
2 Probability Distributions

Here, one could say, the summary for Econometrics ECON581 proper begins.

Definition 1. Normal distribution: where $\mu$ is the mean and $\sigma^2$ is the variance.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}$$

If the mean is zero and the variance is one, then we have the standard normal distribution $N(0, 1)$.

The normal distribution has no closed-form expression for its cumulative distribution function (CDF).

Definition 2. Chi-square Distribution: We say that $\chi^2(r)$ has $r$ degrees of freedom.

$$Z_i \overset{iid}{\sim} N(0, 1), \; i = 1, \dots, r \implies A = \sum_{i=1}^{r} Z_i^2 \sim \chi^2(r)$$

$$E(A) = r \quad \text{and} \quad V(A) = 2r$$

Thus, a $\chi^2(r)$ variable is just a sum of squares of $r$ independent standard normal variables. We use this distribution to test the value of the variance of a population. For instance, $H_0: \sigma^2 = 5$ against $H_1: \sigma^2 > 5$.
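The construction in Definition 2 can be simulated directly. This is a rough Monte Carlo sketch, assuming NumPy; the degrees of freedom `r = 5`, the number of draws, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
r = 5                        # degrees of freedom (arbitrary)
n_draws = 200_000            # number of simulated chi-square variates

Z = rng.standard_normal(size=(n_draws, r))   # each row holds r iid N(0, 1) draws
A = np.sum(Z ** 2, axis=1)                   # sum of squares, distributed chi-square(r)

sample_mean = A.mean()                       # should be close to r
sample_var = A.var(ddof=1)                   # should be close to 2r

print(f"mean = {sample_mean:.3f} (theory {r}), variance = {sample_var:.3f} (theory {2 * r})")
```

The simulated moments only approximate $E(A) = r$ and $V(A) = 2r$; the gap shrinks as the number of draws grows.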
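Definition 3. Student's t Distribution: We say that $t(r)$ has $r$ degrees of freedom. The t-distribution has fatter tails than the standard normal distribution.

$$Z \sim N(0, 1), \quad A \sim \chi^2(r) \quad (Z \text{ and } A \text{ are independent}) \implies T = \frac{Z}{\sqrt{A/r}} \sim t(r)$$

$$E(T) = 0 \quad \text{and} \quad V(T) = \frac{r}{r-2} \quad (r > 2)$$

The t distribution is the ratio of a standard normal random variable to the square root of an independent $\chi^2(r)$ random variable divided by its degrees of freedom.

Definition 4. F Distribution: We say that $F(r_1, r_2)$ has $r_1$ degrees of freedom in the numerator and $r_2$ degrees of freedom in the denominator.

$$A_1 \sim \chi^2(r_1), \quad A_2 \sim \chi^2(r_2) \quad (A_1 \text{ and } A_2 \text{ are independent}) \implies F = \frac{A_1/r_1}{A_2/r_2} \sim F(r_1, r_2)$$

We use the F distribution to test whether two variances are the same or not after a structural break. For instance, $H_0: \sigma_0^2 = \sigma_1^2$ against $H_1: \sigma_0^2 > \sigma_1^2$.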
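Both ratios can be built from standard normal draws exactly as in Definitions 3 and 4. The following is a simulation sketch, assuming NumPy; the degrees of freedom and the number of draws are arbitrary, and `tail_share` is a name introduced here to illustrate the fat-tails remark.

```python
import numpy as np

rng = np.random.default_rng(15)
n_draws = 200_000
r, r1, r2 = 10, 5, 20                      # degrees of freedom (arbitrary choices)

Z = rng.standard_normal(n_draws)
A = np.sum(rng.standard_normal((n_draws, r)) ** 2, axis=1)     # chi-square(r), independent of Z
T = Z / np.sqrt(A / r)                                          # t(r)

A1 = np.sum(rng.standard_normal((n_draws, r1)) ** 2, axis=1)   # chi-square(r1)
A2 = np.sum(rng.standard_normal((n_draws, r2)) ** 2, axis=1)   # chi-square(r2), independent of A1
F = (A1 / r1) / (A2 / r2)                                       # F(r1, r2)

t_mean, t_var = T.mean(), T.var(ddof=1)    # theory: 0 and r / (r - 2)
tail_share = np.mean(np.abs(T) > 1.96)     # exceeds the normal value 0.05 because of fatter tails

print(f"t mean = {t_mean:.3f}, t variance = {t_var:.3f} (theory {r / (r - 2):.3f})")
print(f"share of |T| > 1.96: {tail_share:.3f}")
```

As $r$ grows, the simulated variance approaches one and the tail share approaches 0.05, which is the sense in which $t(r)$ converges to the standard normal.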
3 Probability Definitions

Definition 5. The expected value of a continuous random variable is given by:

$$E[X] = \int_{\Omega} x f(x)\,dx \tag{1}$$

The notation $\int_{\Omega}$ just means that $\Omega$ is the domain of the relevant random variable.

Definition 6. The variance of a continuous random variable is given by:

$$V[X] = Var[X] = E[(x - \mu)^2] = \int_{\Omega} (x - \mu)^2 f(x)\,dx \tag{2}$$

Definition 7. The covariance of two continuous random variables is given by:

$$C[X, Y] = Cov[X, Y] = E[XY] - E[X]E[Y] \tag{3}$$

Notice that the covariance of a r.v. $X$ with itself is its variance. In addition, if two random variables are independent, then their covariance is zero. The reverse is not necessarily true.

Some useful properties:

1. $E(a + bX + cY) = a + bE(X) + cE(Y)$
2. $V(a + bX + cY) = b^2 V(X) + c^2 V(Y) + 2bc\,Cov(X, Y)$
3. $Cov(a_1 + b_1 X + c_1 Y,\; a_2 + b_2 X + c_2 Y) = b_1 b_2 V(X) + c_1 c_2 V(Y) + (b_1 c_2 + c_1 b_2)\,Cov(X, Y)$
4. If $Z = h(X, Y)$, then $E(Z) = E_X[E_{Y|X}(Z \mid X)]$ (Law of iterated expectations)
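The variance rule in property 2 holds exactly for sample moments as well, provided the same divisor is used throughout. Below is a small sketch, assuming NumPy; the constants `a`, `b`, `c` and the way `Y` is generated are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
X = rng.normal(size=n)
Y = 0.6 * X + rng.normal(size=n)      # correlated with X by construction
a, b, c = 1.0, 2.0, -3.0

def cov(u, v):
    """Sample covariance with the (n - 1) divisor, matching np.var(..., ddof=1)."""
    return np.cov(u, v, ddof=1)[0, 1]

# Property 2: V(a + bX + cY) = b^2 V(X) + c^2 V(Y) + 2bc Cov(X, Y)
lhs_var = np.var(a + b * X + c * Y, ddof=1)
rhs_var = b**2 * np.var(X, ddof=1) + c**2 * np.var(Y, ddof=1) + 2 * b * c * cov(X, Y)

# Equation (3): Cov(X, Y) = E[XY] - E[X]E[Y], using the 1/n divisor on both sides
cov_moment = np.mean(X * Y) - X.mean() * Y.mean()
cov_population = np.cov(X, Y, ddof=0)[0, 1]

print(np.isclose(lhs_var, rhs_var), np.isclose(cov_moment, cov_population))
```

The additive constant `a` drops out of the variance, and the cross term changes sign with the sign of `b * c`.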
4 Econometrics

A random variable is a real-valued function defined over a Sample Space. The Sample Space ($\Omega$) is the set of all possible outcomes.

Before collecting the data (ex-ante) all our estimators are random variables. Once we have realized the data (ex-post), we get a specific number for our estimators. These numbers are what we call estimates.

Remark 1. A simple Econometric Model: $y_i = \beta + e_i, \; i = 1, \dots, n$. This is not a regression model, but it is an econometric one.

In order to estimate $\beta$ we make the following assumptions:

1. $E(e_i) = 0 \quad \forall i$
2. $Var(e_i) = E(e_i^2) = \sigma^2 \quad \forall i$
3. $Cov(e_i, e_j) = E(e_i e_j) = 0 \quad \forall i \neq j$

In the near future, we will further assume that the error term follows a normal distribution with $\mu = 0$ and variance $\sigma^2$. This is not necessary for the estimation process, but we need it to run some hypothesis tests.

What we are looking for is a line that fits the data, minimising the distance between the fitted line and the data. In other words, Ordinary Least Squares (OLS).

$$\min_{\beta} \sum_{i=1}^{n} (y_i - \beta)^2 \iff \min SSR \equiv \sum_{i} \hat{e}_i^2$$

Setting the first derivative to zero, $-2\sum_{i=1}^{n}(y_i - \hat\beta) = 0$, the estimator is then given by $\hat\beta = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$.

Definition 8. We say that an estimator is Unbiased if: $E(\hat\beta) = \beta$. In other words, if averaging the estimates over infinitely many repeated samples recovers the true population value.

For this particular estimator ($\hat\beta$) it is easy to see that it is indeed unbiased and that its variance is $Var(\hat\beta) = \frac{1}{n}\sigma^2$, given the assumption that the draws are iid.
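For the constant-only model the least-squares problem and its sampling properties can be illustrated in a few lines. This is a sketch, assuming NumPy; the population values `beta_true` and `sigma`, the grid, and the number of replications are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(5)
beta_true, sigma, n = 2.0, 1.5, 50     # hypothetical population values
y = beta_true + sigma * rng.normal(size=n)

def ssr(beta, y):
    """Sum of squared residuals for the model y_i = beta + e_i."""
    return np.sum((y - beta) ** 2)

beta_hat = y.mean()                    # closed-form OLS estimator: the sample mean

# Brute-force check: on a fine grid around beta_hat, no candidate has a lower SSR.
grid = np.linspace(beta_hat - 1, beta_hat + 1, 2001)
grid_best = grid[np.argmin([ssr(b, y) for b in grid])]

# Monte Carlo check of unbiasedness and of Var(beta_hat) = sigma^2 / n.
draws = beta_true + sigma * rng.normal(size=(20_000, n))
estimates = draws.mean(axis=1)         # one estimate per simulated sample

print(f"beta_hat = {beta_hat:.3f}, grid minimiser = {grid_best:.3f}")
print(f"mean of estimates = {estimates.mean():.3f}, variance = {estimates.var(ddof=1):.4f}")
```

Each row of `draws` plays the role of one ex-ante sample, so `estimates` is a set of realisations of the estimator viewed as a random variable.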
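Note: A linear combination of jointly normal random variables is itself normally distributed.

Proposition 1. If $\hat\beta \sim N\left(\beta, \frac{\sigma^2}{n}\right)$, then $Z = \frac{\hat\beta - \beta}{\sigma/\sqrt{n}} \sim N(0, 1)$

Standard normal values: $P(Z \geq 1.96) = 0.025$ and $P(Z \geq 1.64) \approx 0.05$

Note:

1. $\frac{\sum e_i^2}{\sigma^2} \sim \chi^2_n$
2. $\frac{\sum \hat{e}_i^2}{\sigma^2} = \frac{(n-1)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-1}$. We lose one degree of freedom here because we need to use one datum to estimate $\beta$.
3. When we do not know $\sigma^2$, our standardised variable is $Z = \frac{\hat\beta - \beta}{\hat\sigma/\sqrt{n}} \sim t(n-1)$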
Hypothesis Testing

                      H0 is true      H1 is true
Reject H0             Type I error    OK
Fail to reject H0     OK              Type II error

Thus, we define the following probabilities:

$P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ is true}) = \alpha$

$P(\text{Type II error}) = P(\text{Fail to reject } H_0 \mid H_0 \text{ is false}) = 1 - \pi$, and $\pi$ is the so-called power of the test.
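The two error probabilities can be estimated by simulation for the constant-only model with known variance. The sketch below assumes NumPy and a two-sided z-test at the 5% level; the sample size, the alternative value `beta_alt`, and the helper `rejection_rate` are hypothetical choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma = 25, 1.0            # sample size and known standard deviation (assumed)
beta_null = 0.0               # value of beta under H0
beta_alt = 0.5                # one hypothetical value of beta under H1
z_crit = 1.96                 # two-sided 5% critical value
n_rep = 20_000

def rejection_rate(beta_true):
    """Share of simulated samples in which the two-sided z-test rejects H0."""
    y = beta_true + sigma * rng.normal(size=(n_rep, n))
    z = (y.mean(axis=1) - beta_null) / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) > z_crit)

type_1_rate = rejection_rate(beta_null)   # estimates P(reject H0 | H0 is true)
power = rejection_rate(beta_alt)          # estimates P(reject H0 | H0 is false)
type_2_rate = 1 - power                   # estimates P(fail to reject H0 | H0 is false)

print(f"Type I rate = {type_1_rate:.3f}, power = {power:.3f}, Type II rate = {type_2_rate:.3f}")
```

The Type I rate is fixed by the critical value, while the power depends on how far the true parameter is from the null and on the sample size.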
Remark 2. Multiple Linear Regression (Population)

$$y_i = x_i'\beta + e_i, \quad i = 1, \dots, n \quad \text{(vector notation)}$$

ASSUMPTIONS

$$E(e_i) = 0 \quad \forall i \tag{4}$$
$$E(e_i^2) = \sigma^2 \quad \forall i$$
$$E(e_i e_j) = 0 \quad \forall i \neq j$$
$$e_i \sim N(0, \sigma^2)$$

X variables are non-stochastic.
There is NO exact linear relationship among X variables.

If $e_i$ is not normal, we may apply the Central Limit Theorem (CLT). However, for this we need to have a large sample size. How large is large enough? $n - K \geq 30$ is one rule of thumb, but it will depend on the problem.
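The OLS estimator results from minimising the SSE (error sum of squares):

$$\hat\beta = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \sum_{i=1}^{n} x_i y_i \tag{5}$$

The above form of the estimator is useful if we are in Asymptopia, that is, for large-sample (asymptotic) analysis.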
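In matrix notation we have:

$$y = X\beta + e \tag{6}$$
$$e \overset{iid}{\sim} N(0, \sigma^2 I_n)$$

X is non-stochastic.

The OLS from the sample is:

$$\hat\beta = (X'X)^{-1}X'Y \tag{7}$$
$$\hat\beta = \beta + (X'X)^{-1}X'e$$

This mathematical form is useful to run analysis in the finite sample world.

The OLS estimator is unbiased and its variance-covariance matrix is given by:

$$Cov(\hat\beta) = E[(\hat\beta - E(\hat\beta))(\hat\beta - E(\hat\beta))'] \tag{8}$$
$$= E[(X'X)^{-1}X'ee'X(X'X)^{-1}]$$
$$= \sigma^2 (X'X)^{-1}$$

Thus, $\hat\beta \sim N(\beta, \sigma^2 (X'X)^{-1})$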
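The sum form (5), the matrix form (7), and the sampling-error decomposition all give the same numbers. Below is a minimal sketch on simulated data, assuming NumPy; the design matrix, the coefficient vector `beta`, and `sigma` are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # first column is the intercept
beta = np.array([1.0, 2.0, -0.5])      # hypothetical true coefficients
sigma = 0.8                            # hypothetical error standard deviation
e = sigma * rng.normal(size=n)
y = X @ beta + e

XtX = X.T @ X
# Matrix form (7): solve the normal equations (X'X) b = X'y rather than inverting X'X.
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Sum form (5): (sum_i x_i x_i')^{-1} sum_i x_i y_i, where x_i is the i-th row of X.
sum_xx = sum(np.outer(x_i, x_i) for x_i in X)
sum_xy = sum(x_i * y_i for x_i, y_i in zip(X, y))
beta_hat_sum = np.linalg.solve(sum_xx, sum_xy)

# Sampling-error form: beta_hat = beta + (X'X)^{-1} X'e (only computable in a simulation).
beta_hat_decomp = beta + np.linalg.solve(XtX, X.T @ e)

# Covariance matrix sigma^2 (X'X)^{-1}, using the known sigma of the simulation.
cov_beta_hat = sigma**2 * np.linalg.inv(XtX)

print(beta_hat)
print(np.sqrt(np.diag(cov_beta_hat)))   # standard deviations of the coefficient estimates
```

Solving the normal equations with `np.linalg.solve` is a design choice for numerical stability; it is algebraically the same as applying $(X'X)^{-1}$.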
Definition 9. The matrix $M_X = I_n - X(X'X)^{-1}X'$ is symmetric and idempotent, i.e., $M_X^T = M_X$ and $M_X M_X = M_X$.

In general, we can have $M_i = I_n - X_i(X_i'X_i)^{-1}X_i'$. Thus, $M_i X_j$ is interpreted as the residuals from regressing $X_j$ on $X_i$.
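The properties in Definition 9 can be confirmed on a small example. This sketch assumes NumPy; the function name `annihilator` and the simulated `X` and `y` are illustrative and not part of the notes.

```python
import numpy as np

def annihilator(X):
    """Residual-maker matrix M_X = I_n - X (X'X)^{-1} X'."""
    n = X.shape[0]
    return np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(8)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)
M = annihilator(X)

# Residuals computed the usual way, from the fitted OLS coefficients.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat

print("symmetric:", np.allclose(M, M.T))
print("idempotent:", np.allclose(M @ M, M))
print("M X = 0:", np.allclose(M @ X, 0))
print("M y equals the OLS residuals:", np.allclose(M @ y, residuals))
```

Forming the full n x n matrix is fine for illustration, but for large n it is cheaper to compute residuals from the fitted coefficients as in `residuals`.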
Note:

The following properties are important for demonstrations:

1. If $A$ is a square (diagonalizable) matrix, then $A = C \Lambda C^{-1}$, where $\Lambda$ is a diagonal matrix with the eigenvalues of $A$, and $C$ is the matrix of the eigenvectors in column form.
2. If $A$ is symmetric, then $C'C = CC' = I_n$ and hence $A = C \Lambda C'$.
3. If $A$ is symmetric and idempotent, then $\Lambda$ is a diagonal matrix whose eigenvalues are either 1 or 0.
4. If $A = C \Lambda C'$ is symmetric and idempotent, then $\mathrm{rank}(A) = r$, where $r = \sum_{i=1}^{n} \lambda_i = \mathrm{tr}(A)$.
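These spectral properties can be seen on the residual-maker matrix itself. The following is a sketch, assuming NumPy; the dimensions `n = 12` and `K = 3` and the random design are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(9)
n, K = 12, 3
X = rng.normal(size=(n, K))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # symmetric and idempotent

# eigh is intended for symmetric matrices; the columns of C are orthonormal eigenvectors.
eigvals, C = np.linalg.eigh(M)
Lambda = np.diag(eigvals)

rank_M = np.linalg.matrix_rank(M)

print("eigenvalues:", np.round(eigvals, 10))
print("rank:", rank_M, "trace:", round(np.trace(M), 10), "n - K:", n - K)
```

The matrix has K eigenvalues equal to zero and n - K equal to one, which is why its rank, its trace, and n - K coincide.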
Using this definition we get that

$$\hat{e}'\hat{e} = e'M_X e$$

and hence, since $E(e'M_X e) = \sigma^2\,\mathrm{tr}(M_X)$ and $\mathrm{tr}(M_X) = n - K$,

$$E(\hat{e}'\hat{e}) = \sigma^2 (n - K)$$

Theorem 1. Gauss-Markov Theorem:

In a linear regression model in which the errors have expectation zero and are uncorrelated and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the OLS estimator. "Best" means having the lowest variance among all linear unbiased estimators (equivalently, the lowest mean squared error, since these estimators are unbiased). Notice that the errors need not be normal, nor independent and identically distributed (only uncorrelated and homoscedastic).

The proof of this theorem is based on supposing an estimator $\tilde\beta = CY$ that is better than $\hat\beta$ and finding the related contradiction.
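The idea behind the proof can be illustrated numerically: any other linear unbiased estimator $\tilde\beta = CY$ can be written with $C = (X'X)^{-1}X' + D$ where $DX = 0$, and its variance exceeds the OLS variance by $\sigma^2 DD'$. This is a sketch, assuming NumPy; the particular `D` built from a random matrix `G` is just one hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(10)
n, K = 40, 3
sigma2 = 1.0                                   # error variance (assumed known here)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
C_ols = XtX_inv @ X.T                          # OLS weights: beta_hat = C_ols @ Y
M = np.eye(n) - X @ C_ols                      # residual maker, so M @ X = 0

# Any D with D X = 0 keeps the estimator unbiased; G @ M is one convenient choice.
G = rng.normal(size=(K, n))
D = 0.1 * G @ M
C_alt = C_ols + D                              # alternative linear unbiased estimator

var_ols = sigma2 * C_ols @ C_ols.T             # equals sigma^2 (X'X)^{-1}
var_alt = sigma2 * C_alt @ C_alt.T
excess = var_alt - var_ols                     # equals sigma^2 D D', positive semi-definite
excess_eigs = np.linalg.eigvalsh(excess)

print("C_alt X = I (unbiased):", np.allclose(C_alt @ X, np.eye(K)))
print("eigenvalues of the variance gap:", excess_eigs)
```

Because the gap is positive semi-definite, every linear combination of the alternative coefficients has a variance at least as large as under OLS.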
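Remark 3. Suppose that you have the model:

$$Y = X_1\beta_1 + X_2\beta_2 + e, \quad e \sim N(0, \sigma^2 I_n)$$

then you can estimate $\beta_1$ as:

$$\hat\beta_1 = (X_1'M_2X_1)^{-1}X_1'M_2Y$$
$$M_2 = I_n - X_2(X_2'X_2)^{-1}X_2'$$

likewise, for $\beta_2$ we have:

$$\hat\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y$$
$$M_1 = I_n - X_1(X_1'X_1)^{-1}X_1'$$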
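The partitioned formulas return exactly the same coefficients as the full regression of $Y$ on $[X_1, X_2]$. Here is a minimal sketch on simulated data, assuming NumPy; the helper `annihilator` and all numerical values are hypothetical.

```python
import numpy as np

def annihilator(X):
    """M = I_n - X (X'X)^{-1} X' for a regressor block X."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(11)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2)) + 0.5 * X1[:, [1]]     # correlated with X1 on purpose
Y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([-1.0, 0.5]) + rng.normal(size=n)

# Full regression on the stacked regressor matrix [X1, X2].
X = np.hstack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ Y)

# Partitioned formulas from the remark above.
M1, M2 = annihilator(X1), annihilator(X2)
beta1_hat = np.linalg.solve(X1.T @ M2 @ X1, X1.T @ M2 @ Y)
beta2_hat = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ Y)

print(beta_full)
print(beta1_hat, beta2_hat)
```

In words, $\hat\beta_1$ is obtained by first purging $X_2$ from both $Y$ and $X_1$ and then regressing the residuals on each other.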
4.1 Misspecification Cases.

Including an Irrelevant Variable

True regression model:

$$Y = X_1\beta_1 + e$$

Estimated regression:

$$Y = X_1\beta_1 + X_2\beta_2 + e$$

The main result is that the OLS estimators are NOT efficient; however, they are still unbiased.

$$\hat\beta_1 = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2e$$
$$E(\hat\beta_1) = \beta_1$$
$$Var(\hat\beta_1) = \sigma^2 (X_1'M_2X_1)^{-1}$$

Thus, comparing the inverse variances of the estimator from the true model and the inefficient one, we get a matrix that is positive semi-definite, and so we establish the claim.

$$[Var(\hat\beta_{1,true})]^{-1} - [Var(\hat\beta_{1,est})]^{-1} = \frac{1}{\sigma^2} X_1'X_2(X_2'X_2)^{-1}X_2'X_1$$
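The efficiency loss can be computed directly for a given design, using $Var(\hat\beta_1) = \sigma^2 (X_1'X_1)^{-1}$ for the correct model and $\sigma^2 (X_1'M_2X_1)^{-1}$ for the model that adds $X_2$. This is a sketch, assuming NumPy; the regressors and `sigma2` are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(12)
n, sigma2 = 60, 1.0
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = (0.7 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)   # irrelevant but correlated regressor

P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)   # projection onto the column space of X2
M2 = np.eye(n) - P2

var_true = sigma2 * np.linalg.inv(X1.T @ X1)        # correct model (X2 excluded)
var_est = sigma2 * np.linalg.inv(X1.T @ M2 @ X1)    # model with the irrelevant X2 included

# Difference of inverse variances versus the closed form X1'X2 (X2'X2)^{-1} X2'X1 / sigma^2.
precision_gap = np.linalg.inv(var_true) - np.linalg.inv(var_est)
formula = X1.T @ P2 @ X1 / sigma2

# The variance gap itself is positive semi-definite as well.
variance_gap_eigs = np.linalg.eigvalsh(var_est - var_true)

print(np.allclose(precision_gap, formula))
print(variance_gap_eigs)
```

The more strongly $X_2$ is correlated with $X_1$, the larger the gap; if $X_1'X_2 = 0$ the two variances coincide.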
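Omitting a Relevant Variable.

True regression model:

$$Y = X_1\beta_1 + X_2\beta_2 + e$$

Estimated regression:

$$Y = X_1\beta_1 + e$$

In this case, we get bias in the estimator, so we do not even bother to analyse the variance. The bias vanishes only if $X_1'X_2 = 0$ or $\beta_2 = 0$.

$$\hat\beta_1 = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'e$$
$$E(\hat\beta_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2$$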
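The bias formula can be checked by holding the regressors fixed and averaging the short-regression estimate over many error draws. This is a simulation sketch, assuming NumPy; the coefficients, the correlation between `X1` and `X2`, and the helper `short_regression` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(13)
n = 500
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = (0.8 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)   # omitted regressor, correlated with X1
beta1 = np.array([1.0, 2.0])      # hypothetical true coefficients
beta2 = np.array([1.5])

def short_regression(Y):
    """OLS of Y on X1 only, i.e. with X2 wrongly omitted."""
    return np.linalg.solve(X1.T @ X1, X1.T @ Y)

# Bias term (X1'X1)^{-1} X1'X2 beta2 for this fixed design.
bias = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)

# Average the short-regression estimate over repeated error draws.
n_rep = 2_000
estimates = np.array([
    short_regression(X1 @ beta1 + X2 @ beta2 + rng.normal(size=n))
    for _ in range(n_rep)
])
mean_estimate = estimates.mean(axis=0)

print("beta1 + bias:     ", beta1 + bias)
print("mean of estimates:", mean_estimate)
```

The slope estimate absorbs part of the effect of the omitted variable, so it is centred on `beta1 + bias` rather than on `beta1`.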
5 Hypothesis Testing (in detail).
Long live the Libertarian Revolution!!!
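6 One page Summary

$$Y = X\beta + e$$
$$\hat\beta = (X'X)^{-1}X'Y$$
$$E(\hat\beta) = \beta$$
$$Cov(\hat\beta) = \sigma^2 (X'X)^{-1}$$
$$\hat\beta \sim N(\beta, \sigma^2 (X'X)^{-1})$$
$$\hat\sigma^2 = \frac{1}{n-K}\hat{e}'\hat{e} = \frac{1}{n-K}\sum_{i=1}^{n} \hat{e}_i^2$$
$$E(\hat\sigma^2) = \sigma^2$$
$$Var(\hat\sigma^2) = \frac{2\sigma^4}{n-K}$$
$$\frac{e'e}{\sigma^2} \sim \chi^2(n)$$
$$\frac{(n-K)\hat\sigma^2}{\sigma^2} = \frac{\hat{e}'\hat{e}}{\sigma^2} \sim \chi^2(n-K)$$
$$e \sim N(0, \sigma^2 I_n) \implies \frac{e}{\sigma} \sim N(0, I_n)$$
$$M_X = I_n - X(X'X)^{-1}X'$$
$$\hat{e} = M_X Y$$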
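The summary formulas fit in one small function. The following is a sketch, assuming NumPy; `ols_summary` is a hypothetical helper written for this illustration, and the simulated data are arbitrary.

```python
import numpy as np

def ols_summary(X, Y):
    """Return beta_hat, residuals, sigma2_hat and the estimated Cov(beta_hat)."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y             # (X'X)^{-1} X'Y
    resid = Y - X @ beta_hat                 # e_hat, equal to M_X Y
    sigma2_hat = resid @ resid / (n - K)     # e_hat'e_hat / (n - K)
    cov_hat = sigma2_hat * XtX_inv           # plug-in version of sigma^2 (X'X)^{-1}
    return beta_hat, resid, sigma2_hat, cov_hat

rng = np.random.default_rng(14)
n, K = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(scale=2.0, size=n)

beta_hat, resid, sigma2_hat, cov_hat = ols_summary(X, Y)
std_errors = np.sqrt(np.diag(cov_hat))       # standard errors of the coefficients

print("beta_hat:  ", beta_hat)
print("std errors:", std_errors)
print("sigma2_hat:", sigma2_hat)
```

Dividing by n - K rather than n is what makes `sigma2_hat` unbiased, in line with $E(\hat\sigma^2) = \sigma^2$ above.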
Theorem 2. Gauss-Markov Theorem:
In a linear regression model in which the errors have expectation zero and are uncorrelated and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the OLS estimator. "Best" means having the lowest variance among all linear unbiased estimators (equivalently, the lowest mean squared error, since these estimators are unbiased). Notice that the errors need not be normal, nor independent and identically distributed (only uncorrelated and homoscedastic).