Least-Squares Estimation

Observe that
\[
\begin{aligned}
E[(Y - X\theta)^{\mathrm T}(Y - X\theta)]
&= E\!\left[(U + X(\theta_0 - \theta))^{\mathrm T}(U + X(\theta_0 - \theta))\right] \\
&= E[U^{\mathrm T}U] + 2(\theta_0 - \theta)^{\mathrm T} E\!\left(X^{\mathrm T}E[U \mid X]\right)
 + (\theta_0 - \theta)^{\mathrm T}\left(E[X^{\mathrm T}X]\right)(\theta_0 - \theta) \\
&= n\,\sigma^2 + (\theta_0 - \theta)^{\mathrm T}\left(E[X^{\mathrm T}X]\right)(\theta_0 - \theta).
\end{aligned}
\tag{5.33}
\]
Hence, it follows from (5.33) that[16]
\[
\theta_0 = \operatorname*{argmin}_{\theta \in \mathbb{R}^k} E[(Y - X\theta)^{\mathrm T}(Y - X\theta)]
= \left(E[X^{\mathrm T}X]\right)^{-1} E[X^{\mathrm T}Y],
\tag{5.34}
\]
provided that the matrix \(E[X^{\mathrm T}X]\) is nonsingular. The nonsingularity of the distribution of \(Z_j = (Y_j, X_j^{\mathrm T})^{\mathrm T}\) guarantees that \(E[X^{\mathrm T}X]\) is nonsingular, because it follows from Theorem 5.5 that the solution (5.34) is unique if \(\Sigma_{XX} = \operatorname{Var}(X_j)\) is nonsingular.
The expression (5.34) suggests estimating \(\theta_0\) by the ordinary[17] least-squares (OLS) estimator
\[
\hat\theta = \operatorname*{argmin}_{\theta \in \mathbb{R}^k} (Y - X\theta)^{\mathrm T}(Y - X\theta)
= (X^{\mathrm T}X)^{-1}X^{\mathrm T}Y.
\tag{5.35}
\]
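As a numerical sanity check, the closed form (5.35) can be computed directly with NumPy. The sample size, true parameter vector, and error scale below are illustrative assumptions, not values from the text; this is a sketch, not part of the theory.

```python
import numpy as np

# Simulate Y = X theta_0 + U and compute the OLS estimator (5.35).
# All data-generating settings here are illustrative assumptions.
rng = np.random.default_rng(0)
n, k = 200, 3
theta0 = np.array([1.0, -2.0, 0.5])   # hypothetical true parameter
X = rng.normal(size=(n, k))
U = rng.normal(scale=1.5, size=n)     # errors drawn independently of X
Y = X @ theta0 + U

# theta_hat = (X'X)^{-1} X'Y; solve() avoids forming the inverse explicitly
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's built-in least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

With a few hundred observations, `theta_hat` agrees with the least-squares solver to machine precision and lands near `theta0`, as the unbiasedness result below predicts.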
It follows easily from (5.32) and (5.35) that
\[
\hat\theta - \theta_0 = (X^{\mathrm T}X)^{-1}X^{\mathrm T}U;
\tag{5.36}
\]
hence, \(\hat\theta\) is conditionally unbiased, \(E[\hat\theta \mid X] = \theta_0\), and therefore also unconditionally unbiased, \(E[\hat\theta] = \theta_0\). More generally,
\[
\hat\theta \mid X \sim N_k\!\left[\theta_0,\ \sigma^2 (X^{\mathrm T}X)^{-1}\right].
\tag{5.37}
\]
Of course, the unconditional distribution of \(\hat\theta\) is not normal.
Note that the OLS estimator is not efficient because \(\sigma^2\left(E[X^{\mathrm T}X]\right)^{-1}\) is the Cramér–Rao lower bound of an unbiased estimator of \(\theta_0\), whereas \(\operatorname{Var}(\hat\theta) = \sigma^2 E\!\left[(X^{\mathrm T}X)^{-1}\right] \neq \sigma^2\left(E[X^{\mathrm T}X]\right)^{-1}\). However, the OLS estimator is the most efficient of all conditionally unbiased estimators \(\tilde\theta\) of \(\theta_0\) that are linear functions of \(Y\). In other words, the OLS estimator is the best linear unbiased estimator (BLUE). This result is known as the Gauss–Markov theorem:
Theorem 5.16 (Gauss–Markov theorem): Let \(C(X)\) be a \(k \times n\) matrix whose elements are Borel-measurable functions of the random elements of \(X\), and let \(\tilde\theta = C(X)Y\). If \(E[\tilde\theta \mid X] = \theta_0\), then for some positive semidefinite \(k \times k\) matrix \(D\),
\[
\operatorname{Var}[\tilde\theta \mid X] = \sigma^2 C(X)C(X)^{\mathrm T} = \sigma^2 (X^{\mathrm T}X)^{-1} + D.
\]
Proof: The conditional unbiasedness condition implies that \(C(X)X = I_k\); hence, \(\tilde\theta = \theta_0 + C(X)U\), and thus \(\operatorname{Var}(\tilde\theta \mid X) = \sigma^2 C(X)C(X)^{\mathrm T}\). Now
\[
\begin{aligned}
D &= \sigma^2\left[C(X)C(X)^{\mathrm T} - (X^{\mathrm T}X)^{-1}\right] \\
&= \sigma^2\left[C(X)C(X)^{\mathrm T} - C(X)X(X^{\mathrm T}X)^{-1}X^{\mathrm T}C(X)^{\mathrm T}\right] \\
&= \sigma^2 C(X)\left[I_n - X(X^{\mathrm T}X)^{-1}X^{\mathrm T}\right]C(X)^{\mathrm T}
= \sigma^2 C(X) M C(X)^{\mathrm T},
\end{aligned}
\]
where the second equality follows from the unbiasedness condition \(C(X)X = I_k\). The matrix
\[
M = I_n - X(X^{\mathrm T}X)^{-1}X^{\mathrm T}
\tag{5.38}
\]
is idempotent; hence, its eigenvalues are either 1 or 0. Because all the eigenvalues are nonnegative, \(M\) is positive semidefinite, and so is \(C(X)MC(X)^{\mathrm T}\). Q.E.D.
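The algebraic facts driving this proof, that \(M = I_n - X(X^{\mathrm T}X)^{-1}X^{\mathrm T}\) as in (5.38) is idempotent with eigenvalues in \(\{0, 1\}\) and trace \(n - k\), are easy to verify numerically. The dimensions below are arbitrary illustrative choices.

```python
import numpy as np

# Numerically verify that M = I_n - X(X'X)^{-1}X' (5.38) is symmetric,
# idempotent, and positive semidefinite; n and k are arbitrary choices.
rng = np.random.default_rng(1)
n, k = 20, 4
X = rng.normal(size=(n, k))

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

eigvals = np.linalg.eigvalsh(M)   # eigvalsh is valid because M is symmetric
# Idempotency (M @ M == M) forces every eigenvalue to be 0 or 1, and
# trace(M) = n - k counts the unit eigenvalues.
```

Running this shows exactly \(k\) eigenvalues equal to 0 and \(n - k\) equal to 1, which is the rank computation used again in the proof of Theorem 5.17.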
Next, we need an estimator of the error variance \(\sigma^2\). If we observed the errors \(U_j\), then we could use the sample variance \(S^2 = (1/(n-1))\sum_{j=1}^{n}(U_j - \bar U)^2\) of the \(U_j\)'s as an unbiased estimator. This suggests using the OLS residuals,
\[
\hat U_j = Y_j - x_j^{\mathrm T}\hat\theta, \qquad j = 1, \dots, n,
\tag{5.39}
\]
where \(x_j^{\mathrm T}\) is the \(j\)th row of \(X\), instead of the actual errors \(U_j\) in this sample variance. Taking into account that the OLS residuals sum to zero,
\[
\sum_{j=1}^{n} \hat U_j = 0,
\tag{5.40}
\]
this sample variance becomes \((1/(n-1))\sum_{j=1}^{n} \hat U_j^2\). However, this estimator is not unbiased, but a minor correction will yield an unbiased estimator of \(\sigma^2\), namely,
\[
S^2 = \frac{1}{n-k}\sum_{j=1}^{n} \hat U_j^2,
\tag{5.41}
\]
which is called the OLS estimator of \(\sigma^2\). The unbiasedness of this estimator is a by-product of the following more general result, which is related to the result of Theorem 5.13.
Theorem 5.17: Conditional on \(X\) as well as unconditionally, \((n-k)S^2/\sigma^2 \sim \chi^2_{n-k}\); hence, \(E[S^2] = \sigma^2\).
Proof: Observe that
\[
\begin{aligned}
\sum_{j=1}^{n} \hat U_j^2
&= U^{\mathrm T}U - 2U^{\mathrm T}X(\hat\theta - \theta_0) + (\hat\theta - \theta_0)^{\mathrm T}X^{\mathrm T}X(\hat\theta - \theta_0) \\
&= U^{\mathrm T}U - U^{\mathrm T}X(X^{\mathrm T}X)^{-1}X^{\mathrm T}U = U^{\mathrm T}MU,
\end{aligned}
\tag{5.42}
\]
where the last two equalities follow from (5.36) and (5.38), respectively. Because the matrix \(M\) is idempotent with rank
\[
\operatorname{rank}(M) = \operatorname{trace}(M)
= \operatorname{trace}(I_n) - \operatorname{trace}\!\left(X(X^{\mathrm T}X)^{-1}X^{\mathrm T}\right)
= \operatorname{trace}(I_n) - \operatorname{trace}\!\left((X^{\mathrm T}X)^{-1}X^{\mathrm T}X\right) = n - k,
\]
it follows from Theorem 5.10 that, conditional on \(X\), (5.42) divided by \(\sigma^2\) has a \(\chi^2_{n-k}\) distribution:
\[
\sum_{j=1}^{n} \hat U_j^2 \big/ \sigma^2 \,\Big|\, X \sim \chi^2_{n-k}.
\tag{5.43}
\]
It is left as an exercise to prove that (5.43) also implies that the unconditional distribution of (5.42) divided by \(\sigma^2\) is \(\chi^2_{n-k}\):
\[
\sum_{j=1}^{n} \hat U_j^2 \big/ \sigma^2 \sim \chi^2_{n-k}.
\tag{5.44}
\]
Because the expectation of the \(\chi^2_{n-k}\) distribution is \(n - k\), it follows from (5.44) that the OLS estimator (5.41) of \(\sigma^2\) is unbiased. Q.E.D.
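A small Monte Carlo experiment reproduces the conclusion of Theorem 5.17: with the divisor \(n - k\), the average of \(S^2\) over many draws is close to \(\sigma^2\), and \((n-k)S^2/\sigma^2\) has the moments of a \(\chi^2_{n-k}\) variable. All simulation settings below are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of Theorem 5.17: E[S^2] = sigma^2 when S^2 uses
# the divisor n - k. Simulation settings are illustrative assumptions.
rng = np.random.default_rng(2)
n, k, sigma = 30, 3, 2.0
theta0 = np.array([1.0, 0.5, -1.0])
X = rng.normal(size=(n, k))               # fixed design: conditional on X

reps = 5000
s2 = np.empty(reps)
for r in range(reps):
    U = rng.normal(scale=sigma, size=n)
    Y = X @ theta0 + U
    theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ theta_hat
    s2[r] = resid @ resid / (n - k)       # S^2 as in (5.41)

mean_s2 = s2.mean()                       # should be close to sigma^2 = 4.0
```

Dividing by \(n\) or \(n - 1\) instead of \(n - k\) makes the average fall visibly below \(\sigma^2\), which is exactly the bias the degrees-of-freedom correction removes.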
Next, observe from (5.38) that \(X^{\mathrm T}M = O\), and thus by Theorem 5.7, \((X^{\mathrm T}X)^{-1}X^{\mathrm T}U\) and \(U^{\mathrm T}MU\) are independent conditionally on \(X\); that is,
\[
P[X^{\mathrm T}U \le x \text{ and } U^{\mathrm T}MU \le z \mid X]
= P[X^{\mathrm T}U \le x \mid X] \cdot P[U^{\mathrm T}MU \le z \mid X],
\quad \forall\, x \in \mathbb{R}^k,\ z \ge 0.
\]
Consequently,

Theorem 5.18: Conditional on \(X\), \(\hat\theta\) and \(S^2\) are independent, but unconditionally they can be dependent.
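Theorem 5.18 can be illustrated by simulation: holding \(X\) fixed (i.e., working conditionally on \(X\)), repeated draws of \(\hat\theta\) and \(S^2\) should show essentially zero sample correlation. Everything in this sketch is an illustrative assumption; independence of course implies more than zero correlation, so this is only a partial check.

```python
import numpy as np

# Illustrate Theorem 5.18: conditional on a fixed X, theta_hat and S^2
# are independent, so their sample correlation should be near zero.
rng = np.random.default_rng(3)
n, k = 20, 2
theta0 = np.array([1.0, -1.0])
X = rng.normal(size=(n, k))               # fixed design: conditional on X
XtX_inv = np.linalg.inv(X.T @ X)

reps = 20000
coef0 = np.empty(reps)
s2 = np.empty(reps)
for r in range(reps):
    U = rng.normal(size=n)
    Y = X @ theta0 + U
    theta_hat = XtX_inv @ (X.T @ Y)
    resid = Y - X @ theta_hat
    coef0[r] = theta_hat[0]               # first OLS coefficient
    s2[r] = resid @ resid / (n - k)       # S^2 as in (5.41)

corr = np.corrcoef(coef0, s2)[0, 1]       # should be close to zero
```

Redrawing \(X\) inside the loop instead would break the conditioning, which is how the unconditional dependence mentioned in the theorem can arise.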
Theorems 5.17 and 5.18 yield two important corollaries, which I will state in the next theorem. These results play a key role in statistical testing.
Theorem 5.19: (a) Let \(c\) be a given nonrandom vector in \(\mathbb{R}^k\). Then
\[
\frac{c^{\mathrm T}(\hat\theta - \theta_0)}{\sqrt{S^2\, c^{\mathrm T}(X^{\mathrm T}X)^{-1}c}} \sim t_{n-k}.
\tag{5.45}
\]
(b) Let \(R\) be a given nonrandom \(m \times k\) matrix with rank \(m \le k\). Then
\[
\frac{(\hat\theta - \theta_0)^{\mathrm T}R^{\mathrm T}\left(R(X^{\mathrm T}X)^{-1}R^{\mathrm T}\right)^{-1}R(\hat\theta - \theta_0)}{m\, S^2} \sim F_{m,\,n-k}.
\tag{5.46}
\]
Proof of (5.45): It follows from (5.37) that \(c^{\mathrm T}(\hat\theta - \theta_0) \mid X \sim N\!\left[0,\ \sigma^2 c^{\mathrm T}(X^{\mathrm T}X)^{-1}c\right]\); hence,
\[
\frac{c^{\mathrm T}(\hat\theta - \theta_0)}{\sqrt{\sigma^2\, c^{\mathrm T}(X^{\mathrm T}X)^{-1}c}} \,\Big|\, X \sim N(0, 1).
\tag{5.47}
\]
It follows now from Theorem 5.18 that, conditional on \(X\), the random variable in (5.47) and \(S^2\) are independent; hence, it follows from Theorem 5.17 and the definition of the \(t\)-distribution that (5.45) holds conditional on \(X\) and therefore also unconditionally.
Proof of (5.46): It follows from (5.37) that \(R(\hat\theta - \theta_0) \mid X \sim N_m\!\left[0,\ \sigma^2 R(X^{\mathrm T}X)^{-1}R^{\mathrm T}\right]\); hence, it follows from Theorem 5.9 that
\[
(\hat\theta - \theta_0)^{\mathrm T}R^{\mathrm T}\left(\sigma^2 R(X^{\mathrm T}X)^{-1}R^{\mathrm T}\right)^{-1}R(\hat\theta - \theta_0) \,\big|\, X \sim \chi^2_m.
\tag{5.48}
\]
Again, it follows from Theorem 5.18 that, conditional on \(X\), the random variable in (5.48) and \(S^2\) are independent; hence, it follows from Theorem 5.17 and the definition of the \(F\)-distribution that (5.46) holds conditional on \(X\) and therefore also unconditionally. Q.E.D.
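The \(t\)-result (5.45) can likewise be checked by Monte Carlo: the studentized statistic should have mean near 0 and variance near \((n-k)/(n-k-2)\), the variance of a \(t_{n-k}\) variable. The contrast vector \(c\), dimensions, and seed below are all illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of (5.45): the studentized contrast
# c'(theta_hat - theta0) / sqrt(S^2 c'(X'X)^{-1} c) is t_{n-k} distributed.
rng = np.random.default_rng(4)
n, k = 25, 3
theta0 = np.array([0.5, -1.0, 2.0])
c = np.array([1.0, 0.0, 0.0])             # hypothetical contrast vector
X = rng.normal(size=(n, k))               # fixed design: conditional on X
XtX_inv = np.linalg.inv(X.T @ X)
scale = c @ XtX_inv @ c

reps = 20000
t = np.empty(reps)
for r in range(reps):
    U = rng.normal(size=n)                # sigma = 1; the statistic is scale-free
    Y = X @ theta0 + U
    theta_hat = XtX_inv @ (X.T @ Y)
    resid = Y - X @ theta_hat
    s2 = resid @ resid / (n - k)
    t[r] = c @ (theta_hat - theta0) / np.sqrt(s2 * scale)

# t_{n-k} = t_{22} has mean 0 and variance 22/20 = 1.1
```

Because both the numerator and \(S\) scale linearly in \(\sigma\), setting \(\sigma = 1\) in the simulation loses no generality, mirroring the cancellation of \(\sigma^2\) in the proof above.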
Note that the results in Theorem 5.19 do not hinge on the assumption that the vector \(X_j\) in model (5.31) has a multivariate normal distribution. The only conditions that matter for the validity of Theorem 5.19 are that, in (5.32), \(U \mid X \sim N_n(0, \sigma^2 I_n)\) and \(P[0 < \det(X^{\mathrm T}X) < \infty] = 1\).