Least-Squares Estimation

Observe that
$$
\begin{aligned}
E[(Y - X\theta)^T(Y - X\theta)] &= E[(U + X(\theta_0 - \theta))^T(U + X(\theta_0 - \theta))] \\
&= E[U^TU] + 2(\theta_0 - \theta)^T E\big[X^T E[U|X]\big] + (\theta_0 - \theta)^T\big(E[X^TX]\big)(\theta_0 - \theta) \\
&= n \cdot \sigma^2 + (\theta_0 - \theta)^T\big(E[X^TX]\big)(\theta_0 - \theta).
\end{aligned}
$$
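As a numerical illustration of this decomposition, the population criterion $E[(Y - X\theta)^T(Y - X\theta)]$ can be approximated by Monte Carlo; it equals $n\sigma^2$ at $\theta = \theta_0$ and exceeds it elsewhere. This is only a sketch with simulated data; the sample size, $\theta_0$, and the alternative $\theta$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, reps = 50, 3, 500
theta0 = np.array([1.0, -1.0, 0.5])
theta_alt = theta0 + np.array([1.0, 0.0, 0.0])   # shift by delta = (1, 0, 0)

def criterion_mean(theta):
    # Monte Carlo estimate of E[(Y - X theta)'(Y - X theta)]
    vals = np.empty(reps)
    for r in range(reps):
        X = rng.normal(size=(n, k))   # rows iid N(0, I_k), so E[X'X] = n * I_k
        U = rng.normal(size=n)        # sigma^2 = 1
        Y = X @ theta0 + U
        e = Y - X @ theta
        vals[r] = e @ e
    return vals.mean()

at_theta0 = criterion_mean(theta0)    # approx n * sigma^2 = 50
at_alt = criterion_mean(theta_alt)    # approx 50 + delta' E[X'X] delta = 100
```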
Hence, it follows from (5.33) that
$$\theta_0 = \text{argmin}_\theta\, E[(Y - X\theta)^T(Y - X\theta)] = \big(E[X^TX]\big)^{-1} E[X^TY], \quad (5.34)$$
provided that the matrix $E[X^TX]$ is nonsingular. However, the nonsingularity of the distribution of $Z_j = (Y_j, X_j^T)^T$ guarantees that $E[X^TX]$ is nonsingular, because it follows from Theorem 5.5 that the solution (5.34) is unique if $\Sigma_{XX} = \text{Var}(X_j)$ is nonsingular.
The expression (5.34) suggests estimating $\theta_0$ by the ordinary least-squares estimator
$$\hat\theta = \text{argmin}_\theta\, (Y - X\theta)^T(Y - X\theta) = (X^TX)^{-1}X^TY. \quad (5.35)$$
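In code, (5.35) amounts to solving the normal equations. A minimal sketch with simulated data (the design matrix, the "true" parameter, and the error variance below are illustrative assumptions, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))
theta0 = np.array([1.0, -2.0, 0.5])   # illustrative "true" parameter
U = rng.normal(size=n)                # U | X ~ N(0, I_n), i.e., sigma^2 = 1
Y = X @ theta0 + U

# theta_hat = (X'X)^{-1} X'Y, computed by solving the normal equations
# X'X theta = X'Y rather than forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

In practice a QR- or SVD-based routine such as `np.linalg.lstsq` is the numerically preferred way to compute the same estimate.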
It follows easily from (5.32) and (5.35) that
$$\hat\theta - \theta_0 = (X^TX)^{-1}X^TU; \quad (5.36)$$
hence, $\hat\theta$ is conditionally unbiased: $E[\hat\theta|X] = \theta_0$, and therefore also unconditionally unbiased: $E[\hat\theta] = \theta_0$. More generally,
$$\hat\theta|X \sim N_k\big[\theta_0,\ \sigma^2(X^TX)^{-1}\big]. \quad (5.37)$$
Of course, the unconditional distribution of $\hat\theta$ is not normal.
Note that the OLS estimator is not efficient because $\sigma^2\big(E[X^TX]\big)^{-1}$ is the Cramér-Rao lower bound of an unbiased estimator of $\theta_0$, whereas $\text{Var}(\hat\theta) = \sigma^2 E\big[(X^TX)^{-1}\big] \neq \sigma^2\big(E[X^TX]\big)^{-1}$. However, the OLS estimator is the most efficient of all conditionally unbiased estimators of $\theta_0$ that are linear functions of $Y$. In other words, the OLS estimator is the best linear unbiased estimator (BLUE). This result is known as the Gauss-Markov theorem:
Theorem 5.16: (Gauss-Markov theorem) Let $C(X)$ be a $k \times n$ matrix whose elements are Borel-measurable functions of the random elements of $X$, and let $\tilde\theta = C(X)Y$. If $E[\tilde\theta|X] = \theta_0$, then for some positive semidefinite $k \times k$ matrix $D$, $\text{Var}[\tilde\theta|X] = \sigma^2 C(X)C(X)^T = \sigma^2(X^TX)^{-1} + D$.
Proof: The conditional unbiasedness condition implies that $C(X)X = I_k$; hence, $\tilde\theta = \theta_0 + C(X)U$, and thus $\text{Var}(\tilde\theta|X) = \sigma^2 C(X)C(X)^T$. Now
$$
\begin{aligned}
D &= \sigma^2\big[C(X)C(X)^T - (X^TX)^{-1}\big] \\
&= \sigma^2\big[C(X)C(X)^T - C(X)X(X^TX)^{-1}X^TC(X)^T\big] \\
&= \sigma^2 C(X)\big[I_n - X(X^TX)^{-1}X^T\big]C(X)^T = \sigma^2 C(X)MC(X)^T,
\end{aligned}
$$
say, where the second equality follows from the unbiasedness condition $C(X)X = I_k$. The matrix
$$M = I_n - X(X^TX)^{-1}X^T \quad (5.38)$$
is idempotent; hence, its eigenvalues are either 1 or 0. Because all the eigenvalues are nonnegative, $M$ is positive semidefinite and so is $C(X)MC(X)^T$. Q.E.D.
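The key algebraic fact in the proof, that $M = I_n - X(X^TX)^{-1}X^T$ is idempotent with eigenvalues in $\{0, 1\}$, is easy to verify numerically. A sketch with simulated $X$ (the dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 4
X = rng.normal(size=(n, k))

# M = I_n - X(X'X)^{-1}X', which annihilates the column space of X
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

MM = M @ M                        # idempotency: M @ M should equal M
eigvals = np.linalg.eigvalsh(M)   # M is symmetric; eigenvalues in ascending order
```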
Next, we need an estimator of the error variance $\sigma^2$. If we observed the errors $U_j$, then we could use the sample variance $S^2 = (1/(n-1))\sum_{j=1}^n (U_j - \bar U)^2$ of the $U_j$'s as an unbiased estimator. This suggests using the OLS residuals
$$\hat U_j = Y_j - X_j^T\hat\theta, \quad j = 1, \ldots, n,$$
instead of the actual errors $U_j$ in this sample variance. Taking into account that $\sum_{j=1}^n \hat U_j = 0$ if the model contains an intercept, the sample variance involved becomes $(1/(n-1))\sum_{j=1}^n \hat U_j^2$. However, this estimator is not unbiased, but a minor correction will yield an unbiased estimator of $\sigma^2$, namely,
$$S^2 = \frac{1}{n-k}\sum_{j=1}^n \hat U_j^2, \quad (5.41)$$
which is called the OLS estimator of $\sigma^2$. The unbiasedness of this estimator is a by-product of the following more general result, which is related to the result of Theorem 5.13.
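A Monte Carlo sketch confirms that dividing by $n - k$ in (5.41) makes $S^2$ unbiased for $\sigma^2$. The data are simulated; $\sigma^2 = 4$ and the dimensions are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 50, 3, 4.0
X = rng.normal(size=(n, k))           # design held fixed across replications
theta0 = np.zeros(k)

reps = 2000
s2_draws = np.empty(reps)
for r in range(reps):
    U = rng.normal(scale=np.sqrt(sigma2), size=n)
    Y = X @ theta0 + U
    theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ theta_hat
    s2_draws[r] = resid @ resid / (n - k)   # S^2 as in (5.41)

s2_mean = s2_draws.mean()             # should be close to sigma2 = 4
```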
Theorem 5.17: Conditional on $X$ as well as unconditionally, $(n-k)S^2/\sigma^2 \sim \chi^2_{n-k}$; hence, $E[S^2] = \sigma^2$.
Proof: Observe that
$$
\begin{aligned}
\sum_{j=1}^n \hat U_j^2 &= \big(U - X(\hat\theta - \theta_0)\big)^T\big(U - X(\hat\theta - \theta_0)\big) \\
&= U^TU - 2U^TX(\hat\theta - \theta_0) + (\hat\theta - \theta_0)^TX^TX(\hat\theta - \theta_0) \\
&= U^TU - U^TX(X^TX)^{-1}X^TU = U^TMU, \quad (5.42)
\end{aligned}
$$
where the last two equalities follow from (5.36) and (5.38), respectively. Because the matrix $M$ is idempotent with rank
$$
\begin{aligned}
\text{rank}(M) = \text{trace}(M) &= \text{trace}(I_n) - \text{trace}\big(X(X^TX)^{-1}X^T\big) \\
&= \text{trace}(I_n) - \text{trace}\big((X^TX)^{-1}X^TX\big) = n - k,
\end{aligned}
$$
it follows from Theorem 5.10 that, conditional on $X$, (5.42) divided by $\sigma^2$ has a $\chi^2_{n-k}$ distribution:
$$\sum_{j=1}^n \hat U_j^2 \big/ \sigma^2 \,\Big|\, X \sim \chi^2_{n-k}. \quad (5.43)$$
It is left as an exercise to prove that (5.43) also implies that the unconditional distribution of (5.42) divided by $\sigma^2$ is $\chi^2_{n-k}$:
$$\sum_{j=1}^n \hat U_j^2 \big/ \sigma^2 \sim \chi^2_{n-k}. \quad (5.44)$$
Because the expectation of the $\chi^2_{n-k}$ distribution is $n - k$, it follows from (5.44) that the OLS estimator (5.41) of $\sigma^2$ is unbiased. Q.E.D.
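The degrees-of-freedom computation $\text{rank}(M) = \text{trace}(M) = n - k$ used in the proof can likewise be checked numerically. A sketch with simulated $X$ (arbitrary illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 5
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

trace_M = np.trace(M)                 # should equal n - k = 25
rank_M = np.linalg.matrix_rank(M)     # likewise n - k = 25
```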
Next, observe from (5.38) that $X^TM = O$, and thus by Theorem 5.7, $(X^TX)^{-1}X^TU$ and $U^TMU$ are independent conditionally on $X$; that is,
$$P[X^TU \le x \text{ and } U^TMU \le z \,|\, X] = P[X^TU \le x \,|\, X] \cdot P[U^TMU \le z \,|\, X], \quad \forall\, x \in \mathbb{R}^k,\ z \ge 0.$$
Theorem 5.18: Conditional on $X$, $\hat\theta$ and $S^2$ are independent, but unconditionally they can be dependent.
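A necessary symptom of the conditional independence in Theorem 5.18 is that, with $X$ held fixed, Monte Carlo draws of $\hat\theta$ and $S^2$ are uncorrelated. A sketch with simulated data (correlation near zero is only a symptom, not a proof of independence):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, reps = 40, 3, 4000
X = rng.normal(size=(n, k))           # fixed design: we condition on X
XtX_inv = np.linalg.inv(X.T @ X)

th_first = np.empty(reps)             # first component of theta_hat
s2 = np.empty(reps)
for r in range(reps):
    U = rng.normal(size=n)            # theta0 = 0, sigma^2 = 1
    theta_hat = XtX_inv @ (X.T @ U)   # theta_hat - theta0, by (5.36)
    resid = U - X @ theta_hat
    th_first[r] = theta_hat[0]
    s2[r] = resid @ resid / (n - k)

corr = np.corrcoef(th_first, s2)[0, 1]   # should be near zero
```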
Theorems 5.17 and 5.18 yield two important corollaries, which I will state in the next theorem. These results play a key role in statistical testing.
Theorem 5.19: (a) Let $c$ be a given nonrandom vector in $\mathbb{R}^k$. Then
$$\frac{c^T(\hat\theta - \theta_0)}{S\sqrt{c^T(X^TX)^{-1}c}} \sim t_{n-k}. \quad (5.45)$$
(b) Let $R$ be a given nonrandom $m \times k$ matrix with rank $m \le k$. Then
$$\frac{(\hat\theta - \theta_0)^T R^T \big(R(X^TX)^{-1}R^T\big)^{-1} R(\hat\theta - \theta_0)}{m\, S^2} \sim F_{m,\, n-k}. \quad (5.46)$$
Proof of (5.45): It follows from (5.37) that $c^T(\hat\theta - \theta_0)|X \sim N\big[0, \sigma^2 c^T(X^TX)^{-1}c\big]$; hence,
$$\frac{c^T(\hat\theta - \theta_0)}{\sigma\sqrt{c^T(X^TX)^{-1}c}} \,\Big|\, X \sim N(0, 1). \quad (5.47)$$
It follows now from Theorem 5.18 that, conditional on $X$, the random variable in (5.47) and $S^2$ are independent; hence, it follows from Theorem 5.17 and the definition of the $t$-distribution that (5.45) is true, conditional on $X$ and therefore also unconditionally.
Proof of (5.46): It follows from (5.37) that $R(\hat\theta - \theta_0)|X \sim N_m\big[0, \sigma^2 R(X^TX)^{-1}R^T\big]$; hence, it follows from Theorem 5.9 that
$$\frac{(\hat\theta - \theta_0)^T R^T \big(R(X^TX)^{-1}R^T\big)^{-1} R(\hat\theta - \theta_0)}{\sigma^2} \,\Big|\, X \sim \chi^2_m. \quad (5.48)$$
Again it follows from Theorem 5.18 that, conditional on $X$, the random variable in (5.48) and $S^2$ are independent; hence, it follows from Theorem 5.17 and the definition of the $F$-distribution that (5.46) is true, conditional on $X$ and therefore also unconditionally. Q.E.D.
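As a sanity check on Theorem 5.19(a), a two-sided 5%-level $t$-test of a true null hypothesis should reject roughly 5% of the time. A Monte Carlo sketch with simulated data (the contrast vector $c$ and all dimensions are arbitrary illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 40, 3
X = rng.normal(size=(n, k))           # fixed design: we condition on X
XtX_inv = np.linalg.inv(X.T @ X)
c = np.array([1.0, 0.0, -1.0])        # illustrative contrast vector
crit = stats.t.ppf(0.975, df=n - k)   # two-sided 5% critical value of t_{n-k}

reps, rejections = 2000, 0
for _ in range(reps):
    U = rng.normal(size=n)            # true theta0 = 0, sigma^2 = 1
    theta_hat = XtX_inv @ (X.T @ U)   # theta_hat - theta0, by (5.36)
    resid = U - X @ theta_hat
    S2 = resid @ resid / (n - k)
    t_stat = (c @ theta_hat) / np.sqrt(S2 * (c @ XtX_inv @ c))
    rejections += abs(t_stat) > crit

reject_rate = rejections / reps       # should be near the nominal 0.05
```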
Note that the results in Theorem 5.19 do not hinge on the assumption that the vector $X_j$ in model (5.31) has a multivariate normal distribution. The only conditions that matter for the validity of Theorem 5.19 are that in (5.32), $U|X \sim N_n(0, \sigma^2 I_n)$ and $P[0 < \det(X^TX) < \infty] = 1$.