# Least Squares as Best Linear Unbiased Estimator (BLUE)

The class of linear estimators of $\beta$ can be defined as those estimators of the form $C'y$ for any $T \times K$ constant matrix $C$. We can further restrict the class by imposing the unbiasedness condition, namely,

$$E\,C'y = \beta \quad \text{for all } \beta. \tag{1.2.28}$$

Inserting (1.1.4) into (1.2.28), we obtain

$$C'X = I. \tag{1.2.29}$$
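The substitution behind (1.2.29) can be spelled out as follows. Equation (1.1.4) is not reproduced in this section, so this sketch assumes it is the model equation $y = X\beta + u$ with $Eu = 0$ and $X$ nonstochastic, which is consistent with how it is used in the proof of Theorem 1.2.1:

```latex
\begin{aligned}
E\,C'y &= E\,C'(X\beta + u) = C'X\beta + C'\,Eu = C'X\beta, \\
C'X\beta &= \beta \ \text{ for all } \beta \iff C'X = I .
\end{aligned}
```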

Clearly, the LS estimator $\hat{\beta}$ is a member of this class. The following theorem proves that LS is best of all the linear unbiased estimators.

Theorem 1.2.1 (Gauss-Markov). Let $\beta^* = C'y$, where $C$ is a $T \times K$ matrix of constants such that $C'X = I$. Then $\hat{\beta}$ is better than $\beta^*$ if $\hat{\beta} \neq \beta^*$.

Proof. Because $\beta^* = \beta + C'u$ by (1.2.29), we have

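$$
\begin{aligned}
V\beta^* &= E\,C'uu'C \\
&= \sigma^2 C'C \\
&= \sigma^2 (X'X)^{-1} + \sigma^2 \left[C' - (X'X)^{-1}X'\right]\left[C' - (X'X)^{-1}X'\right]'.
\end{aligned}
\tag{1.2.30}
$$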

The theorem follows immediately by noting that the second term of the last line of (1.2.30) is a nonnegative definite matrix.
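The decomposition in (1.2.30) can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes NumPy, uses simulated data, and all variable names (`C_ls`, `D`, `M`) are hypothetical. It builds an alternative unbiased weight matrix by adding to the LS weights a perturbation `D` with $D'X = 0$, then confirms that $C'C - (X'X)^{-1}$ equals the second term of (1.2.30) and has no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility only
T, K = 50, 3
X = rng.normal(size=(T, K))              # simulated regressors (full column rank almost surely)

XtX_inv = np.linalg.inv(X.T @ X)
C_ls = X @ XtX_inv                       # LS weights, so that C_ls' = (X'X)^{-1} X'

# Perturbation D with D'X = 0: project random columns off the column space of X.
M = np.eye(T) - X @ XtX_inv @ X.T
D = M @ rng.normal(size=(T, K))
C = C_ls + D                             # another matrix satisfying C'X = I

unbiased = np.allclose(C.T @ X, np.eye(K))

# Variance difference divided by sigma^2, and the second term of (1.2.30).
diff = C.T @ C - XtX_inv
second_term = (C.T - XtX_inv @ X.T) @ (C.T - XtX_inv @ X.T).T
min_eig = np.linalg.eigvalsh(diff).min()

print("C'X = I:", unbiased)
print("matches second term of (1.2.30):", np.allclose(diff, second_term))
print("smallest eigenvalue of difference:", min_eig)
```

Constructing `C` from a perturbation orthogonal to the columns of `X` is one convenient way to generate members of the linear unbiased class; any $C$ with $C'X = I$ would serve.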

We shall now give an alternative proof, which contains an interesting point of its own. The class of linear unbiased estimators can be defined alternatively as the class of estimators of the form $(S'X)^{-1}S'y$, where $S$ is any $T \times K$ matrix of constants such that $S'X$ is nonsingular. When it is defined this way, we call it the class of instrumental variable estimators (abbreviated as IV) and call the column vectors of $S$ instrumental variables. The variance-covariance matrix of IV can easily be shown to be $\sigma^2 (S'X)^{-1}S'S(X'S)^{-1}$. We get LS when we put $S = X$, and the optimality of LS can be proved as follows: Because $I - S(S'S)^{-1}S'$ is nonnegative definite by Theorem 14(v) of Appendix 1, we have

$$X'X \geq X'S(S'S)^{-1}S'X. \tag{1.2.31}$$

Inverting both sides of (1.2.31) and using Theorem 17 of Appendix 1, we obtain the desired result:

$$(X'X)^{-1} \leq (S'X)^{-1}S'S(X'S)^{-1}. \tag{1.2.32}$$
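Inequalities (1.2.31) and (1.2.32) can be illustrated with simulated data. This is a rough sketch under stated assumptions rather than anything from the original text: NumPy is assumed, the instrument matrix `S` is a hypothetical noisy copy of `X`, and the matrix inequalities are checked through the smallest eigenvalue of each difference.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
T, K = 40, 2
X = rng.normal(size=(T, K))
S = X + 0.5 * rng.normal(size=(T, K))     # hypothetical instruments, correlated with X

SX_inv = np.linalg.inv(S.T @ X)           # (S'X)^{-1}; its transpose is (X'S)^{-1}
iv_var = SX_inv @ S.T @ S @ SX_inv.T      # IV variance divided by sigma^2
ls_var = np.linalg.inv(X.T @ X)           # LS variance divided by sigma^2

# (1.2.31): X'X - X'S(S'S)^{-1}S'X should be nonnegative definite.
P_S = S @ np.linalg.inv(S.T @ S) @ S.T
eig_31 = np.linalg.eigvalsh(X.T @ X - X.T @ P_S @ X).min()

# (1.2.32): the IV variance minus the LS variance should be nonnegative definite.
eig_32 = np.linalg.eigvalsh(iv_var - ls_var).min()

# Putting S = X reduces the IV variance formula to the LS one.
XX_inv = np.linalg.inv(X.T @ X)
iv_var_with_X = XX_inv @ X.T @ X @ XX_inv.T

print("smallest eigenvalue for (1.2.31):", eig_31)
print("smallest eigenvalue for (1.2.32):", eig_32)
print("S = X gives LS:", np.allclose(iv_var_with_X, ls_var))
```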

In the preceding analysis we were first given the least squares estimator and then proceeded to prove that it is best of all the linear unbiased estimators. Suppose now that we knew nothing about the least squares estimator and had to find the value of $C$ that minimizes $C'C$ in the matrix sense (that is, in terms of a ranking of matrices based on the matrix inequality defined earlier) subject to the condition $C'X = I$. Unlike the problem of scalar minimization, calculus is of no direct use. In such a situation it is often useful to minimize the variance of a linear unbiased estimator of the scalar parameter $p'\beta$, where $p$ is an arbitrary $K$-vector of known constants.

Let $c'y$ be a linear estimator of $p'\beta$. The unbiasedness condition implies $X'c = p$. Because $V c'y = \sigma^2 c'c$, the problem mathematically is

$$\text{Minimize } c'c \ \text{ subject to } \ X'c = p. \tag{1.2.33}$$

Define

$$S = c'c - 2\lambda'(X'c - p), \tag{1.2.34}$$

where $2\lambda$ is a $K$-vector of Lagrange multipliers. Setting the derivative of $S$ with respect to $c$, namely $2c - 2X\lambda$, equal to $0$ yields

$$c = X\lambda. \tag{1.2.35}$$

Premultiplying both sides by $X'$ and using the constraint, we obtain

$$\lambda = (X'X)^{-1}p. \tag{1.2.36}$$

Inserting (1.2.36) into (1.2.35), we conclude that the best linear unbiased estimator of $p'\beta$ is $p'(X'X)^{-1}X'y$. We therefore have a constructive proof of Theorem 1.2.1.
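The constrained minimization (1.2.33) can also be checked numerically. The sketch below is an illustration only and makes several assumptions: NumPy is used, the data and the vector `p` are arbitrary, and the names `lam`, `c_star`, and `c_other` are hypothetical. It computes $c = X(X'X)^{-1}p$ from (1.2.35) and (1.2.36), then compares it with the minimum-norm solution of the underdetermined system $X'c = p$ returned by `numpy.linalg.lstsq`, and with another feasible $c$.

```python
import numpy as np

rng = np.random.default_rng(2)            # arbitrary seed
T, K = 30, 3
X = rng.normal(size=(T, K))
p = np.array([1.0, -2.0, 0.5])            # arbitrary known K-vector

lam = np.linalg.solve(X.T @ X, p)         # (1.2.36): lambda = (X'X)^{-1} p
c_star = X @ lam                          # (1.2.35): c = X lambda

# X'c = p has K equations and T unknowns; lstsq returns its minimum-norm solution.
c_min_norm, *_ = np.linalg.lstsq(X.T, p, rcond=None)

# Another feasible c: add a vector orthogonal to the columns of X.
M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)
c_other = c_star + M @ rng.normal(size=T)

print("constraint holds for c_star:", np.allclose(X.T @ c_star, p))
print("agrees with minimum-norm solution:", np.allclose(c_star, c_min_norm))
print("c'c for c_star and c_other:", c_star @ c_star, c_other @ c_other)
```

Using `np.linalg.solve` instead of forming $(X'X)^{-1}$ explicitly is a numerical convenience; the result is the same $\lambda$ as in (1.2.36).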