Properties of α̂ and β̂

First, we obtain the means and the variances of the least squares estimators α̂ and β̂. For this purpose it is convenient to use the formulae (10.2.12) and (10.2.16) rather than (10.2.4) and (10.2.5).

Inserting (10.1.1) into (10.2.12) and using (10.2.17) yields

(10.2.19)  β̂ − β = Σ x_t* u_t / Σ(x_t*)².

Since Eu_t = 0 and {x_t*} are constants by our assumptions, we have from (10.2.19) and Theorem 4.1.6,

(10.2.20)  Eβ̂ = β.

In other words, β̂ is an unbiased estimator of β. Similarly, inserting (10.1.1) into (10.2.16) and using (10.2.18) yields

(10.2.21)  α̂ − α = Σ 1_t* u_t / Σ(1_t*)²,

which implies

(10.2.22)  Eα̂ = α.

Using (10.2.19), the variance of β̂ can be evaluated as follows:

(10.2.23)  V(β̂) = V(Σ x_t* u_t) / [Σ(x_t*)²]²    by Theorem 4.2.1
                 = σ² Σ(x_t*)² / [Σ(x_t*)²]²
                 = σ² / Σ(x_t*)².

Similarly, we obtain from (10.2.21)

(10.2.24)  V(α̂) = σ² / Σ(1_t*)².

How good are the least squares estimators? Before we compare them with other estimators, let us see what we can learn from the means and the variances obtained above. First, their unbiasedness is clearly a desirable property. Next, note that the denominator of the expression for V(β̂) given in (10.2.23) is equal to T times the sample variance of x_t. Therefore under reasonable circumstances we can expect V(β̂) to go to zero at about the same rate as the inverse of the sample size T. This is another desirable property. The variance of α̂ has a similar property. A problem arises if x_t stays nearly constant for all t, for then both Σ(x_t*)² and Σ(1_t*)² are small. (Note that when we defined the bivariate regression model we excluded the possibility that x_t is a constant for all t, since in that case the least squares estimators cannot be defined.) Intuitively speaking, we cannot clearly distinguish the effects of {x_t} and the unity regressor on {y_t} when x_t is nearly constant. The problem of large variances caused by a closeness of regressors is called the problem of multicollinearity.
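These properties can be checked by simulation. Below is a minimal sketch (not from the text; the sample size, parameter values, random seed, and use of NumPy are our own choices) that draws repeated samples from the model with a fixed regressor, compares the Monte Carlo mean and variance of β̂ with β and with σ²/Σ(x_t*)², and illustrates how the variance deteriorates when x_t is nearly constant.

```python
# A simulation sketch (illustrative values, not from the text): check that
# beta_hat is unbiased and that V(beta_hat) = sigma^2 / sum((x_t*)^2).
import numpy as np

rng = np.random.default_rng(0)
T, alpha, beta, sigma = 50, 2.0, 0.5, 1.0
x = np.linspace(0.0, 10.0, T)        # fixed regressor, constant across samples
x_star = x - x.mean()                # x_t* = x_t - x-bar
S_xx = np.sum(x_star**2)             # T times the sample variance of x_t

beta_hats = np.empty(20000)
for r in range(beta_hats.size):
    u = rng.normal(0.0, sigma, T)             # Eu_t = 0, V(u_t) = sigma^2
    y = alpha + beta * x + u                  # the model (10.1.1)
    beta_hats[r] = np.sum(x_star * y) / S_xx  # least squares estimator

print(beta_hats.mean())        # close to beta: unbiasedness
print(beta_hats.var())         # close to the theoretical variance (10.2.23)
print(sigma**2 / S_xx)

# Multicollinearity: a nearly constant regressor makes S_xx tiny,
# so the theoretical variance of beta_hat blows up.
x_flat = 5.0 + 0.001 * np.linspace(-1.0, 1.0, T)
S_flat = np.sum((x_flat - x_flat.mean())**2)
print(sigma**2 / S_flat)       # far larger than sigma^2 / S_xx above
```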

For the sake of completeness we shall derive the covariance between α̂ and β̂, although its significance for the desirability of the estimators will not be discussed until Chapter 12. Using (10.2.19), (10.2.21), and the assumptions on {u_t},

(10.2.25)  Cov(α̂, β̂) = E[(Σ 1_t* u_t)(Σ x_s* u_s)] / [Σ(x_t*)² Σ(1_t*)²] = σ² Σ x_t* 1_t* / [Σ(x_t*)² Σ(1_t*)²].
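The covariance can also be checked numerically. In the sketch below (illustrative numbers of our own; the explicit expression 1_t* = 1 − x_t Σx_s / Σx_s², the residual from regressing the unity regressor on x_t, is our reading of the definition behind (10.2.16) and is an assumption here), the Monte Carlo covariance of α̂ and β̂ is compared with σ² Σ x_t* 1_t* / [Σ(x_t*)² Σ(1_t*)²].

```python
# Monte Carlo check (illustrative values) of the covariance of the least
# squares estimators.  one_star below stands for 1_t*, the residual from
# regressing the unity regressor on x_t; this explicit form is an assumption.
import numpy as np

rng = np.random.default_rng(1)
T, alpha, beta, sigma = 30, 1.0, 2.0, 1.5
x = rng.uniform(0.0, 5.0, T)         # fixed across replications
x_star = x - x.mean()
one_star = 1.0 - x * (x.sum() / np.sum(x**2))

a_hats = np.empty(50000)
b_hats = np.empty(50000)
for r in range(a_hats.size):
    y = alpha + beta * x + rng.normal(0.0, sigma, T)
    b_hats[r] = np.sum(x_star * y) / np.sum(x_star**2)
    a_hats[r] = np.sum(one_star * y) / np.sum(one_star**2)

empirical = np.cov(a_hats, b_hats)[0, 1]
theory = sigma**2 * np.sum(x_star * one_star) / (
    np.sum(x_star**2) * np.sum(one_star**2))
print(empirical, theory)   # the two numbers should be close
```

A little algebra reduces the theoretical expression to −σ² x̄ / Σ(x_t*)², so the two estimators are negatively correlated whenever x̄ > 0.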

Recall that in Chapter 7 we showed that we can define a variety of estimators with mean squared errors smaller than that of the sample mean for some values of the parameter to be estimated, but that the sample mean is best (in the sense of smallest mean squared error) among all the linear unbiased estimators. We can establish the same fact regarding the least squares estimators, which may be regarded as a natural generalization of the sample mean. (Note that the least squares estimator of the coefficient in the regression of {y_t} on the unity regressor is precisely the sample mean of {y_t}.)

Let us consider the estimation of β. The class of linear estimators of β is defined by Σ c_t y_t, where {c_t} are arbitrary constants. The class of linear unbiased estimators is defined by imposing the following condition on {c_t}:

(10.2.26)  E Σ c_t y_t = β  for all α and β.

Inserting (10.1.1) into the left-hand side of (10.2.26) and using Eu_t = 0, we see that the condition (10.2.26) is equivalent to the conditions

(10.2.27)  Σ c_t = 0  and

(10.2.28)  Σ c_t x_t = 1.

From (10.2.12) we can easily verify that β̂ is a member of the class of linear unbiased estimators. We have

(10.2.29)  V(Σ c_t y_t) = σ² Σ c_t².

Comparing (10.2.29) and (10.2.23), we note that proving that β̂ is the best linear unbiased estimator (BLUE) of β is equivalent to proving

(10.2.30)  1 / Σ(x_t*)² ≤ Σ c_t²  for all {c_t} satisfying (10.2.27) and (10.2.28).

But (10.2.30) follows from the following identity, which is similar to the one used in the proof of Theorem 7.2.12:

(10.2.31)  Σ[c_t − x_t* / Σ(x_t*)²]² = Σ c_t² − 1 / Σ(x_t*)²,

since Σ c_t x_t* = 1 by (10.2.27), (10.2.28), and (10.2.11). Note that (10.2.30) follows from (10.2.31) because the left-hand side of (10.2.31) is the sum of squared terms and hence is nonnegative. Equation (10.2.31) also shows that equality holds in (10.2.30) if and only if c_t = x_t* / Σ(x_t*)², in other words, if and only if Σ c_t y_t is the least squares estimator.
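The identity and the resulting inequality can be illustrated numerically. The sketch below (made-up numbers, NumPy for convenience) builds arbitrary weights c_t satisfying (10.2.27) and (10.2.28) by adding to the least squares weights a vector orthogonal to both the unity regressor and x_t, and then checks that Σ(c_t − x_t*/Σ(x_t*)²)² equals Σ c_t² − 1/Σ(x_t*)².

```python
# A numerical illustration (made-up data) of the BLUE identity: any weights
# c_t with sum(c_t) = 0 and sum(c_t x_t) = 1 satisfy
#   sum((c_t - x_t*/S)^2) = sum(c_t^2) - 1/S,  S = sum((x_t*)^2),
# so sum(c_t^2) >= 1/S, with equality only for the least squares weights.
import numpy as np

rng = np.random.default_rng(2)
T = 10
x = rng.uniform(0.0, 4.0, T)
x_star = x - x.mean()
S = np.sum(x_star**2)

c_ls = x_star / S                        # least squares weights
# Any other unbiased weights: add a vector orthogonal to 1 and x.
A = np.column_stack([np.ones(T), x])
z = rng.normal(size=T)
z = z - A @ np.linalg.lstsq(A, z, rcond=None)[0]   # project off span{1, x}
c = c_ls + z                             # still satisfies both conditions

print(np.sum(c), np.sum(c * x))          # ~0 and ~1: (10.2.27), (10.2.28)
lhs = np.sum((c - c_ls)**2)
rhs = np.sum(c**2) - 1.0 / S
print(lhs, rhs)                          # equal: the identity
print(np.sum(c**2) >= 1.0 / S)           # True: the inequality
```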

The proof of the best linear unbiasedness of α̂ is similar and therefore left as an exercise.

Actually, we can prove a stronger result. Consider the estimation of an arbitrary linear combination of the parameters d₁α + d₂β. Then d₁α̂ + d₂β̂ is the best linear unbiased estimator of d₁α + d₂β. The results obtained above can be derived as special cases of this general result by putting d₁ = 0 and d₂ = 1 for the estimation of β, and putting d₁ = 1 and d₂ = 0 for the estimation of α. Because the proof of this general result is lengthy, and inasmuch as we shall present a much simpler proof using matrix analysis in Chapter 12, we give only a partial proof here.

Again, we define the class of linear estimators of d₁α + d₂β by Σ c_t y_t. The unbiasedness condition implies

(10.2.32)  Σ c_t = d₁  and

(10.2.33)  Σ c_t x_t = d₂.

The variance of Σ c_t y_t is again given by (10.2.29). Define

c_t* = d₁ 1_t* / Σ(1_t*)² + d₂ x_t* / Σ(x_t*)².

Then the least squares estimator d₁α̂ + d₂β̂ can be written as Σ c_t* y_t, and its variance is given by σ² Σ(c_t*)². The best linear unbiasedness of the least squares estimator follows from the identity

Σ(c_t − c_t*)² = Σ c_t² − Σ(c_t*)².

We omit the proof of this identity, except to note that (10.2.32) and (10.2.33) imply Σ c_t c_t* = Σ(c_t*)².
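The general result can be illustrated numerically as well. In the sketch below (illustrative numbers; as before, the explicit form of 1_t* is our reading of the text's definition), the weights c_t* reproduce d₁α̂ + d₂β̂, and arbitrary weights satisfying (10.2.32) and (10.2.33) are shown to satisfy Σ c_t c_t* = Σ(c_t*)².

```python
# A sketch (illustrative data) of the general BLUE result: with
#   c_t* = d1 * 1_t*/sum((1_t*)^2) + d2 * x_t*/sum((x_t*)^2),
# sum(c_t* y_t) equals d1*alpha_hat + d2*beta_hat, and any weights c_t
# satisfying the unbiasedness conditions give sum(c_t c_t*) = sum((c_t*)^2).
import numpy as np

rng = np.random.default_rng(3)
T, d1, d2 = 12, 1.0, 3.0
x = rng.uniform(1.0, 6.0, T)
y = rng.normal(size=T)                  # any data will do for an identity

x_star = x - x.mean()
one_star = 1.0 - x * (x.sum() / np.sum(x**2))   # assumed form of 1_t*
c_star = (d1 * one_star / np.sum(one_star**2)
          + d2 * x_star / np.sum(x_star**2))

# sum(c_t* y_t) reproduces the least squares combination.
beta_hat = np.sum(x_star * y) / np.sum(x_star**2)
alpha_hat = np.sum(one_star * y) / np.sum(one_star**2)
print(np.sum(c_star * y), d1 * alpha_hat + d2 * beta_hat)   # equal

# Other unbiased weights: add a vector orthogonal to 1 and x.
A = np.column_stack([np.ones(T), x])
z = rng.normal(size=T)
z = z - A @ np.linalg.lstsq(A, z, rcond=None)[0]
c = c_star + z
print(np.sum(c), np.sum(c * x))               # ~d1 and ~d2
print(np.sum(c * c_star), np.sum(c_star**2))  # equal, which yields BLUE
```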

It is well to remember at this point that we can construct many biased and/or nonlinear estimators which have smaller mean squared errors than the least squares estimators for certain values of the parameters. Moreover, in certain situations some of these estimators may be more desirable than the least squares estimators. Also, we should note that the proof of the best linear unbiasedness of the least squares estimator depends on our assumption that {u_t} are serially uncorrelated with a constant variance.