# Constrained Least Squares Estimator as Best Linear Unbiased Estimator

That $\hat\beta$ is the best linear unbiased estimator follows from the fact that $\hat\gamma_2$ is the best linear unbiased estimator of $\gamma_2$ in (1.4.9); however, we can also prove it directly. Inserting (1.4.9) into (1.4.11) and using (1.4.8), we obtain

$$\hat\beta = \beta + R(R'X'XR)^{-1}R'X'u. \tag{1.4.14}$$

Therefore, $\hat\beta$ is unbiased, and its variance-covariance matrix is given by

$$V(\hat\beta) = \sigma^2 R(R'X'XR)^{-1}R'. \tag{1.4.15}$$
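The sampling-error representation (1.4.14) and the exact satisfaction of the constraint can be checked numerically. The following NumPy sketch is our own illustration, not the book's: the random $X$, $Q$, $c$ and the SVD construction of an $R$ with $R'Q = 0$ are assumptions made for the example, and the constrained least squares estimator is built from the form given in (1.4.11).

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, q = 30, 5, 2
X = rng.standard_normal((T, K))
Q = rng.standard_normal((K, q))
c = rng.standard_normal(q)

# R: K x (K - q) matrix whose columns span the null space of Q', so R'Q = 0
R = np.linalg.svd(Q.T)[2][q:].T

# a true beta satisfying the constraint Q'beta = c
beta = Q @ np.linalg.solve(Q.T @ Q, c) + R @ rng.standard_normal(K - q)
u = rng.standard_normal(T)
y = X @ beta + u

# constrained least squares, as in (1.4.11)
b0 = Q @ np.linalg.solve(Q.T @ Q, c)
M = R.T @ X.T @ X @ R
beta_hat = b0 + R @ np.linalg.solve(M, R.T @ X.T @ (y - X @ b0))

# (1.4.14): beta_hat - beta = R(R'X'XR)^{-1} R'X'u
err = R @ np.linalg.solve(M, R.T @ X.T @ u)
assert np.allclose(beta_hat - beta, err)
assert np.allclose(Q.T @ beta_hat, c)  # the constraint holds exactly
```

Both assertions pass up to floating-point error, confirming that the sampling error of $\hat\beta$ is exactly $R(R'X'XR)^{-1}R'X'u$ and that $Q'\hat\beta = c$.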

We shall now define the class of linear estimators by $\beta^* = C'y - d$, where $C'$ is a $K \times T$ matrix and $d$ is a $K$-vector. This class is broader than the class of linear estimators considered in Section 1.2.5 because of the additive constants $d$. We did not include $d$ previously because in the unconstrained model the unbiasedness condition would ensure $d = 0$. Here, the unbiasedness condition $E(C'y - d) = \beta$ implies $C'X = I + GQ'$ and $d = Gc$ for some arbitrary $K \times q$ matrix $G$. We have $V(\beta^*) = \sigma^2 C'C$ as in Eq. (1.2.30), and CLS is BLUE because of the identity

$$C'C - R(R'X'XR)^{-1}R' = [C' - R(R'X'XR)^{-1}R'X'][C' - R(R'X'XR)^{-1}R'X']', \tag{1.4.16}$$

where we have used $C'X = I + GQ'$ and $R'Q = 0$. Because the right-hand side of (1.4.16) is nonnegative definite, $V(\beta^*) - V(\hat\beta)$ is nonnegative definite for every linear unbiased estimator $\beta^*$.
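The identity (1.4.16) can also be verified numerically. In the sketch below (hypothetical data of our own choosing), we construct one valid $C'$ by taking $C' = (I + GQ')(X'X)^{-1}X'$, which satisfies $C'X = I + GQ'$ for an arbitrary $G$:

```python
import numpy as np

rng = np.random.default_rng(1)
T, K, q = 30, 5, 2
X = rng.standard_normal((T, K))
Q = rng.standard_normal((K, q))
R = np.linalg.svd(Q.T)[2][q:].T          # columns span null space of Q', so R'Q = 0

# one valid C': C' = (I + GQ')(X'X)^{-1}X' satisfies C'X = I + GQ'
G = rng.standard_normal((K, q))
Ct = (np.eye(K) + G @ Q.T) @ np.linalg.solve(X.T @ X, X.T)

P = R @ np.linalg.solve(R.T @ X.T @ X @ R, R.T)   # R(R'X'XR)^{-1}R'
D = Ct - P @ X.T                                  # C' - R(R'X'XR)^{-1}R'X'

# identity (1.4.16); the right-hand side DD' is nonnegative definite
assert np.allclose(Ct @ Ct.T - P, D @ D.T)
assert np.linalg.eigvalsh(Ct @ Ct.T - P).min() > -1e-8
```

The second assertion confirms that $C'C - R(R'X'XR)^{-1}R'$ is nonnegative definite, which is what makes CLS best in its class.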

## 1.4.5 Stochastic Constraints

Suppose we add a stochastic error term on the right-hand side of (1.4.1), namely,

$$Q'\beta = c + v, \tag{1.4.17}$$

where $v$ is a $q$-vector of random variables such that $Ev = 0$ and $Evv' = \tau^2 I$. By making the constraints stochastic we have departed from the domain of classical statistics and entered that of Bayesian statistics, which treats the unknown parameters $\beta$ as random variables. Although we generally adopt the classical viewpoint in this book, we shall occasionally use Bayesian analysis whenever we believe it sheds light on the problem at hand.5

In terms of the parameterization of (1.4.7), the constraints (1.4.17) are equivalent to

$$\gamma_1 = c + v. \tag{1.4.18}$$

We shall first derive the posterior distribution of $\gamma$ using a prior distribution over all the elements of $\gamma$, because it is mathematically simpler to do so; then we shall treat (1.4.18) as what obtains in the limit as the variances of the prior distribution for $\gamma_2$ go to infinity. We shall assume $\sigma^2$ is known, for this assumption makes the algebra considerably simpler without changing the essentials. For a discussion of the case where a prior distribution is assumed on $\sigma^2$ as well, see Zellner (1971, p. 65) or Theil (1971, p. 670).

Let the prior density of $\gamma$ be

$$f(\gamma) = (2\pi)^{-K/2}|\Omega|^{-1/2}\exp\left[-\tfrac{1}{2}(\gamma - \mu)'\Omega^{-1}(\gamma - \mu)\right], \tag{1.4.19}$$

where $\Omega$ is a known variance-covariance matrix. Thus, by Bayes's rule, the posterior density of $\gamma$ given $y$ is

$$f(\gamma|y) = c_1 \exp\left\{-\tfrac{1}{2}\left[\sigma^{-2}(y - Z\gamma)'(y - Z\gamma) + (\gamma - \mu)'\Omega^{-1}(\gamma - \mu)\right]\right\}, \tag{1.4.20}$$

where $c_1$ does not depend on $\gamma$. Rearranging the terms inside the bracket, we have

$$\begin{aligned} &\sigma^{-2}(y - Z\gamma)'(y - Z\gamma) + (\gamma - \mu)'\Omega^{-1}(\gamma - \mu) \\ &\quad= \gamma'(\sigma^{-2}Z'Z + \Omega^{-1})\gamma - 2(\sigma^{-2}y'Z + \mu'\Omega^{-1})\gamma + \sigma^{-2}y'y + \mu'\Omega^{-1}\mu \\ &\quad= (\gamma - \bar\gamma)'(\sigma^{-2}Z'Z + \Omega^{-1})(\gamma - \bar\gamma) - \bar\gamma'(\sigma^{-2}Z'Z + \Omega^{-1})\bar\gamma + \sigma^{-2}y'y + \mu'\Omega^{-1}\mu, \end{aligned} \tag{1.4.21}$$

where

$$\bar\gamma = (\sigma^{-2}Z'Z + \Omega^{-1})^{-1}(\sigma^{-2}Z'y + \Omega^{-1}\mu). \tag{1.4.22}$$

Therefore the posterior distribution of $\gamma$ is

$$\gamma|y \sim N(\bar\gamma,\ [\sigma^{-2}Z'Z + \Omega^{-1}]^{-1}). \tag{1.4.23}$$
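The completing-the-square step (1.4.21) and the posterior mean (1.4.22) can be checked numerically. Below is a minimal NumPy sketch of our own, with an arbitrary $Z$, prior mean $\mu$, and a randomly generated positive definite $\Omega$ (all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 25, 4
Z = rng.standard_normal((T, K))
y = rng.standard_normal(T)
mu = rng.standard_normal(K)
sigma2 = 0.5
W = rng.standard_normal((K, K))
Omega = W @ W.T + K * np.eye(K)      # an arbitrary positive definite prior covariance

H = Z.T @ Z / sigma2 + np.linalg.inv(Omega)                  # posterior precision
gamma_bar = np.linalg.solve(H, Z.T @ y / sigma2 + np.linalg.solve(Omega, mu))

# (1.4.21): bracketed quadratic = completed square + terms free of gamma
g = rng.standard_normal(K)                                   # an arbitrary gamma
lhs = (y - Z @ g) @ (y - Z @ g) / sigma2 + (g - mu) @ np.linalg.solve(Omega, g - mu)
rhs = ((g - gamma_bar) @ H @ (g - gamma_bar)
       - gamma_bar @ H @ gamma_bar
       + y @ y / sigma2 + mu @ np.linalg.solve(Omega, mu))
assert np.isclose(lhs, rhs)
```

Since the two sides agree for an arbitrary $\gamma$, the posterior is indeed a normal density with mean $\bar\gamma$ and variance $(\sigma^{-2}Z'Z + \Omega^{-1})^{-1}$.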

The Bayes estimator of $\gamma$ is the posterior mean $\bar\gamma$ given in (1.4.22). Because $\gamma = A\beta$, the Bayes estimator of $\beta$ is given by

$$\begin{aligned} \bar\beta &= A^{-1}[\sigma^{-2}(A')^{-1}X'XA^{-1} + \Omega^{-1}]^{-1}[\sigma^{-2}(A')^{-1}X'y + \Omega^{-1}\mu] \\ &= (\sigma^{-2}X'X + A'\Omega^{-1}A)^{-1}(\sigma^{-2}X'y + A'\Omega^{-1}\mu). \end{aligned} \tag{1.4.24}$$
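The equality of the two lines of (1.4.24) is a matrix identity that holds for any nonsingular $A$; a quick numerical check with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 25, 4
X = rng.standard_normal((T, K))
y = rng.standard_normal(T)
A = rng.standard_normal((K, K)) + 2.0 * np.eye(K)   # any nonsingular A
mu = rng.standard_normal(K)
sigma2 = 0.7
W = rng.standard_normal((K, K))
Omega_inv = np.linalg.inv(W @ W.T + K * np.eye(K))  # prior precision

Ainv = np.linalg.inv(A)
Z = X @ Ainv
# first line of (1.4.24): posterior mean in the gamma parameterization, mapped back
beta1 = Ainv @ np.linalg.solve(Z.T @ Z / sigma2 + Omega_inv,
                               Z.T @ y / sigma2 + Omega_inv @ mu)
# second line of (1.4.24): direct expression in beta
beta2 = np.linalg.solve(X.T @ X / sigma2 + A.T @ Omega_inv @ A,
                        X.T @ y / sigma2 + A.T @ Omega_inv @ mu)
assert np.allclose(beta1, beta2)
```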

We shall now specify $\mu$ and $\Omega$ so that they conform to the stochastic constraints (1.4.18). This can be done by putting the first $q$ elements of $\mu$ equal to $c$ (leaving the remaining $K - q$ elements unspecified because their values do not matter in the limit we shall consider later), putting

$$\Omega = \begin{bmatrix} \tau^2 I & 0 \\ 0 & \nu^2 I \end{bmatrix}, \tag{1.4.25}$$

and then taking $\nu^2$ to infinity (which expresses the assumption that nothing is a priori known about $\gamma_2$). Hence, in the limit we have

$$\Omega^{-1} = \begin{bmatrix} \tau^{-2} I & 0 \\ 0 & 0 \end{bmatrix}. \tag{1.4.26}$$

Inserting (1.4.26) into (1.4.24) and writing the first $q$ elements of $\mu$ as $c$, we finally obtain

$$\bar\beta = (X'X + \lambda^2 QQ')^{-1}(X'y + \lambda^2 Qc), \tag{1.4.27}$$

where $\lambda^2 = \sigma^2/\tau^2$.

We have obtained the estimator $\bar\beta$ as a special case of the Bayes estimator, but this estimator was originally proposed by Theil and Goldberger (1961), who called it the mixed estimator on heuristic grounds. In their heuristic approach, Eqs. (1.1.4) and (1.4.17) are combined to yield a system of equations

$$\begin{bmatrix} y \\ \lambda c \end{bmatrix} = \begin{bmatrix} X \\ \lambda Q' \end{bmatrix}\beta + \begin{bmatrix} u \\ -\lambda v \end{bmatrix}. \tag{1.4.28}$$

Note that the multiplication of the second part of the equations by $\lambda$ renders the combined error terms homoscedastic (that is, of constant variance), so that (1.4.28) satisfies the assumptions of Model 1. Theil and Goldberger then proposed applying the least squares estimator to (1.4.28), an operation that yields the same estimator as $\bar\beta$ given in (1.4.27). An alternative way to interpret this estimator as a Bayes estimator is given in Theil (1971, p. 670).
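That least squares applied to the stacked system reproduces the closed form (1.4.27) is easy to check numerically. In this sketch the data and $\lambda$ are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
T, K, q = 30, 5, 2
X = rng.standard_normal((T, K))
y = rng.standard_normal(T)
Q = rng.standard_normal((K, q))
c = rng.standard_normal(q)
lam = 1.7                                   # lambda = sigma / tau, chosen arbitrarily

# stacked system (1.4.28): [y; lam*c] = [X; lam*Q'] beta + homoscedastic error
Xs = np.vstack([X, lam * Q.T])
ys = np.concatenate([y, lam * c])
beta_stack = np.linalg.lstsq(Xs, ys, rcond=None)[0]

# closed form (1.4.27)
beta_mixed = np.linalg.solve(X.T @ X + lam**2 * Q @ Q.T,
                             X.T @ y + lam**2 * Q @ c)
assert np.allclose(beta_stack, beta_mixed)
```

The agreement follows because $X_s'X_s = X'X + \lambda^2 QQ'$ and $X_s'y_s = X'y + \lambda^2 Qc$.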

There is an interesting connection between the Bayes estimator (1.4.27) and the constrained least squares estimator (1.4.11): the latter is obtained as the limit of the former as $\lambda^2$ goes to infinity. Note that this result is consistent with our intuition inasmuch as $\lambda^2 \to \infty$ is equivalent to $\tau^2 \to 0$, an equivalence that implies that the stochastic element disappears from the constraints (1.4.17), thereby reducing them to the nonstochastic constraints (1.4.1). We shall demonstrate this below. [Note that the limit of (1.4.27) as $\lambda^2 \to \infty$ is not $(QQ')^{-1}Qc$, because $QQ'$ is singular: it is a $K \times K$ matrix with rank $q < K$.] Define

$$B \equiv A(\lambda^{-2}X'X + QQ')A' = \begin{bmatrix} \lambda^{-2}Q'X'XQ + Q'QQ'Q & \lambda^{-2}Q'X'XR \\ \lambda^{-2}R'X'XQ & \lambda^{-2}R'X'XR \end{bmatrix}, \tag{1.4.29}$$

whose inverse may be written in partitioned form as

$$B^{-1} = \begin{bmatrix} E^{-1} & -E^{-1}Q'X'XR(R'X'XR)^{-1} \\ -\lambda^{-2}F^{-1}R'X'XQ(\lambda^{-2}Q'X'XQ + Q'QQ'Q)^{-1} & F^{-1} \end{bmatrix}, \tag{1.4.30}$$

where

$$E = \lambda^{-2}Q'X'XQ + Q'QQ'Q - \lambda^{-2}Q'X'XR(R'X'XR)^{-1}R'X'XQ \tag{1.4.31}$$

and

$$F = \lambda^{-2}R'X'XR - \lambda^{-4}R'X'XQ(\lambda^{-2}Q'X'XQ + Q'QQ'Q)^{-1}Q'X'XR. \tag{1.4.32}$$

From (1.4.27) and (1.4.29) we have

$$\bar\beta = A'B^{-1}A(\lambda^{-2}X'y + Qc). \tag{1.4.33}$$

Using (1.4.30), we have

$$\lim_{\lambda^2 \to \infty} A'B^{-1}A(\lambda^{-2}X'y) = R(R'X'XR)^{-1}R'X'y \tag{1.4.34}$$

and

$$\lim_{\lambda^2 \to \infty} A'B^{-1}AQc = \lim_{\lambda^2 \to \infty} (Q,\ R)B^{-1}\begin{bmatrix} Q'Qc \\ 0 \end{bmatrix} = Q(Q'Q)^{-1}c - R(R'X'XR)^{-1}R'X'XQ(Q'Q)^{-1}c. \tag{1.4.35}$$

Thus we have proved

$$\lim_{\lambda^2 \to \infty} \bar\beta = \hat\beta, \tag{1.4.36}$$

where $\hat\beta$ is given in (1.4.11).
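The limit (1.4.36) can be illustrated numerically by evaluating (1.4.27) at a very large $\lambda^2$ and comparing it with the constrained least squares estimator written in the form (1.4.11). The data below are hypothetical, and the SVD construction of $R$ is our own device:

```python
import numpy as np

rng = np.random.default_rng(5)
T, K, q = 30, 5, 2
X = rng.standard_normal((T, K))
y = rng.standard_normal(T)
Q = rng.standard_normal((K, q))
c = rng.standard_normal(q)
R = np.linalg.svd(Q.T)[2][q:].T             # columns span null space of Q', so R'Q = 0

# constrained least squares, as in (1.4.11)
b0 = Q @ np.linalg.solve(Q.T @ Q, c)
beta_cls = b0 + R @ np.linalg.solve(R.T @ X.T @ X @ R,
                                    R.T @ X.T @ (y - X @ b0))

# Bayes/mixed estimator (1.4.27) with a very large lambda^2
lam2 = 1e8
beta_mixed = np.linalg.solve(X.T @ X + lam2 * Q @ Q.T,
                             X.T @ y + lam2 * Q @ c)
assert np.allclose(beta_mixed, beta_cls, atol=1e-4)
```

As $\lambda^2$ grows, the stochastic constraints bind ever more tightly, and the mixed estimator converges to the estimator that imposes $Q'\beta = c$ exactly.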