Interpretations of the Two-Stage Least Squares Estimator
There are many interpretations of 2SLS. They are useful when we attempt to generalize 2SLS to situations in which some of the standard assumptions of the model are violated. Different interpretations often lead to different generalizations. Here we shall give four that we consider most useful.
Theil's interpretation is the most famous, and from it the name 2SLS is derived. In the first stage, the least squares method is applied to Eq. (7.3.2) and the least squares predictor $\hat{Y}_1 = PY_1$ is obtained. In the second stage, $\hat{Y}_1$ is substituted for $Y_1$ in the right-hand side of (7.3.1) and $y_1$ is regressed on $\hat{Y}_1$ and $X_1$. The least squares estimates of $\gamma_1$ and $\beta_1$ thus obtained are 2SLS.
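As a numerical check of Theil's interpretation, the sketch below simulates a small system and verifies that running the two stages explicitly reproduces the one-shot 2SLS formula $(Z_1'PZ_1)^{-1}Z_1'Py_1$, where $Z_1 = (Y_1, X_1)$. The data, dimensions, and coefficient values are illustrative choices, not taken from the text.

```python
import numpy as np

# Illustrative simulated data (dimensions and coefficients are arbitrary).
rng = np.random.default_rng(0)
T = 200
X = rng.standard_normal((T, 4))           # all exogenous variables
X1 = X[:, :2]                             # exogenous variables in the equation
Y1 = X @ rng.standard_normal((4, 1)) + rng.standard_normal((T, 1))  # endogenous regressor
y1 = 0.5 * Y1 + X1 @ np.array([[1.0], [-1.0]]) + rng.standard_normal((T, 1))

P = X @ np.linalg.solve(X.T @ X, X.T)     # projection onto the column space of X
Z1 = np.hstack([Y1, X1])

# One-shot 2SLS: (Z1' P Z1)^{-1} Z1' P y1
alpha_2sls = np.linalg.solve(Z1.T @ P @ Z1, Z1.T @ P @ y1)

# Theil's two stages: regress Y1 on X, then regress y1 on (Y1_hat, X1).
Y1_hat = P @ Y1                           # first-stage fitted values
Z1_hat = np.hstack([Y1_hat, X1])
alpha_theil = np.linalg.solve(Z1_hat.T @ Z1_hat, Z1_hat.T @ y1)

print(np.allclose(alpha_2sls, alpha_theil))  # True
```

The equality is exact (up to rounding) because $PX_1 = X_1$, so that $\hat{Z}_1'\hat{Z}_1 = Z_1'PZ_1$ and $\hat{Z}_1'y_1 = Z_1'Py_1$.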
The two-stage least squares estimator can also be interpreted as the asymptotically best instrumental variables estimator. This is the way Basmann (1957) motivated his estimator, which is equivalent to 2SLS. The instrumental variables estimator applied to (7.3.1) is commonly defined as $(S'Z_1)^{-1}S'y_1$ for some matrix $S$ of the same size as $Z_1$. But here we shall define it more generally as

$$\hat{\alpha}_S = (Z_1'P_S Z_1)^{-1}Z_1'P_S y_1,$$

where $P_S = S(S'S)^{-1}S'$. The matrix $S$ should have $T$ rows, but the number of its columns need not be the same as that of $Z_1$. In addition, we assume that $S$ satisfies the following three conditions:
(i) $\operatorname{plim} T^{-1}S'S$ exists and is nonsingular,
(ii) $\operatorname{plim} T^{-1}S'u_1 = 0$, and
(iii) $\operatorname{plim} T^{-1}S'V_1 = 0$.
Under these assumptions we obtain, in a manner analogous to the derivation of (7.3.9),

$$C = \sigma^2 \operatorname{plim} T \begin{bmatrix} \Pi_1'X'P_S X\Pi_1 & \Pi_1'X'P_S X_1 \\ X_1'P_S X\Pi_1 & X_1'P_S X_1 \end{bmatrix}^{-1}. \tag{7.3.17}$$
Because $A \leq C$, we conclude by Theorem 17 of Appendix 1 that 2SLS is the asymptotically best in the class of the instrumental variables estimators defined above.⁷
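Two special cases of the generalized definition $\hat{\alpha}_S = (Z_1'P_S Z_1)^{-1}Z_1'P_S y_1$ can be checked numerically: when $S$ has as many columns as $Z_1$ it collapses to the classical form $(S'Z_1)^{-1}S'y_1$, and when $S = X$ it is exactly 2SLS. The simulated data below are illustrative only.

```python
import numpy as np

# Illustrative simulated system (dimensions and coefficients are arbitrary).
rng = np.random.default_rng(1)
T = 150
X = rng.standard_normal((T, 4))
X1 = X[:, :2]
Y1 = X @ rng.standard_normal((4, 1)) + rng.standard_normal((T, 1))
y1 = 0.5 * Y1 + X1 @ np.array([[1.0], [-1.0]]) + rng.standard_normal((T, 1))
Z1 = np.hstack([Y1, X1])

def iv(S, Z1, y1):
    # Generalized IV estimator: (Z1' P_S Z1)^{-1} Z1' P_S y1
    PS = S @ np.linalg.solve(S.T @ S, S.T)
    return np.linalg.solve(Z1.T @ PS @ Z1, Z1.T @ PS @ y1)

# Square case: S has the same number of columns as Z1 (three here), so the
# generalized form reduces to the classical estimator (S'Z1)^{-1} S'y1.
S = X[:, :3]
classical = np.linalg.solve(S.T @ Z1, S.T @ y1)
print(np.allclose(iv(S, Z1, y1), classical))  # True

# With S = X the estimator is exactly 2SLS.
P = X @ np.linalg.solve(X.T @ X, X.T)
alpha_2sls = np.linalg.solve(Z1.T @ P @ Z1, Z1.T @ P @ y1)
print(np.allclose(iv(X, Z1, y1), alpha_2sls))  # True
```

The square-case reduction follows algebraically because $S'Z_1$ is then invertible, so $(Z_1'P_S Z_1)^{-1}Z_1'P_S y_1 = (S'Z_1)^{-1}S'y_1$.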
For a third interpretation, we note that the projection matrix $P$ used to define 2SLS in (7.3.4) purports to eliminate the stochastic part of $Y_1$. If $V_1$ were observable, the elimination of the stochastic part of $Y_1$ could be accomplished more directly by another projection matrix, $M_{V_1} = I - V_1(V_1'V_1)^{-1}V_1'$. The asymptotic variance-covariance matrix of the resulting estimator is $\sigma^2 \operatorname{plim} T(Z_1'M_{V_1}Z_1)^{-1}$ and hence smaller than that of 2SLS. When we predict $V_1$ by $\hat{V}_1 = MY_1$, where $M = I - X(X'X)^{-1}X'$, and use $M_{\hat{V}_1}$ in place of $M_{V_1}$, we obtain 2SLS.
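The claim that replacing $V_1$ by its prediction $\hat{V}_1 = MY_1$ recovers 2SLS can be verified numerically: since $\hat{V}_1'X_1 = 0$ and $\hat{V}_1'Y_1 = \hat{V}_1'\hat{V}_1$, one finds $M_{\hat{V}_1}Z_1 = PZ_1$. The simulated data below are illustrative.

```python
import numpy as np

# Illustrative simulated system (dimensions and coefficients are arbitrary).
rng = np.random.default_rng(2)
T = 120
X = rng.standard_normal((T, 4))
X1 = X[:, :2]
Y1 = X @ rng.standard_normal((4, 1)) + rng.standard_normal((T, 1))
y1 = 0.5 * Y1 + X1 @ np.array([[1.0], [-1.0]]) + rng.standard_normal((T, 1))
Z1 = np.hstack([Y1, X1])

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(T) - P
V1_hat = M @ Y1                          # predicted stochastic part of Y1
# Projection annihilating the predicted stochastic part:
M_V = np.eye(T) - V1_hat @ np.linalg.solve(V1_hat.T @ V1_hat, V1_hat.T)

alpha_res = np.linalg.solve(Z1.T @ M_V @ Z1, Z1.T @ M_V @ y1)
alpha_2sls = np.linalg.solve(Z1.T @ P @ Z1, Z1.T @ P @ y1)
print(np.allclose(alpha_res, alpha_2sls))  # True
```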
The last interpretation we shall give originates in the article by Anderson and Sawa (1973). Let the reduced form for $y_1$ be

$$y_1 = X\pi_1 + v_1. \tag{7.3.18}$$
Then (7.3.1), (7.3.2), and (7.3.18) imply
$$\pi_1 = \Pi_1\gamma_1 + J_1\beta_1, \tag{7.3.19}$$

where $J_1 = (X'X)^{-1}X'X_1$. From (7.3.19) we obtain

$$\hat{\pi}_1 = \hat{\Pi}_1\gamma_1 + J_1\beta_1 + (\hat{\pi}_1 - \pi_1) - (\hat{\Pi}_1 - \Pi_1)\gamma_1, \tag{7.3.20}$$

where $\hat{\pi}_1$ and $\hat{\Pi}_1$ are the least squares estimators of $\pi_1$ and $\Pi_1$, respectively. Then 2SLS can be interpreted as generalized least squares applied to (7.3.20). To see this, merely note that (7.3.20) is obtained by premultiplying (7.3.1) by $(X'X)^{-1}X'$.
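Assuming homoskedastic $u_1$, premultiplying (7.3.1) by $(X'X)^{-1}X'$ gives an error with covariance $\sigma^2(X'X)^{-1}$, so the appropriate GLS weight matrix is $X'X$. The sketch below, with illustrative simulated data, confirms numerically that this GLS estimator coincides with 2SLS.

```python
import numpy as np

# Illustrative simulated system (dimensions and coefficients are arbitrary).
rng = np.random.default_rng(3)
T = 120
X = rng.standard_normal((T, 4))
X1 = X[:, :2]
Y1 = X @ rng.standard_normal((4, 1)) + rng.standard_normal((T, 1))
y1 = 0.5 * Y1 + X1 @ np.array([[1.0], [-1.0]]) + rng.standard_normal((T, 1))
Z1 = np.hstack([Y1, X1])

XtX = X.T @ X
# Premultiply the structural equation by (X'X)^{-1} X': the regressand is
# pi1_hat and the regressor matrix is [Pi1_hat, J1]; the transformed error
# has covariance sigma^2 (X'X)^{-1}, so the GLS weight is X'X.
q = np.linalg.solve(XtX, X.T @ y1)
W = np.linalg.solve(XtX, X.T @ Z1)

alpha_gls = np.linalg.solve(W.T @ XtX @ W, W.T @ XtX @ q)

P = X @ np.linalg.solve(XtX, X.T)
alpha_2sls = np.linalg.solve(Z1.T @ P @ Z1, Z1.T @ P @ y1)
print(np.allclose(alpha_gls, alpha_2sls))  # True
```

The identity is exact: $W'(X'X)W = Z_1'PZ_1$ and $W'(X'X)q = Z_1'Py_1$, so GLS on the transformed equation reproduces the 2SLS formula term by term.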