Structure of Limited Information Estimators as Regression Functions
The structure of the SEM estimators is discussed at length in Hendry (1976), Hausman (1983), and Phillips (1983). In this section, we develop a perspective on the structure of limited information estimators that is particularly helpful in the finite sample analysis of these procedures.
Most of the limited information estimators we have discussed so far can be related to the regression moment matrices S, W, and A defined in (6.15):
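\[
\hat{\beta}_{2SLS} = \arg\min \, \beta_*' W \beta_*
\]
\[
\hat{\beta}_{OLS} = \arg\min \, \beta_*' A \beta_*
\]
\[
\hat{\beta}_{LIML} = \arg\min \, \frac{\beta_*' W \beta_*}{\beta_*' S \beta_*}.
\]
If we partition S (and A and W similarly) as
\[
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix},
\]
we then have
\[
\hat{\beta}_{OLS} = A_{22}^{-1} A_{21}
\]
\[
\hat{\beta}_{2SLS} = W_{22}^{-1} W_{21}
\]
and for the k-class estimator, with l = 1 – k,
\[
\hat{\beta}_{(k)} = (W_{22} + l S_{22})^{-1} (W_{21} + l S_{21}).
\]
Thus, we also have
\[
\hat{\beta}_{LIML} = (W_{22} - \lambda S_{22})^{-1} (W_{21} - \lambda S_{21}), \tag{6.31}
\]
where $\lambda$ = smallest eigenvalue of $S^{-1}W$.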
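The partitioned formulas above can be checked numerically. The following is a minimal sketch, not the chapter's own computation: it assumes the usual definitions S = Y'(I − P_X)Y, W = Y'(P_X − P_X1)Y, and A = S + W with Y = (y1, Y1), since (6.15) is not reproduced in this section, and the sample sizes, parameter values, and function names (`proj`, `reg`, `k_class`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K1, K2, G1 = 200, 2, 4, 2                 # hypothetical dimensions
X1 = rng.normal(size=(n, K1))                # included exogenous variables
X2 = rng.normal(size=(n, K2))                # excluded exogenous variables
X = np.hstack([X1, X2])
beta = np.array([0.5, -1.0])                 # hypothetical structural coefficients
gamma = np.array([1.0, 2.0])
Pi1 = rng.normal(size=(K1 + K2, G1))
V = rng.normal(size=(n, G1))
u = V @ np.array([0.8, 0.3]) + rng.normal(size=n)   # u correlated with V: endogeneity
Y1 = X @ Pi1 + V
y1 = Y1 @ beta + X1 @ gamma + u
Y = np.column_stack([y1, Y1])                # (y1, Y1), order G1 + 1

def proj(Z):
    """Orthogonal projection matrix onto the column space of Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

PX, PX1 = proj(X), proj(X1)
S = Y.T @ (np.eye(n) - PX) @ Y               # assumed definition of S
W = Y.T @ (PX - PX1) @ Y                     # assumed definition of W
A = S + W                                    # equals Y'(I - P_X1)Y

def reg(M):
    """Regression function M22^{-1} M21 of a partitioned moment matrix."""
    return np.linalg.solve(M[1:, 1:], M[1:, 0])

def k_class(k):
    l = 1.0 - k
    return reg(W + l * S)

lam = np.min(np.linalg.eigvals(np.linalg.solve(S, W)).real)
b_ols, b_2sls, b_liml = reg(A), reg(W), reg(W - lam * S)
print("OLS ", b_ols)
print("2SLS", b_2sls)
print("LIML", b_liml)
```

In this sketch OLS, 2SLS, and LIML are the k-class members with k = 0, k = 1, and k = 1 + λ respectively, which is why the LIML expression carries l = −λ.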
We can give a similar characterization of the modified 2SLS estimator, in which the data matrix for the first-stage regressors is H (constrained to contain $X_1$) instead of X. Equivalently, this is the IV estimator that uses $P_H Y_1$ as the instrument matrix for $Y_1$, and its characterization is $\hat{\beta}_{M2SLS} = F_{22}^{-1} F_{21}$, where $F = Y_1'(P_H - P_{X_1})Y_1$. Thus $\hat{\beta}_{OLS}$, $\hat{\beta}_{2SLS}$, and $\hat{\beta}_{M2SLS}$ are regression functions of moment matrices in $Y_1$, namely A, W, and F, which differ among themselves only in their associated projection matrices.
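As an illustration of the modified 2SLS characterization, the sketch below builds F from simulated data and compares its regression function with the IV estimator computed directly. It assumes that F is formed from the full matrix (y1, Y1) and that H consists of X1 plus a subset of the excluded exogenous variables; the data-generating values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K1, K2, G1 = 200, 2, 5, 2                 # hypothetical dimensions
X1 = rng.normal(size=(n, K1))
X2 = rng.normal(size=(n, K2))
X = np.hstack([X1, X2])
H = np.hstack([X1, X2[:, :3]])               # hypothetical H: X1 plus three columns of X2

beta = np.array([0.5, -1.0])
gamma = np.array([1.0, 2.0])
Pi1 = rng.normal(size=(K1 + K2, G1))
V = rng.normal(size=(n, G1))
u = V @ np.array([0.8, 0.3]) + rng.normal(size=n)
Y1 = X @ Pi1 + V
y1 = Y1 @ beta + X1 @ gamma + u
Y = np.column_stack([y1, Y1])

def proj(Z):
    """Orthogonal projection matrix onto the column space of Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

PH, PX1 = proj(H), proj(X1)

# Regression-function form: F22^{-1} F21 with F = Y'(P_H - P_X1)Y
F = Y.T @ (PH - PX1) @ Y
b_m2sls = np.linalg.solve(F[1:, 1:], F[1:, 0])

# Direct IV form: P_H Y1 instruments Y1, and X1 instruments itself
Z = np.hstack([Y1, X1])
inst = np.hstack([PH @ Y1, X1])
d_iv = np.linalg.solve(inst.T @ Z, inst.T @ y1)

print("M2SLS via F :", b_m2sls)
print("M2SLS via IV:", d_iv[:G1])
```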
In the scalar Gaussian case, A, W, and F are all proportional to noncentral chi-squared variates with different degrees of freedom. In higher-dimensional cases, they have a so-called noncentral Wishart distribution, which is indexed by the following parameters:
1. the order or size of the matrix (in our case G1 + 1),
2. the degrees of freedom or rank of the associated matrix Q (say q),
3. the common covariance matrix of the rows of Y (say $\Omega$),
4. the so-called means-sigma matrix, which is the generalization of the noncentrality parameter and is equal to (E(Y))’Q (E(Y)) = M, and
5. the rank of the means-sigma matrix, say m.
The Wishart distributions of A, W, and F differ in their degrees of freedom. The means-sigma matrices for A and W are identical; that for F takes a different expression, but it has the same rank as the other two, namely one less than the order of the matrices (see Mariano, 1977). Consequently, the probability density functions for A, W, and F are all of the same form. Because of this, we can say that OLS, 2SLS, and M2SLS are "distributionally equivalent" in the sense that the problem of deriving analytical results for one is of the same mathematical form as for the other two.
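The comparison of degrees of freedom and means-sigma matrices can be made concrete with a small numerical sketch. It assumes the projection matrices I − P_X1, P_X − P_X1, and P_H − P_X1 for A, W, and F respectively, takes E(Y) from a hypothetical reduced form that satisfies the structural restrictions, and computes the means-sigma matrix as (E(Y))'Q(E(Y)) as stated in the list above. All numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K1, K2, G1 = 60, 2, 5, 2                  # hypothetical dimensions
X1 = rng.normal(size=(n, K1))
X2 = rng.normal(size=(n, K2))
X = np.hstack([X1, X2])
H = np.hstack([X1, X2[:, :3]])               # hypothetical first-stage matrix containing X1

beta = np.array([0.5, -1.0])
gamma = np.array([1.0, 2.0])
Pi1 = rng.normal(size=(K1 + K2, G1))
# Reduced-form coefficients of y1 implied by the structural equation
pi1 = Pi1 @ beta + np.concatenate([gamma, np.zeros(K2)])
EY = X @ np.column_stack([pi1, Pi1])         # E(y1, Y1) given X

def proj(Z):
    """Orthogonal projection matrix onto the column space of Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

Q = {"A": np.eye(n) - proj(X1),
     "W": proj(X) - proj(X1),
     "F": proj(H) - proj(X1)}

def num_rank(M):
    # Explicit tolerance so that rounding error is not counted as rank
    return np.linalg.matrix_rank(M, tol=1e-8 * np.linalg.norm(M, 2))

df = {name: int(round(np.trace(Qm))) for name, Qm in Q.items()}   # rank of each projection
M = {name: EY.T @ Qm @ EY for name, Qm in Q.items()}              # means-sigma matrices
ranks = {name: num_rank(Mm) for name, Mm in M.items()}

for name in Q:
    print(name, "degrees of freedom:", df[name], " rank of means-sigma matrix:", ranks[name])
```

In this setup the three degrees of freedom differ, the means-sigma matrices for A and W coincide because E(Y) lies in the column space of X, and all three means-sigma matrices have rank G1, one less than their order G1 + 1.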
Distributional equivalence in the same vein exists between two-stage least squares in the just identified case and the instrumental variable estimator based on nonstochastic instruments for Y1 (see Mariano, 1977, 1982). The argument turns on the fact that 2SLS applied to a just identified equation can be interpreted as an IV estimator that uses the excluded exogenous variables as instruments for Y1.
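The interpretation of just-identified 2SLS as an IV estimator can be verified on simulated data. This is a sketch under hypothetical parameter values: it sets the number of excluded exogenous variables equal to the number of right-hand-side endogenous variables and compares the 2SLS formula with the simple IV formula in which X2 serves as the instrument for Y1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K1, G1 = 150, 2, 2                        # hypothetical dimensions
K2 = G1                                      # just-identified: as many excluded exogenous as endogenous
X1 = rng.normal(size=(n, K1))
X2 = rng.normal(size=(n, K2))
X = np.hstack([X2, X1])                      # X2 lines up with Y1, X1 with itself

beta = np.array([0.5, -1.0])
gamma = np.array([1.0, 2.0])
Pi1 = rng.normal(size=(K2 + K1, G1))
V = rng.normal(size=(n, G1))
u = V @ np.array([0.8, 0.3]) + rng.normal(size=n)
Y1 = X @ Pi1 + V
y1 = Y1 @ beta + X1 @ gamma + u

Z = np.hstack([Y1, X1])                      # right-hand-side variables of the equation

# 2SLS: (Z'P_X Z)^{-1} Z'P_X y1
PX = X @ np.linalg.solve(X.T @ X, X.T)
d_2sls = np.linalg.solve(Z.T @ PX @ Z, Z.T @ PX @ y1)

# Simple IV with the exogenous variables as instruments: (X'Z)^{-1} X'y1
d_iv = np.linalg.solve(X.T @ Z, X.T @ y1)

print("2SLS:", d_2sls[:G1])
print("IV  :", d_iv[:G1])
```

The two estimates agree because X'Z is square and invertible in the just-identified case, so the projection in the 2SLS formula cancels.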
Expressions (6.29), (6.30), and (6.31) of the k-class estimators in terms of the matrices S, A, and W also lead to a generalized method-of-moments (GMM) interpretation for these estimators. The moment conditions come from the asymptotic orthogonality of the instruments to the structural disturbances. Under standard large-sample asymptotics, these moment conditions are satisfied by 2SLS, LIML, and the k-class with k converging to 1 in probability.
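One way to see the GMM interpretation for 2SLS is sketched below. It assumes the sample moment vector X'(y1 − Y1 β − X1 γ) with weight matrix (X'X)^{-1}, which is a common textbook choice and not necessarily the formulation the chapter has in mind, and it assumes W = Y'(P_X − P_X1)Y with Y = (y1, Y1). The data-generating values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K1, K2, G1 = 200, 2, 4, 2                 # hypothetical dimensions
X1 = rng.normal(size=(n, K1))
X2 = rng.normal(size=(n, K2))
X = np.hstack([X1, X2])
beta = np.array([0.5, -1.0])
gamma = np.array([1.0, 2.0])
Pi1 = rng.normal(size=(K1 + K2, G1))
V = rng.normal(size=(n, G1))
u = V @ np.array([0.8, 0.3]) + rng.normal(size=n)
Y1 = X @ Pi1 + V
y1 = Y1 @ beta + X1 @ gamma + u

def proj(Z):
    """Orthogonal projection matrix onto the column space of Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

PX, PX1 = proj(X), proj(X1)
Z = np.hstack([Y1, X1])
Zhat = PX @ Z                                # first-stage fitted values
delta = np.linalg.solve(Zhat.T @ Z, Zhat.T @ y1)
u_hat = y1 - Z @ delta                       # 2SLS structural residuals

foc = Zhat.T @ u_hat                         # first-order condition of the GMM objective
J = u_hat @ PX @ u_hat                       # GMM objective u'P_X u at the 2SLS estimate

# The same objective written through the moment matrix W
Y = np.column_stack([y1, Y1])
W = Y.T @ (PX - PX1) @ Y
b_star = np.concatenate([[1.0], -delta[:G1]])
obj_W = b_star @ W @ b_star

print("first-order condition:", foc)
print("u'P_X u:", J, " beta*'W beta*:", obj_W)
print("sample moments X'u_hat / n:", X.T @ u_hat / n)
```

At the 2SLS estimate the first-order condition holds exactly, and the GMM objective coincides with the quadratic form in W once the coefficients on X1 are concentrated out.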
Moment conditions can also be derived by expressing β in terms of expectations of the matrices A, W, and S; see Bekker (1994). Under Bekker's (1994) alternative asymptotics, where the number of instruments increases with sample size, LIML satisfies these moment conditions, but 2SLS and OLS do not. Consequently, under these alternative asymptotics LIML remains consistent while 2SLS and OLS are both inconsistent. This result provides a partial intuitive explanation for why LIML has better finite sample properties than 2SLS under conditions given in Section 7 of this chapter.
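The role of expectations of W and S can be illustrated with a deterministic sketch. It is not Bekker's derivation: it only uses the fact that, for independent rows with common covariance matrix Ω and fixed X, E(W) = M + (K − K1)Ω and E(S) = (n − K)Ω under the assumed definitions W = Y'(P_X − P_X1)Y and S = Y'(I − P_X)Y, where M is the means-sigma matrix. The choice of Ω, the dimensions, and the coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K1, K2, G1 = 100, 2, 30, 2                # many instruments relative to n (hypothetical)
K = K1 + K2
X1 = rng.normal(size=(n, K1))
X2 = rng.normal(size=(n, K2))
X = np.hstack([X1, X2])

beta = np.array([0.5, -1.0])
gamma = np.array([1.0, 2.0])
Pi1 = 0.2 * rng.normal(size=(K, G1))         # fairly weak first stage
pi1 = Pi1 @ beta + np.concatenate([gamma, np.zeros(K2)])
EY = X @ np.column_stack([pi1, Pi1])         # E(y1, Y1) given X

B = rng.normal(size=(G1 + 1, G1 + 1))
Omega = B @ B.T + np.eye(G1 + 1)             # hypothetical reduced-form covariance matrix

def proj(Z):
    """Orthogonal projection matrix onto the column space of Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

M = EY.T @ (proj(X) - proj(X1)) @ EY         # means-sigma matrix of W
EW = M + (K - K1) * Omega                    # E(W)
ES = (n - K) * Omega                         # E(S)

def reg(Mat):
    """Regression function Mat22^{-1} Mat21."""
    return np.linalg.solve(Mat[1:, 1:], Mat[1:, 0])

lam = np.min(np.linalg.eigvals(np.linalg.solve(ES, EW)).real)
b_2sls_pop = reg(EW)                         # 2SLS formula applied to E(W)
b_liml_pop = reg(EW - lam * ES)              # LIML formula applied to E(W), E(S)

print("true beta        :", beta)
print("2SLS at E(W)     :", b_2sls_pop)
print("LIML at E(W),E(S):", b_liml_pop)
```

In this sketch the LIML formula evaluated at the expected moment matrices recovers β, because subtracting λE(S) removes the (K − K1)Ω term and leaves M, whose null space contains (1, −β')'. The 2SLS formula does not, since E(W) retains a term proportional to the number of excluded instruments.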