# Instrumental regressions

Consider the limited information (LI) structural regression model:

$$y = Y\beta + X_1\gamma_1 + u = Z\delta + u, \tag{23.3}$$

$$Y = X_1\Pi_1 + X_2\Pi_2 + V, \tag{23.4}$$

where $Y$ and $X_1$ are $n \times m$ and $n \times k$ matrices which respectively contain the observations on the included endogenous and exogenous variables, $Z = [Y, X_1]$, $\delta = (\beta', \gamma_1')'$, and $X_2$ refers to the excluded exogenous variables. If more than $m$ variables are excluded from the structural equation, the system is said to be overidentified. The associated LI reduced form is:

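$$[y, Y] = X_1[\pi_1, \Pi_1] + X_2[\pi_2, \Pi_2] + [v, V], \qquad \pi_1 = \Pi_1\beta + \gamma_1, \quad \pi_2 = \Pi_2\beta,$$

with $v = u + V\beta$. The necessary and sufficient condition for identification follows from the relation $\pi_2 = \Pi_2\beta$. Indeed, $\beta$ is recoverable if and only if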

$$\operatorname{rank}(\Pi_2) = m. \tag{23.7}$$
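As a quick illustration of the rank condition, the following Python/NumPy sketch checks whether a given $\Pi_2$ (here taken to have $k_2$ rows, one per excluded exogenous variable, and $m$ columns) has full column rank and, if so, recovers $\beta$ from $\pi_2 = \Pi_2\beta$ by least squares. The function names and the numerical matrices are hypothetical; in applications $\Pi_2$ is unknown and must be estimated, so this only illustrates the algebra.

```python
import numpy as np

def is_identified(Pi2, tol=None):
    """Rank condition (23.7): Pi2 (k2 x m) must have full column rank m."""
    Pi2 = np.atleast_2d(np.asarray(Pi2, dtype=float))
    m = Pi2.shape[1]                      # number of included endogenous regressors
    return bool(np.linalg.matrix_rank(Pi2, tol=tol) == m)

def recover_beta(pi2, Pi2):
    """Solve pi2 = Pi2 beta for beta; the solution is unique when Pi2 has full column rank."""
    beta, *_ = np.linalg.lstsq(np.asarray(Pi2, dtype=float),
                               np.asarray(pi2, dtype=float), rcond=None)
    return beta

# One endogenous regressor (m = 1), two excluded instruments (k2 = 2)
print(is_identified([[0.5], [1.0]]))              # True
print(is_identified([[0.0], [0.0]]))              # False: Pi2 = 0
# Two endogenous regressors (m = 2) but one excluded instrument: rank <= 1 < 2
print(is_identified([[1.0, 2.0]]))                # False
print(recover_beta([1.0, 2.0], [[0.5], [1.0]]))   # [2.] (up to rounding)
```

Note that the order condition (at least $m$ excluded exogenous variables) is necessary but not sufficient: a $\Pi_2$ with enough rows can still fail the rank condition, as the all-zero case shows.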

To test the general linear hypothesis $R\delta = r$, where $R$ is a full row rank $q \times (m + k)$ matrix, the well-known IV analog of the Wald test is frequently applied on grounds of computational ease. For instance, consider the two-stage least squares (2SLS) estimator

$$\hat{\delta} = \left[Z'P(P'P)^{-1}P'Z\right]^{-1} Z'P(P'P)^{-1}P'y, \tag{23.8}$$

where $P$ is the following matrix of instruments: $P = [X_1, X(X'X)^{-1}X'Y]$, with $X = [X_1, X_2]$. Application of the Wald principle yields the following criterion

$$t_W = \frac{1}{s^2}\,(r - R\hat{\delta})'\left[R\left(Z'P(P'P)^{-1}P'Z\right)^{-1}R'\right]^{-1}(r - R\hat{\delta}), \tag{23.9}$$

where $s^2 = \frac{1}{n}(y - Z\hat{\delta})'(y - Z\hat{\delta})$. Under the usual regularity conditions and imposing identification, $t_W$ is asymptotically distributed like a $\chi^2(q)$ variable, where $q = \operatorname{rank}(R)$.
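A minimal sketch of (23.8) and (23.9) in Python with NumPy and SciPy follows. It uses $s^2$ with divisor $n$ (a degrees-of-freedom correction is an equally common choice) and reports the asymptotic $\chi^2(q)$ p-value. The functions `tsls` and `iv_wald` and the simulated design (one endogenous regressor, an intercept as $X_1$, two excluded instruments, error correlation 0.8) are hypothetical illustrations, not the routines of any particular software package.

```python
import numpy as np
from scipy import stats

def tsls(y, Y, X1, X2):
    """2SLS estimate of delta = (beta', gamma1')' as in (23.8), using the
    instruments P = [X1, X(X'X)^{-1}X'Y] with X = [X1, X2].
    Returns delta_hat, Z and A = Z'P(P'P)^{-1}P'Z."""
    X = np.column_stack([X1, X2])
    Z = np.column_stack([Y, X1])
    Y_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y)      # first-stage fitted values
    P = np.column_stack([X1, Y_hat])
    Z_fit = P @ np.linalg.solve(P.T @ P, P.T @ Z)      # P(P'P)^{-1}P'Z
    A = Z.T @ Z_fit                                    # Z'P(P'P)^{-1}P'Z
    delta_hat = np.linalg.solve(A, Z_fit.T @ y)        # Z_fit'y = Z'P(P'P)^{-1}P'y
    return delta_hat, Z, A

def iv_wald(y, Y, X1, X2, R, r):
    """IV-based Wald statistic t_W of (23.9) for H0: R delta = r and its
    asymptotic chi2(q) p-value; s2 uses divisor n (an assumption)."""
    delta_hat, Z, A = tsls(y, Y, X1, X2)
    resid = y - Z @ delta_hat
    s2 = resid @ resid / len(y)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = np.asarray(r, dtype=float) - R @ delta_hat
    V = R @ np.linalg.solve(A, R.T)                    # R (Z'P(P'P)^{-1}P'Z)^{-1} R'
    t_w = float(diff @ np.linalg.solve(V, diff) / s2)
    q = np.linalg.matrix_rank(R)
    return t_w, stats.chi2.sf(t_w, q)

# Hypothetical simulated design: beta = 1, gamma1 = 0.5, corr(u, v) = 0.8
rng = np.random.default_rng(0)
n = 500
X1 = np.ones((n, 1))
X2 = rng.standard_normal((n, 2))
u, v = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n).T
Y = 0.2 + X2 @ np.array([1.0, 0.5]) + v
y = 1.0 * Y + 0.5 + u

delta_hat, _, _ = tsls(y, Y, X1, X2)
t_true, p_true = iv_wald(y, Y, X1, X2, [[1.0, 0.0]], [1.0])     # H0: beta = 1 (true)
t_false, p_false = iv_wald(y, Y, X1, X2, [[1.0, 0.0]], [0.0])   # H0: beta = 0 (false)
```

Because projecting $Z$ on $P$ gives the same fitted values as projecting it on $X$, `tsls` returns the same estimate as the familiar form $[Z'X(X'X)^{-1}X'Z]^{-1}Z'X(X'X)^{-1}X'y$; computing $P(P'P)^{-1}P'Z$ through `solve` avoids building any $n \times n$ projection matrix.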

Bartlett (1948) and Anderson and Rubin (1949, henceforth AR) suggested an exact test that can be applied only if the null takes the form $\beta = \beta^0$. The idea behind the test is quite simple. Define $y^* = y - Y\beta^0$. Under the null, the model can be written as $y^* = X_1\gamma_1 + u$. On the other hand, if the hypothesis is not true, $y^*$ will be a linear function of all the exogenous variables, since substituting (23.4) gives $y^* = X_1[\gamma_1 + \Pi_1(\beta - \beta^0)] + X_2\Pi_2(\beta - \beta^0) + u + V(\beta - \beta^0)$. Thus, the null may be assessed by the $F$-statistic for testing whether the coefficients of the regressors $X_2$ "excluded" from (23.3) are zero in the regression of $y^*$ on all the exogenous variables, i.e. we simply test $\gamma_2 = 0$ in the extended linear regression $y^* = X_1\gamma_1 + X_2\gamma_2 + u$.

We first consider a simple experiment based on the work of Nelson and Startz (1990a, 1990b) and Staiger and Stock (1997). The model considered is a special case of (23.3) with two endogenous variables ($p = 2$) and $k = 1$ exogenous variable. The structural equation includes only the endogenous variable. The restrictions tested are of the form $H_{01}: \beta = \beta^0$. The sample sizes are set to $n = 25, 100, 250$. The exogenous regressors are independently drawn from the standard normal distribution; they are drawn only once. The errors are generated according to a multinormal distribution with mean zero and covariance matrix

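**Table 23.1** IV-based Wald/Anderson–Rubin tests: empirical type I errors

| $\Pi_2$ | Wald ($n = 25$) | AR ($n = 25$) | Wald ($n = 100$) | AR ($n = 100$) | Wald ($n = 250$) | AR ($n = 250$) |
|---|---|---|---|---|---|---|
| 1 | 0.061 | 0.059 | 0.046 | 0.046 | 0.049 | 0.057 |
| 0.9 | 0.063 | 0.059 | 0.045 | 0.046 | 0.049 | 0.057 |
| 0.7 | 0.071 | 0.059 | 0.046 | 0.046 | 0.052 | 0.057 |
| 0.5 | 0.081 | 0.059 | 0.060 | 0.046 | 0.049 | 0.057 |
| 0.2 | 0.160 | 0.059 | 0.106 | 0.046 | 0.076 | 0.057 |
| 0.1 | 0.260 | 0.059 | 0.168 | 0.046 | 0.121 | 0.057 |
| 0.05 | 0.332 | 0.059 | 0.284 | 0.046 | 0.203 | 0.057 |
| 0.01 | 0.359 | 0.059 | 0.389 | 0.046 | 0.419 | 0.057 |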

The other coefficients are:

$$\beta = \beta^0 = 0; \qquad \Pi_2 = 1, .9, .7, .5, .2, .1, .05, .01. \tag{23.11}$$

In this case, the 2SLS-based test corresponds to the standard $t$-test (see Nelson and Startz (1990b) for the relevant formulae). 1,000 replications are performed. Table 23.1 reports the empirical probabilities of type I error associated with the two-tailed 2SLS $t$-test for the significance of $\beta$ and with the corresponding Anderson–Rubin test. In this context, the identification condition reduces to $\Pi_2 \neq 0$; this condition can be tested using a standard $F$-test in the first-stage regression.¹ IV-based Wald tests clearly perform very poorly in terms of size control: identification problems severely distort test sizes. Size distortions are already visible in identified models (for example, 0.081 when $\Pi_2 = 0.5$ and $n = 25$), but the problem is far more severe in near-unidentified situations. More importantly, increasing the sample size does not correct the problem: with $\Pi_2 = 0.01$, the empirical type I error of the Wald test rises from 0.359 at $n = 25$ to 0.419 at $n = 250$. In this regard, Bound, Jaeger, and Baker (1995) report severe bias problems associated with IV-based estimators, despite very large sample sizes. In contrast, the Anderson–Rubin test, when available, is immune to such problems: its empirical type I errors stay between 0.046 and 0.059 in every design. The test is exact, in the sense that the null distribution of the AR criterion does not depend on the parameters controlling identification. Indeed, with Gaussian errors the AR statistic follows an $F$ distribution whose degrees of freedom (the number of excluded exogenous variables, and $n$ minus the total number of exogenous variables) do not depend on the identification status. The AR test has recently received renewed attention; see, for example, Dufour and Jasiak (1996) and Staiger and Stock (1997). Recall, however, that the test is not applicable unless the null sets the values of the coefficients of all the endogenous variables. On general linear structural restrictions, see Dufour and Khalaf (1998b).
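The contrast in Table 23.1 can be explored with a small simulation. The Python (NumPy/SciPy) sketch below implements the AR test as the $F$-test of $\gamma_2 = 0$ in the regression of $y^*$ on the exogenous variables, a 2SLS $t$-test for the one-instrument design, and a loop estimating both tests' type I errors when $\beta = \beta^0 = 0$. The error correlation `rho = 0.95`, the 5% nominal level, and the use of normal critical values for the $t$-test are assumptions of this sketch rather than settings confirmed for the experiment above; the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def ar_test(y_star, X1, X2):
    """Anderson-Rubin F-test of gamma2 = 0 in y* = X1 gamma1 + X2 gamma2 + u.
    X1 may have zero columns. Returns (F, p-value from F(k2, n - k1 - k2))."""
    n, k1, k2 = len(y_star), X1.shape[1], X2.shape[1]

    def ssr(A):
        # Sum of squared residuals from regressing y_star on the columns of A
        if A.shape[1] == 0:
            return y_star @ y_star
        coef, *_ = np.linalg.lstsq(A, y_star, rcond=None)
        e = y_star - A @ coef
        return e @ e

    ssr0, ssr1 = ssr(X1), ssr(np.column_stack([X1, X2]))
    df2 = n - k1 - k2
    F = ((ssr0 - ssr1) / k2) / (ssr1 / df2)
    return F, stats.f.sf(F, k2, df2)

def tsls_t(y, Y, x, beta0):
    """2SLS t-statistic for H0: beta = beta0 in y = Y beta + u, single instrument x."""
    Y_hat = x * (x @ Y) / (x @ x)               # first-stage fitted values
    beta_hat = (Y_hat @ y) / (Y_hat @ Y_hat)    # 2SLS estimate
    e = y - Y * beta_hat
    s2 = e @ e / len(y)
    return (beta_hat - beta0) / np.sqrt(s2 / (Y_hat @ Y_hat))

def rejection_rates(n, pi2, rho=0.95, reps=1000, level=0.05, seed=0):
    """Empirical type I errors of the 2SLS t-test and the AR test when
    beta = beta0 = 0. rho (error correlation) and level are assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                  # exogenous regressor, drawn once
    cov = [[1.0, rho], [rho, 1.0]]              # assumed covariance of (u, v)
    crit = stats.norm.ppf(1 - level / 2)        # two-tailed asymptotic critical value
    X1 = np.empty((n, 0))                       # no included exogenous regressors
    rej_wald = rej_ar = 0
    for _ in range(reps):
        u, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        Y = pi2 * x + v
        y = 0.0 * Y + u                         # structural equation with beta = 0
        rej_wald += int(abs(tsls_t(y, Y, x, 0.0)) > crit)
        rej_ar += int(ar_test(y - 0.0 * Y, X1, x[:, None])[1] < level)
    return rej_wald / reps, rej_ar / reps

for pi2 in (1.0, 0.1, 0.01):
    print(pi2, rejection_rates(100, pi2))
```

In this design $y^* = u$ under the null, so the AR statistic does not involve $\Pi_2$ at all, which is why its rejection frequency should stay close to the nominal level for every value of `pi2`, whereas the 2SLS $t$-test is expected to over-reject as `pi2` approaches zero.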

Despite the recognized need for caution in applying IV-based tests, standard econometric software packages typically implement IV-based Wald tests. In particular, $t$-tests on individual parameters are routinely computed in the context of 2SLS or 3SLS procedures. Unfortunately, the Monte Carlo experiments we have analyzed confirm that IV-based Wald tests achieve computational savings at the risk of very poor reliability.