# Over-identification and the 2SLS Minimand

Constant-effects models with more instruments than endogenous regressors are said to be over-identified. Because there are more instruments than needed to identify the parameters of interest, these models impose a set of restrictions that can be evaluated as part of a process of specification testing. This process amounts to asking whether the line plotted in a VIV-type picture fits the relevant conditional means tightly enough given the precision with which the means are estimated. The details behind this useful idea are easiest to spell out using matrix notation and a traditional linear model. Let $W_i$ denote the vector formed by concatenating the covariates and the single endogenous variable of interest. In the quarter-of-birth paper, for example, the covariates are year-of-birth and state-of-birth dummies, the instruments are quarter-of-birth dummies, and the endogenous variable is schooling. The coefficient vector is still $\Gamma = [\alpha', \rho]'$, as in the previous subsection. The residuals for the causal model can be defined as a function of $\Gamma$ using

$$\eta_i(\Gamma) \equiv Y_i - \Gamma' W_i = Y_i - [\alpha' X_i + \rho S_i].$$

This residual is assumed to be uncorrelated with the instrument vector, $Z_i$. In other words, $\eta_i$ satisfies the orthogonality condition,

$$E[Z_i \eta_i(\Gamma)] = 0. \qquad (4.2.2)$$

In any sample, however, this equation will not hold exactly because there are more moment conditions than there are elements of $\Gamma$. The sample analog of (4.2.2) is the sum over $i$,

$$\frac{1}{N} \sum_i Z_i \eta_i(\Gamma) \equiv m_N(\Gamma). \qquad (4.2.3)$$

2SLS can be understood as a generalized method of moments (GMM) estimator that chooses a value for $\Gamma$ by making the sample analog of (4.2.2) as close to zero as possible.

By the central limit theorem, the sample moment vector $\sqrt{N}\, m_N(\Gamma)$ has an asymptotic covariance matrix equal to $E[Z_i Z_i' \eta_i(\Gamma)^2]$, a matrix we'll call $\Lambda$. Although somewhat intimidating at first blush, this is just a matrix of 4th moments, as in the sandwich formula used to construct robust standard errors, (3.1.7). As shown by Hansen (1982), the optimal GMM estimator based on (4.2.2) minimizes a quadratic form in the sample moment vector, $m_N(g)$, where $g$ is a candidate estimator of $\Gamma$. The optimal weighting matrix in the middle of the GMM quadratic form is $\Lambda^{-1}$. In practice, of course, $\Lambda$ is unknown and must be estimated. A feasible version of the GMM procedure uses a consistent estimator of $\Lambda$ in the weighting matrix. Since the estimators using known and estimated $\Lambda$ have the same limiting distribution, we'll ignore this distinction for now. The quadratic form to be minimized can therefore be written,

$$J_N(g) \equiv N\, m_N(g)' \Lambda^{-1} m_N(g), \qquad (4.2.4)$$

where the $N$ term out front comes from the $\sqrt{N}$ normalization of the sample moments. As shown immediately below, when the residuals are conditionally homoskedastic, the minimizer of $J_N(g)$ is the 2SLS estimator. Without homoskedasticity, the GMM estimator that minimizes (4.2.4) is White's (1982) Two-Stage IV (a generalization of 2SLS), so it makes sense to call $J_N(g)$ the "2SLS minimand".

Here are some of the details behind the GMM interpretation of 2SLS. Conditional homoskedasticity means that

$$E[Z_i Z_i' \eta_i(\Gamma)^2] = E[Z_i Z_i']\, \sigma_\eta^2.$$

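Substituting for $\Lambda^{-1}$ and using $Z$, $Y$, and $W$ to denote sample data vectors and matrices, the quadratic form to be minimized becomes

$$J_N(g) = (N \sigma_\eta^2)^{-1} \times (Y - Wg)' Z\, E[Z_i Z_i']^{-1} Z'(Y - Wg).$$

Replacing $E[Z_i Z_i']$ with the sample cross-product matrix $Z'Z/N$, we have

$$J_N(g) = (1/\sigma_\eta^2) \times (Y - Wg)' P_Z (Y - Wg),$$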
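As a numerical check on this algebra, the sketch below evaluates the GMM quadratic form (4.2.4) and the projection form side by side at an arbitrary candidate $g$. The simulated data, the function names `gmm_minimand` and `projection_form`, and the value plugged in for $\sigma_\eta^2$ are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np

def gmm_minimand(g, Y, W, Z, Lam):
    """J_N(g) = N * m_N(g)' Lam^{-1} m_N(g), with m_N(g) = Z'(Y - Wg) / N."""
    N = len(Y)
    m = Z.T @ (Y - W @ g) / N
    return N * m @ np.linalg.solve(Lam, m)

def projection_form(g, Y, W, Z, sigma2):
    """(1 / sigma2) * (Y - Wg)' P_Z (Y - Wg), using fitted values for P_Z e."""
    e = Y - W @ g
    e_fit = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]  # P_Z e
    return e @ e_fit / sigma2

# Hypothetical simulated data: a constant plus three instruments.
rng = np.random.default_rng(0)
N = 500
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
v = rng.normal(size=N)
S = Z[:, 1:] @ np.array([0.6, 0.4, 0.2]) + v
Y = 1.0 + 0.5 * S + 0.8 * v + rng.normal(size=N)
W = np.column_stack([np.ones(N), S])

g = np.array([0.9, 0.6])       # arbitrary candidate value, not an estimate
sigma2 = 1.3                   # arbitrary stand-in for the residual variance
Lam = sigma2 * (Z.T @ Z) / N   # homoskedastic Lambda with E[ZZ'] -> Z'Z/N

j_gmm = gmm_minimand(g, Y, W, Z, Lam)
j_proj = projection_form(g, Y, W, Z, sigma2)
print(j_gmm, j_proj)           # the two forms should agree up to rounding
```

The agreement holds for any $g$ and any positive variance, since it is an algebraic identity and not a property of the estimate.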

where $P_Z = Z(Z'Z)^{-1}Z'$. From here, we get the solution

$$\hat{g} = \hat{\Gamma}_{2SLS} = [W' P_Z W]^{-1} W' P_Z Y.$$
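The closed-form solution can be computed without ever forming the $N \times N$ matrix $P_Z$, because $P_Z W$ is just the matrix of first-stage fitted values. The following sketch, on simulated data with made-up coefficients, computes the estimator both from the formula and as OLS on the fitted values; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
X = np.ones((N, 1))                        # covariates: only a constant here
Q = rng.normal(size=(N, 3))                # three excluded instruments
v = rng.normal(size=N)
S = Q @ np.array([0.6, 0.4, 0.2]) + v      # endogenous regressor
eta = 0.8 * v + rng.normal(size=N)         # correlated with S through v
Y = 1.0 + 0.5 * S + eta

Z = np.hstack([X, Q])                      # covariates count as instruments
W = np.hstack([X, S[:, None]])

# First stage: P_Z W equals the fitted values from regressing W on Z.
W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]

# Formula: [W' P_Z W]^{-1} W' P_Z Y, using W_hat' = W' P_Z.
gamma_2sls = np.linalg.solve(W_hat.T @ W, W_hat.T @ Y)

# Second stage: OLS of Y on the fitted values.
gamma_second_stage = np.linalg.lstsq(W_hat, Y, rcond=None)[0]

print(gamma_2sls, gamma_second_stage)
```

The two vectors coincide because $P_Z$ is symmetric and idempotent, so $\hat{W}'\hat{W} = W'P_Z W$. The first-order condition $W'P_Z(Y - W\hat{g}) = 0$ can be checked the same way.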

Since the projection operator, $P_Z$, produces fitted values, and $P_Z$ is an idempotent matrix, this can be seen to be the OLS estimator of the second-stage equation, (4.1.9), written in matrix notation. More generally, even without homoskedasticity we can obtain a feasible efficient 2SLS-type estimator by minimizing (4.2.4) and using a consistent estimator of $E[Z_i Z_i' \eta_i(\Gamma)^2]$ to form $J_N(g)$. Typically, we'd use the empirical fourth moments, $\sum_i Z_i Z_i' \hat{\eta}_i^2$, where $\hat{\eta}_i$ is the regular 2SLS residual computed without worrying about heteroskedasticity (see White, 1982, for distribution theory and other details).

The over-identification test statistic is given by the minimized 2SLS minimand. Intuitively, this statistic tells us whether the sample moment vector, $m_N(\hat{g})$, is close enough to zero for the assumption that $E[Z_i \eta_i] = 0$ to be plausible. In particular, under the null hypothesis that the residuals and instruments are indeed orthogonal, the minimized $J_N(g)$ has a $\chi^2(q - 1)$ distribution, where $q$ is the number of instruments excluded from the causal model. We can therefore compare the empirical value of the 2SLS minimand with chi-square tables in a formal testing procedure for $H_0: E[Z_i \eta_i] = 0$.
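A minimal sketch of the feasible two-step procedure and the resulting test statistic follows. It assumes simulated data with valid instruments and heteroskedastic residuals. The helper names (`two_sls`, `efficient_gmm`, `gmm_minimand`) are hypothetical, and the 5 percent critical value for $\chi^2(2)$ is copied from standard tables instead of being computed.

```python
import numpy as np

def gmm_minimand(g, Y, W, Z, Lam):
    """J_N(g) = N * m_N(g)' Lam^{-1} m_N(g)."""
    N = len(Y)
    m = Z.T @ (Y - W @ g) / N
    return N * m @ np.linalg.solve(Lam, m)

def two_sls(Y, W, Z):
    """Ordinary 2SLS via first-stage fitted values."""
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
    return np.linalg.solve(W_hat.T @ W, W_hat.T @ Y)

def efficient_gmm(Y, W, Z, Lam):
    """Minimizer of the quadratic form for a given weighting matrix Lam^{-1}."""
    ZW, ZY = Z.T @ W, Z.T @ Y
    A = ZW.T @ np.linalg.solve(Lam, ZW)
    b = ZW.T @ np.linalg.solve(Lam, ZY)
    return np.linalg.solve(A, b)

# Hypothetical data: q = 3 valid instruments, heteroskedastic residuals.
rng = np.random.default_rng(2)
N, q = 2000, 3
Q = rng.normal(size=(N, q))
v = rng.normal(size=N)
S = Q @ np.array([0.6, 0.4, 0.2]) + v
eta = (1.0 + 0.5 * np.abs(Q[:, 0])) * (0.8 * v + rng.normal(size=N))
Y = 1.0 + 0.5 * S + eta
Z = np.column_stack([np.ones(N), Q])
W = np.column_stack([np.ones(N), S])

# Step 1: 2SLS residuals, ignoring heteroskedasticity.
g_first = two_sls(Y, W, Z)
e_first = Y - W @ g_first

# Step 2: empirical fourth moments (1/N) sum_i Z_i Z_i' e_i^2 as Lambda-hat.
Lam_hat = (Z * e_first[:, None] ** 2).T @ Z / N
g_eff = efficient_gmm(Y, W, Z, Lam_hat)

# Minimized minimand = over-identification statistic, df = q - 1.
J_stat = gmm_minimand(g_eff, Y, W, Z, Lam_hat)
df = q - 1
CRIT_5PCT_DF2 = 5.991          # chi-square(2) table value at the 5% level
reject = J_stat > CRIT_5PCT_DF2
print(g_eff, J_stat, df, reject)
```

Holding the weighting matrix fixed, the second-step estimate can never produce a larger value of the quadratic form than the first-step 2SLS estimate, which is a useful sanity check on any implementation.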

For reasons that will soon become apparent, we're not often interested in over-identification per se. Our main interest is in the 2SLS minimand when the instruments are a full set of mutually exclusive dummy variables, as for the Wald estimators and grouped-data estimation strategies discussed above. In this important special case, 2SLS becomes weighted least squares of a grouped equation like (4.1.16), while the 2SLS minimand is the relevant weighted sum of squares being minimized. To see this, note that projection on a full set of mutually exclusive dummy variables for an instrument that takes on $J$ values produces an $N \times 1$ vector of fitted values equal to the $J$ conditional means at each value of the instrument (included covariates are counted as instruments), each one of these repeated $n_j$ times, where $n_j$ is the group size and $\sum_j n_j = N$. The cross-product matrix $[Z'Z]$ in this case is a $J \times J$ diagonal matrix with elements $n_j$. Simplifying, we then have

$$J_N(g) = (1/\sigma_\eta^2) \times \sum_j n_j (\bar{y}_j - g'\bar{W}_j)^2,$$

where $\bar{W}_j$ is the sample mean of the rows of matrix $W$ in group $j$. Thus, $J_N(g)$ is the GLS weighted least squares minimand for estimation of the grouped regression of $\bar{y}_j$ on $\bar{W}_j$. With a little bit more work (here we skip the details), we can similarly show that the efficient Two-Step IV procedure without homoskedasticity minimizes

$$J_N(g) = \sum_j \left( \frac{n_j}{\sigma_j^2} \right) (\bar{y}_j - g'\bar{W}_j)^2, \qquad (4.2.7)$$

where $\sigma_j^2$ is the variance of $\eta_i$ in group $j$. Estimation using (4.2.7) is feasible because we can estimate $\sigma_j^2$ in a first step, say, using inefficient-but-still-consistent 2SLS that ignores heteroskedasticity. Efficient two-step IV estimators are constructed in Angrist (1990, 1991).

The GLS structure of the 2SLS minimand allows us to see the over-identification test statistic for dummy instruments as a simple measure of the goodness of fit of the line connecting $\bar{y}_j$ and $\bar{W}_j$. In other words, this is the chi-square goodness-of-fit statistic for the line in a VIV plot like figure 4.1.2. The chi-square degrees of freedom parameter is given by the difference between the number of values taken on by the instrument and the number of parameters being estimated.
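The grouped-data interpretation can be verified directly. The sketch below uses a hypothetical instrument with four values and simulated homoskedastic data. It checks that 2SLS on the micro data with a full set of group dummies reproduces weighted least squares on the group means, and that the projection quadratic form equals the weighted sum of squared deviations from the fitted line.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J = 2000, 4
group = rng.integers(0, J, size=N)     # hypothetical instrument with J values
Z = np.eye(J)[group]                   # N x J full set of exclusive dummies
v = rng.normal(size=N)
S = 0.5 * group + v
Y = 1.0 + 0.5 * S + 0.8 * v + rng.normal(size=N)
W = np.column_stack([np.ones(N), S])

# 2SLS on the micro data.
W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
g_2sls = np.linalg.solve(W_hat.T @ W, W_hat.T @ Y)

# Group sizes and group means.
n = Z.sum(axis=0)
ybar = Z.T @ Y / n
Wbar = Z.T @ W / n[:, None]

# WLS on the J group means, weighting by n_j.
g_wls = np.linalg.solve(Wbar.T @ (n[:, None] * Wbar), Wbar.T @ (n * ybar))

# The two quadratic forms at an arbitrary candidate g.
g = np.array([0.7, 0.8])
e = Y - W @ g
quad_micro = e @ (Z @ (Z.T @ e / n))              # (Y - Wg)' P_Z (Y - Wg)
quad_grouped = np.sum(n * (ybar - Wbar @ g) ** 2)

# Goodness-of-fit statistic for the fitted line, with J - 2 degrees of freedom.
e_hat = Y - W @ g_2sls
sigma2_hat = e_hat @ e_hat / N
fit_stat = np.sum(n * (ybar - Wbar @ g_2sls) ** 2) / sigma2_hat
df = J - W.shape[1]
print(g_2sls, g_wls, fit_stat, df)
```

The variance estimate here is the simple homoskedastic one; substituting group-specific variances in the weights would give the heteroskedastic version in (4.2.7).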

Like the various paths leading to the 2SLS estimator, there are many roads to the test statistic, (4.2.7), as well. Here are two further paths that are worth knowing. First, the test statistic based on the general GMM minimand for IV, whether the instruments are group dummies or not, is the same as the over-identification test statistic discussed in many widely used econometric references on simultaneous equations models. For example, this statistic features in Hausman's (1983) chapter on simultaneous equations in the Handbook of Econometrics, which also proposes a simple computational procedure: for homoskedastic models, the minimized 2SLS minimand is the sample size times the $R^2$ from a regression of the 2SLS residuals on the instruments (and the included exogenous covariates). The formula for this is

$$N \left[ \frac{\hat{\eta}' P_Z \hat{\eta}}{\hat{\eta}'\hat{\eta}} \right],$$

where $\hat{\eta} = Y - W\hat{\Gamma}_{2SLS}$ is the vector of 2SLS residuals.
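The computational shortcut is easy to confirm on simulated data. In the sketch below (all data and names are hypothetical), the statistic is computed once as $N$ times the uncentered $R^2$ from regressing the 2SLS residuals on the instruments, and once as the minimized homoskedastic minimand with $\sigma_\eta^2$ estimated by the mean squared 2SLS residual.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1500
Q = rng.normal(size=(N, 3))                 # three excluded instruments
v = rng.normal(size=N)
S = Q @ np.array([0.6, 0.4, 0.2]) + v
Y = 1.0 + 0.5 * S + 0.8 * v + rng.normal(size=N)
Z = np.column_stack([np.ones(N), Q])        # instruments plus the constant
W = np.column_stack([np.ones(N), S])

# 2SLS estimate and residuals.
W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
g_2sls = np.linalg.solve(W_hat.T @ W, W_hat.T @ Y)
e = Y - W @ g_2sls

# Route 1: N times the R^2 from regressing the residuals on Z.
e_fit = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]   # P_Z e
r2 = (e_fit @ e_fit) / (e @ e)
stat_nr2 = N * r2

# Route 2: minimized minimand (1 / sigma2) * e' P_Z e with sigma2 = e'e / N.
sigma2_hat = e @ e / N
stat_minimand = (e @ e_fit) / sigma2_hat
print(stat_nr2, stat_minimand)
```

Because the constant appears in both $W$ and $Z$, the 2SLS residuals sum to zero, so the centered and uncentered $R^2$ coincide in this setup.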

Second, it’s worth emphasizing that the essence of over-identification can be said to be “more than one way to skin the same econometric cat.” In other words, given more than one instrument for the same causal relation, we might consider constructing simple IV estimators one at a time and comparing them. This comparison checks over-identification directly: If each just-identified estimator is consistent, the distance between them should be small relative to sampling variance, and should shrink as the sample size and hence the precision of these estimates increases. In fact, we might consider formally testing whether all possible just-identified estimators are the same. The resulting test statistic is said to generate a Wald test of this null, while the test-statistic based on the 2SLS minimand is said to be a Lagrange Multiplier (LM) test because it can be related to the score vector in a maximum likelihood version of the IV setup.

In the grouped-data version of IV, the Wald test amounts to a test of equality for the set of all possible linearly independent Wald estimators. If, for example, lottery numbers are divided into 4 groups based on various cohorts' eligibility cutoffs (RSN 1-95, 96-125, 126-195, and the rest), then 3 linearly independent Wald estimators can be constructed. Alternatively, the efficient grouped-data estimator can be constructed by running GLS on these four conditional means. Four groups means there are 3 possible Wald estimators and 2 non-redundant equality restrictions on these three; hence, the relevant Wald statistic has 2 degrees of freedom. On the other hand, 4 groups means three instruments and a constant available to estimate a model with 2 parameters (the constant and the causal effect of military service). So the 2SLS minimand generates an over-identification test statistic with $4 - 2 = 2$ degrees of freedom. And, in fact, provided you use the same method of estimating the weighting matrix in the relevant quadratic forms, these two test statistics not only test the same thing, they are numerically equivalent. This makes sense since we have already seen that 2SLS is the efficient linear combination of Wald estimators.
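To make the counting concrete, the sketch below builds three Wald estimators from four hypothetical groups, each comparing a group with group 0, and checks that the grouped weighted least squares slope (the homoskedastic 2SLS estimate) is a linear combination of them with weights that sum to one. The data are simulated and the particular weights `omega` are one algebraic way of writing the combination, derived here for illustration and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
N, J = 4000, 4
group = rng.integers(0, J, size=N)          # hypothetical 4-cell instrument
v = rng.normal(size=N)
S = 0.3 * group + v
Y = 1.0 + 0.5 * S + 0.8 * v + rng.normal(size=N)

# Group sizes and group means of the outcome and the endogenous variable.
n = np.bincount(group, minlength=J).astype(float)
ybar = np.bincount(group, weights=Y, minlength=J) / n
sbar = np.bincount(group, weights=S, minlength=J) / n

# Three linearly independent Wald estimators, each relative to group 0.
wald = (ybar[1:] - ybar[0]) / (sbar[1:] - sbar[0])

# Slope from weighted least squares of ybar on sbar with weights n_j.
s_all = np.average(sbar, weights=n)
y_all = np.average(ybar, weights=n)
slope = np.sum(n * (sbar - s_all) * (ybar - y_all)) / np.sum(n * (sbar - s_all) ** 2)

# Weights that reproduce the slope as a combination of the Wald estimators.
raw = n[1:] * (sbar[1:] - s_all) * (sbar[1:] - sbar[0])
omega = raw / raw.sum()
combo = omega @ wald

# Degrees of freedom counted both ways.
df_wald = (J - 1) - 1      # 3 Wald estimators, 2 equality restrictions
df_overid = J - 2          # 4 group means, 2 parameters
print(wald, slope, combo, df_wald, df_overid)
```

The individual weights need not all be positive in a given sample; the identity only requires that they sum to one.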

Finally, a caveat regarding over-identification tests in practice: In our experience, the "over-ID statistic" is often of little value in applied work. Because $J_N(g)$ measures variance-normalized goodness-of-fit, the over-ID test statistic tends to be low when the underlying estimates are imprecise. Since IV estimates are very often imprecise, we cannot take much satisfaction from the fact that one estimate is within sampling variance of another even if the individual estimates appear precise enough to be informative. On the other hand, in cases where the underlying IV estimates are quite precise, the fact that the over-ID statistic rejects need not point to an identification failure. Rather, this may be evidence of treatment effect heterogeneity, a possibility we discuss further below. On the conceptual side, however, an understanding of the anatomy of the 2SLS minimand is invaluable, for it once again highlights the important link between grouped data and IV. This link takes the mystery out of estimation and testing with instrumental variables and forces us to confront the raw moments that are the foundation for causal inference.