To introduce the notion of instrumental variables, we start from (8.3): $y = X\beta + u$, where consistent estimation was hampered by the correlation between $X$ and $u$. If there are observations on variables, collected in $Z$, say, that correlate with the variables measured in $X$ but do not correlate with $u$, then the last term in $Z'y/N = (Z'X/N)\beta + Z'u/N$ vanishes asymptotically, which opens the way to consistent estimation of $\beta$. This is the idea behind instrumental variables (IV) estimation, $Z$ being the matrix of instrumental variables. Of all the methods used to obtain consistent estimators in models with measurement error or with endogenous regressors in general, this is indisputably the most popular one.
If $Z$ is of the same order as $X$ and $Z'X/N$ converges to a finite, nonsingular matrix, the IV estimator $b_{IV}$ of $\beta$, defined as
$$b_{IV} = (Z'X)^{-1} Z'y, \tag{8.12}$$
is consistent. When $Z$ has $h > g$ columns, the IV estimator is
$$b_{IV} = (X'P_Z X)^{-1} X'P_Z y,$$
where $P_Z = Z(Z'Z)^{-1}Z'$. For $h = g$ this reduces to (8.12). Letting $\hat{X} = P_Z X$, we have alternatively $b_{IV} = (\hat{X}'\hat{X})^{-1}\hat{X}'y$, so it can be computed by OLS after transforming $X$, which comes down to computing the predicted value of $X$ after regressing each of its columns on $Z$. Therefore, $b_{IV}$ is also called the two-stage least squares (2SLS) estimator.
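As a rough numerical illustration of the two routes to the IV estimator described above, the following Python sketch computes it both by two successive least-squares fits and from the projection formula, and contrasts it with OLS on data with a mismeasured regressor. It assumes NumPy; the simulated design, the seed, and the names `iv_estimator`, `b_direct`, and `b_ols` are hypothetical choices made for this example, not part of the text.

```python
import numpy as np

def iv_estimator(y, X, Z):
    """2SLS: b_IV = (X' P_Z X)^{-1} X' P_Z y, without forming the N x N matrix P_Z."""
    # First stage: regress each column of X on Z; fitted values are X_hat = P_Z X.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: OLS of y on X_hat.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Simulated data (assumed design): one mismeasured regressor, two instruments.
rng = np.random.default_rng(0)
N, beta = 5000, 1.0
Z = rng.normal(size=(N, 2))
xi = Z @ np.array([1.0, 0.5]) + rng.normal(size=N)  # true, unobserved regressor
X = (xi + rng.normal(size=N)).reshape(-1, 1)        # observed with measurement error
y = beta * xi + rng.normal(size=N)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]        # attenuated towards zero
b_iv = iv_estimator(y, X, Z)

# Same estimator from the projection formula, for comparison.
PZ_X = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)        # P_Z X
b_direct = np.linalg.solve(X.T @ PZ_X, PZ_X.T @ y)  # (X'P_Z X)^{-1} X'P_Z y

print("OLS:", b_ols, "2SLS:", b_iv, "formula:", b_direct)
```

In this design the OLS estimate should sit visibly below the value of `beta` used in the simulation, while the two IV computations should agree up to rounding.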
Under some standard regularity conditions, $b_{IV}$ is asymptotically normally distributed (Bowden and Turkington, 1984, p. 26):
$$\sqrt{N}(b_{IV} - \beta) \xrightarrow{d} N\bigl(0, \sigma_u^2 (\Sigma_{ZX}'\Sigma_{ZZ}^{-1}\Sigma_{ZX})^{-1}\bigr),$$
where $\sigma_u^2 = \sigma_\varepsilon^2 + \beta'\Omega\beta$, $\Sigma_{ZX} = \operatorname{plim}_{N\to\infty} Z'X/N$, and $\Sigma_{ZZ} = \operatorname{plim}_{N\to\infty} Z'Z/N$. The asymptotic covariance matrix can be consistently estimated by inserting $Z'X/N$ for $\Sigma_{ZX}$, $Z'Z/N$ for $\Sigma_{ZZ}$, and the consistent estimator $\hat\sigma_u^2 = (y - Xb_{IV})'(y - Xb_{IV})/N$ for $\sigma_u^2$. The residual variance $\sigma_\varepsilon^2$ can be consistently estimated by $\hat\sigma_\varepsilon^2 = y'(y - Xb_{IV})/N$, which differs from $\hat\sigma_u^2$ because, in contrast to the OLS case, the residuals $y - Xb_{IV}$ are not perpendicular to $X$. The matrix $\Omega$ cannot be estimated consistently unless additional assumptions are made. For example, if it is assumed that $\Omega$ is diagonal, a consistent estimator of $\Omega$ can be obtained.
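The plug-in estimators just described can be sketched in a few lines of Python. This is only an illustration under assumed conditions: it relies on NumPy, uses a simulated design in which the equation-error variance is 1 and the measurement-error variance is 1 (so the disturbance variance is 2 when the slope is 1), and the function name `iv_with_variance` is hypothetical.

```python
import numpy as np

def iv_with_variance(y, X, Z):
    """Return b_IV, the two variance estimates, and asymptotic standard errors."""
    N = len(y)
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # P_Z X
    b_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)   # (X'P_Z X)^{-1} X'P_Z y
    resid = y - X @ b_iv                               # residuals use X, not X_hat
    sigma2_u = resid @ resid / N                       # estimates sigma_u^2
    sigma2_eps = y @ resid / N                         # estimates sigma_eps^2
    S_zx = Z.T @ X / N                                 # plug-in for Sigma_ZX
    S_zz = Z.T @ Z / N                                 # plug-in for Sigma_ZZ
    avar = sigma2_u * np.linalg.inv(S_zx.T @ np.linalg.solve(S_zz, S_zx))
    se = np.sqrt(np.diag(avar) / N)                    # standard errors of b_IV
    return b_iv, sigma2_u, sigma2_eps, se

# Simulated data (assumed design, not from the text).
rng = np.random.default_rng(1)
N, beta = 20000, 1.0
Z = rng.normal(size=(N, 2))
xi = Z @ np.array([1.0, 0.5]) + rng.normal(size=N)
X = (xi + rng.normal(size=N)).reshape(-1, 1)           # measurement-error variance 1
y = beta * xi + rng.normal(size=N)                     # equation-error variance 1

b_iv, sigma2_u, sigma2_eps, se = iv_with_variance(y, X, Z)
print(b_iv, sigma2_u, sigma2_eps, se)
```

The gap between the two variance estimates reflects the term contributed by the measurement error, which is why the residuals must be formed with the observed regressors rather than with their first-stage fitted values.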
The availability of instrumental variables is useful not only for consistent estimation in the presence of measurement error; it also offers scope for testing whether measurement error is present. An obvious testing strategy is to compare $b$ from OLS with $b_{IV}$ and to see whether they differ significantly. Under the null hypothesis of no measurement error, the difference between the two will be purely random, but since they have different probability limits under the alternative hypothesis, a significant difference might be indicative of measurement error.
If normality of the disturbance term $u$ is assumed, a test statistic for the null hypothesis of no measurement error is (Bowden and Turkington, 1984, p. 51)
$$\frac{q/g}{(q^* - q)/(N - 2g)},$$
assuming $X$ and $Z$ do not share columns, where
$$q = (b - b_{IV})'\bigl((X'P_Z X)^{-1} - (X'X)^{-1}\bigr)^{-1}(b - b_{IV}), \qquad q^* = (y - Xb)'(y - Xb),$$
and $b$ is the OLS estimator $(X'X)^{-1}X'y$. Under the null hypothesis, this test statistic follows an $F$-distribution with $g$ and $N - 2g$ degrees of freedom. If the disturbances are not assumed to be normally distributed, a test statistic for no measurement error is $q/\hat\sigma_u^2$, which under the null hypothesis converges in distribution to a chi-square variate with $g$ degrees of freedom (Bowden and Turkington, 1984, p. 51).
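To make the two test statistics concrete, here is a hedged Python sketch that evaluates them on one simulated sample without measurement error and one with it. NumPy is assumed; the helper names `simulate` and `measurement_error_test`, the seeds, and the design are hypothetical. The variance estimate in the chi-square version is taken from the IV residuals, which is an assumption of this sketch.

```python
import numpy as np

def measurement_error_test(y, X, Z):
    """Return (F-type statistic, chi-square-type statistic) comparing OLS and IV."""
    N, g = X.shape
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)                   # OLS
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]    # P_Z X
    XtPX = X_hat.T @ X                                  # X' P_Z X
    b_iv = np.linalg.solve(XtPX, X_hat.T @ y)           # IV
    d = b - b_iv
    V = np.linalg.inv(XtPX) - np.linalg.inv(XtX)
    q = d @ np.linalg.solve(V, d)
    e = y - X @ b
    q_star = e @ e                                      # OLS residual sum of squares
    F = (q / g) / ((q_star - q) / (N - 2 * g))
    u = y - X @ b_iv
    chi2 = q / (u @ u / N)                              # q divided by estimated sigma_u^2
    return F, chi2

def simulate(error_sd, seed, N=2000, beta=1.0):
    """Hypothetical design: error_sd = 0 means the regressor is observed exactly."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(N, 2))
    xi = Z @ np.array([1.0, 0.5]) + rng.normal(size=N)
    X = (xi + error_sd * rng.normal(size=N)).reshape(-1, 1)
    y = beta * xi + rng.normal(size=N)
    return y, X, Z

F_null, chi2_null = measurement_error_test(*simulate(0.0, seed=2))
F_alt, chi2_alt = measurement_error_test(*simulate(1.0, seed=3))
print("no measurement error:  ", F_null, chi2_null)
print("with measurement error:", F_alt, chi2_alt)
```

In the sample with measurement error both statistics should be far out in the right tail of their reference distributions, whereas in the error-free sample they should be of ordinary size.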
Most of the above discussion has been asymptotic. It appears that in finite samples, instrumental variables estimators do not possess the desirable properties they have asymptotically, especially when the instruments correlate only weakly with the regressors (e.g., Nelson and Startz, 1990a, 1990b; Bound, Jaeger, and Baker, 1995; Staiger and Stock, 1997). This happens often in practice, because good instruments are frequently hard to find. Consider, for example, household income. If that is measured with error, typical instruments may be years of schooling or age of the head of the household. While these are obviously correlated with income, the relation will generally be relatively weak. Bekker (1994) proposed an alternative estimator that has better small-sample properties than the standard IV estimator. His method of moments (MM) estimator is a generalization of the well-known LIML estimator for simultaneous equations models. Its formula is given by
$$b_{MM} = (X'P^*X)^{-1}X'P^*y, \tag{8.13}$$
where $P^* \equiv P_Z - \lambda_{MM}(I_N - P_Z)$ and $\lambda_{MM}$ is the smallest solution $\lambda$ of the generalized eigenvalue equation
$$S\begin{pmatrix}1\\ -\beta\end{pmatrix} = \lambda S_\perp \begin{pmatrix}1\\ -\beta\end{pmatrix}, \tag{8.14}$$
where $S \equiv (y, X)'P_Z(y, X)$ and $S_\perp \equiv (y, X)'(I_N - P_Z)(y, X)$. Equivalently, $\lambda_{MM}$ is the minimum of
$$\lambda \equiv \frac{(y - X\beta)'P_Z(y - X\beta)}{(y - X\beta)'(I_N - P_Z)(y - X\beta)}. \tag{8.15}$$
The solution vector $\beta$ in (8.14) or (8.15) is equivalent to the estimator $b_{MM}$ in (8.13). Under the usual assumptions of $N \to \infty$ and $g$ constant, $\lambda_{MM} \to 0$ and, consequently, $P^* \to P_Z$ and the IV estimator and MM estimator are asymptotically equivalent. In finite samples, however, MM performs better than IV.
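A minimal Python sketch of the eigenvalue route may help fix ideas. It assumes NumPy and takes the projection-type matrix to be the instrument projection minus the smallest eigenvalue times its orthogonal complement, which is the form implied by minimizing the variance ratio. The function name `mm_estimator`, the three-instrument design, and the seed are hypothetical.

```python
import numpy as np

def mm_estimator(y, X, Z):
    """Return (b_MM, b_IV, lambda_MM) from the smallest generalized eigenvalue."""
    W = np.column_stack([y, X])                       # (y, X)
    PW = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]     # P_Z (y, X)
    S = W.T @ PW                                      # (y, X)' P_Z (y, X)
    S_perp = W.T @ W - S                              # (y, X)' (I_N - P_Z) (y, X)
    # Smallest lambda solving S v = lambda * S_perp v.
    lam = np.linalg.eigvals(np.linalg.solve(S_perp, S)).real.min()
    # Blocks of S and S_perp give X'P_Z X, X'P_Z y and their complements.
    A = S[1:, 1:] - lam * S_perp[1:, 1:]              # X' P* X
    c = S[1:, 0] - lam * S_perp[1:, 0]                # X' P* y
    b_mm = np.linalg.solve(A, c)
    b_iv = np.linalg.solve(S[1:, 1:], S[1:, 0])       # lambda = 0 gives standard IV
    return b_mm, b_iv, lam

# Simulated data (assumed design): one mismeasured regressor, three instruments.
rng = np.random.default_rng(5)
N, beta = 2000, 1.0
Z = rng.normal(size=(N, 3))
xi = Z @ np.array([0.5, 0.5, 0.5]) + rng.normal(size=N)
X = (xi + rng.normal(size=N)).reshape(-1, 1)
y = beta * xi + rng.normal(size=N)

b_mm, b_iv, lam_mm = mm_estimator(y, X, Z)
print("MM:", b_mm, "IV:", b_iv, "lambda:", lam_mm)
```

With reasonably strong instruments and a sample of this size the eigenvalue is close to zero, so the two estimates nearly coincide; the difference becomes material mainly when instruments are weak or numerous relative to the sample size.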
Other proposals for alternative estimators that should have better small-sample properties than standard IV estimators can, for example, be found in Alonso-Borrego and Arellano (1999) and Angrist, Imbens, and Krueger (1999).
Consider the bivariate regression model with measurement errors (8.7). Further, assume that $\varphi_3 \equiv E(\xi_n^3) \neq 0$ and that $\xi_n$, $\varepsilon_n$, and $v_n$ are independently distributed (similar assumptions can be formulated for the functional model). Now,
This illustrates that nonnormality may be exploited to obtain consistent estimators. It was first shown by Reiersøl (1950) that the model is identified under nonnormality. The precise condition, as stated at the beginning of Section 3, was derived by Bekker (1986).
There are many ways in which nonnormality can be used to obtain a consistent estimator of $\beta$. The estimator (8.16) will generally not be efficient and its small-sample properties are usually not very good. Asymptotically more efficient estimators may be obtained by combining the equations for the covariances and several higher order moments and then using a nonlinear GLS procedure (see Section 4). If $E(\xi_n^3) = 0$, the third order moments do not give information about $\beta$ and fourth or higher order moments may be used. An extensive discussion of the ways in which higher order moments can be used to obtain consistent estimators of $\beta$ is given by Van Montfort, Mooijaart, and De Leeuw (1987).
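One simple way to see third-order moments at work is the following Python sketch. It rests on assumptions that go beyond the text: the bivariate model is taken to be a regression of the dependent variable on a single true regressor plus an independent error, with the observed regressor equal to the true one plus independent measurement error, all with mean zero. Under those assumptions the ratio of the sample moment of the observed regressor times the squared dependent variable to the sample moment of the squared observed regressor times the dependent variable converges to the slope. This ratio is one possible third-moment estimator chosen for illustration and is not claimed to be identical to the one in the text; NumPy, the exponential design, and all names are hypothetical.

```python
import numpy as np

# Simulated data (assumed design): skewed true regressor, normal errors.
rng = np.random.default_rng(4)
N, beta = 200_000, 2.0
xi = rng.exponential(size=N) - 1.0          # mean zero, third moment E(xi^3) = 2
x = xi + rng.normal(scale=0.7, size=N)      # observed regressor with measurement error
y = beta * xi + rng.normal(size=N)          # dependent variable

# Work in deviations from sample means.
x = x - x.mean()
y = y - y.mean()

b_ols = (x @ y) / (x @ x)                   # attenuated by measurement error
b_third = (x @ y**2) / (x**2 @ y)           # ratio of third-order sample moments

print("OLS:", b_ols, "third-moment estimator:", b_third)
```

The large sample size used here is deliberate: estimators built on third-order moments tend to be noisy, in line with the remark above about their small-sample behavior.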