Instead of going into more complicated formulas for pdfs or cdfs or moments of estimators, we devote this section to a discussion of the practical implications of the finite sample results and alternative asymptotics – covering material that extends from the early works in the 1960s to the recent results on exact small sample properties of IV estimators and the behavior of IV estimators when instruments are weak. For finite sample results, most of the conclusions come from the study of the case of two included endogenous variables.
1. For the k-class estimator (nonstochastic $k \in [0, 1]$), bias is zero if and only if $\rho$, the correlation between the structural error and the endogenous regressor, is equal to zero. The direction of bias is the same for all $k$ and it follows the sign of $\rho$. Negative correlation implies a downward bias; positive correlation implies an upward bias.
2. Absolute bias is an increasing function of the absolute value of $\rho$, a decreasing function of the concentration parameter $\mu^2$, and a decreasing concave function of $k$. Thus, whenever both exist, OLS bias ($k = 0$) is always greater in absolute value than 2SLS bias ($k = 1$). Of course, if the equation is just identified, then 2SLS bias does not exist while OLS bias is finite.
3. For two-stage least squares, absolute bias is an increasing function of the degree of overidentification. Since the 2SLS probability distribution depends on sample size only through the concentration parameter, and since the value of the concentration parameter increases with additional observations, 2SLS bias in absolute value decreases upon inclusion of more observations in the sample. The total effect of additional observations on OLS bias, on the other hand, is indeterminate. An increase in sample size produces a positive direct effect on absolute OLS bias (because of the increase in degrees of freedom) and a negative indirect effect through the increase in the concentration parameter.
4. The size of the OLS bias relative to 2SLS bias gets larger with a higher concentration parameter $\mu^2$, a lower degree of overidentification, a bigger sample size, and a higher absolute value of $\rho$.
5. For the k-class, the optimal value of nonstochastic k over [0, 1] for minimizing mean squared error varies over a wide range according to slight changes in parameter values and the sample size.
6. For the whole k-class, $k$ nonstochastic and in $[0, 1]$, exact mean squared error is a decreasing function of the concentration parameter, an increasing function of the absolute value of $\rho$, and an indefinite function of the degrees of freedom parameter ($K_2 - 1$, the degree of overidentification, for 2SLS; $T - K_1$ for OLS; $h - K_1 - 1$ for M2SLS). Interpreting M2SLS as an IV method, keep in mind that $h$ represents the number of instruments. Note also the ceteris paribus conditions here: all other parameters are kept fixed as a specific one, say the concentration parameter, changes. Furthermore, because the 2SLS distribution depends on sample size only through the concentration parameter (see item 3), it follows that the exact mean squared error for 2SLS is a nonincreasing function of sample size. This does not apply, however, to the other k-class estimators; for them the net effect of increasing sample size is indefinite.
7. In terms of relative magnitudes of MSE, large values of the concentration parameter $\mu^2$ and large $T$ favor 2SLS over OLS. One would expect this since the usual large-sample asymptotics would be taking effect and the dominant term would be the inconsistency in OLS. However, there are cases (small values of $\rho$ and $T$) where OLS would dominate 2SLS even for large values of $\mu^2$.
8. When the degree of overidentification gets large, the 2SLS and OLS distributions tend to be similar. This follows from the fact that the only difference in the distributions of 2SLS and OLS lies in the degrees of freedom parameter. This parameter is the degree of overidentification for 2SLS and sample size less ($K_1 + 1$) for OLS, so that the smaller $T - K$ is, the more similar the 2SLS and OLS distributions will be. Of course, there will be no perfect coincidence since sample size is strictly greater than $K$.
9. The OLS and 2SLS distributions are highly sensitive to $\rho$; the 2SLS distribution is considerably asymmetric, while the OLS distribution is almost symmetric.
10. The extensive tabulations of the 2SLS distribution function in Anderson and Sawa (1979) provide considerable insight into the degree of asymmetry and skewness in the 2SLS distribution. Bias in the direction of $\rho$ is quite pronounced. For some combinations of parameter values (such as $K_2 > 20$, a low concentration parameter, and a high numerical value of $\rho$), the probability is close to 1 that the 2SLS estimator will be on one side of the true value, e.g. less than the true value if $\rho$ is negative. With regard to convergence to normality, when either $\rho$ or $K_2$ or both are large, the 2SLS distribution tends to normality quite slowly. In comparison with 2SLS, the LIML distribution is far more symmetric though more spread out, and it approaches normality faster.
11. Up to terms of order $T^{-1}$, the approximate LIML distribution (obtained from large sample asymptotic expansions) is median unbiased. For 2SLS, the median is $\beta$ only if the equation is just identified or if $\rho = 0$. Up to order $T^{-1/2}$, the approximate distribution functions for both 2SLS and LIML assign the same probability as the normal to an interval which is symmetric about $\beta$. Also, the asymptotic mean squared errors, up to order $T^{-1/2}$, reproduce that implied by the limiting normal distribution.
12. Anderson (1974) compares asymptotic mean squared errors of 2SLS and LIML up to order $T^{-1}$ and finds that for a degree of overidentification ($v$) strictly less than 7, 2SLS would have a smaller asymptotic mean squared error (AMSE) than LIML. For $v$ greater than or equal to 7 and for $\alpha^2 = \rho^2 \omega_{22} \sigma^2 / (\omega_{11}\omega_{22} - \omega_{12}^2)$ not too small, LIML will have the smaller AMSE. Calculation of probabilities of absolute deviations around $\beta$ leads to the same conclusion: small $\rho^2$ or little simultaneity favors 2SLS while a high degree of overidentification favors LIML. The condition for 2SLS to have the advantage over LIML in this case is
$$\alpha^2 < 2/(K_2 - 1) = 2/v.$$
13. In dealing with the case of one explanatory endogenous variable and one instrument (the just-identified case), Nelson and Startz (1990) find that
   (a) The probability distribution of the IV estimator can be bimodal. Maddala and Jeong (1992) show that this is a consequence of near singularity of reduced form error covariance matrices but not necessarily of poor instruments.
   (b) The asymptotic distribution of the IV estimator is a poor approximation to the exact distribution when the instrument has low correlation with the regressor and when the number of observations is small.
14. As a further amplification of item 2, in dealing with the bias of instrumental variable estimators, Buse (1992) derives conditions under which Phillips’ (1980, 1983) observation would hold: that the IV estimator would display more bias as the number of instruments increases. Buse shows that IV bias would increase or decrease with an increased number of instruments depending on an inequality based on quadratic forms of incremental regression moment matrices of the right-hand side endogenous variables. In the case where there is only one right-hand side endogenous variable, the result simplifies to the conclusion that the estimated IV bias will increase with the number of excess instrumental variables "only if the proportional increase in the instruments is faster than the rate of increase in $R^2$ measured relative to the fit of $Y_1$ on $X_1$." Thus, adding less important instrumental variables later will add little to $R^2$ and increase IV bias. On the other hand, one could start with weak instruments and find that $R^2$ rises dramatically (with a decline in IV bias) as important instruments are added. Consequently, whether the tradeoff between bias and variability in IV improves as more instruments are added depends on the IV selection sequence.
15. In Bekker’s (1994) asymptotic analysis where the number of instruments increases at the same rate as sample size, there is numerical evidence that approximations to distributions of IV estimators under this parameter sequence are more accurate than large sample approximations, even if the number of instruments is small. Confidence regions based on this alternative asymptotic analysis also produce more accurate coverage rates when compared to standard IV confidence regions. Under this alternative asymptotics, 2SLS becomes inconsistent while LIML remains consistent. The asymptotic Gaussian distribution of LIML depends on $\alpha$, the limit of $L/T$, but the LIML asymptotic covariance matrix can be estimated by a strictly positive definite matrix without estimating $\alpha$ or specifying $L$; see Bekker (1994). Bekker’s numerical analysis shows that inference based on this estimated limiting distribution is more accurate than that based on large sample asymptotics (where $\alpha = 0$).
16. From their weak-instrument asymptotics (the number of instruments is fixed and the coefficients of the instruments in the first stage regression go to zero at the rate of $T^{-1/2}$), Staiger and Stock (1997) conclude that
   (a) Conventional asymptotic results are invalid, even when sample size is large. The k-class estimator is not consistent and has a nonstandard asymptotic distribution. Similarly, Bound et al. (1995) find large inconsistencies in IV estimates when instruments are weak.
   (b) 2SLS and LIML are not asymptotically equivalent. 2SLS can be badly biased and can produce confidence intervals with severely distorted coverage rates. In light of this, nonstandard methods for interval estimation should be considered.
   (c) Estimator bias is less of a problem for LIML than 2SLS (when there are two included endogenous variables).
   (d) When doing IV estimation, the $R^2$ or F-statistic in the first stage regression should be reported. Bound et al. (1995) also recommend this as a useful indicator of the quality of IV estimates.
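Several of the qualitative findings listed above (the sign and ordering of OLS and 2SLS bias, the effect of adding instruments at a fixed concentration parameter, and the usefulness of the first-stage F-statistic) can be illustrated with a small Monte Carlo experiment. The following is only a minimal sketch under simplifying assumptions that are not taken from the cited papers: one endogenous regressor, no included exogenous variables, bivariate normal errors with unit variances and correlation `rho`, instruments held fixed across replications, and the concentration parameter defined as pi'Z'Z pi divided by the first-stage error variance. The function name `simulate_iv` and all of its arguments are hypothetical. Median bias is reported rather than mean bias because 2SLS moments need not exist when the equation is just identified.

```python
import numpy as np


def simulate_iv(T=100, K2=4, rho=0.8, mu2=20.0, beta=1.0, reps=2000, seed=0):
    """Monte Carlo comparison of OLS and 2SLS in y = beta*x + u, x = Z @ pi + v."""
    rng = np.random.default_rng(seed)
    # Instruments are drawn once and held fixed across replications (assumption).
    Z = rng.standard_normal((T, K2))
    # Scale pi so that pi'Z'Z pi / var(v) equals the target mu2 (var(v) = 1).
    pi = np.ones(K2)
    pi *= np.sqrt(mu2 / (pi @ Z.T @ Z @ pi))
    # Structural error u and first-stage error v have correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])

    ols = np.empty(reps)
    tsls = np.empty(reps)
    fstat = np.empty(reps)
    for r in range(reps):
        errors = rng.multivariate_normal(np.zeros(2), cov, size=T)
        u, v = errors[:, 0], errors[:, 1]
        x = Z @ pi + v
        y = beta * x + u
        # First stage: regress x on Z (no intercept in this sketch).
        pi_hat = np.linalg.lstsq(Z, x, rcond=None)[0]
        x_hat = Z @ pi_hat
        resid = x - x_hat
        ols[r] = (x @ y) / (x @ x)            # k-class with k = 0
        tsls[r] = (x_hat @ y) / (x_hat @ x)   # k-class with k = 1
        # F-statistic for the hypothesis that all first-stage coefficients are zero.
        fstat[r] = (x_hat @ x_hat / K2) / (resid @ resid / (T - K2))

    return {
        "ols_median_bias": float(np.median(ols) - beta),
        "tsls_median_bias": float(np.median(tsls) - beta),
        "mean_first_stage_F": float(fstat.mean()),
    }


if __name__ == "__main__":
    # Vary the number of instruments while holding the concentration parameter fixed.
    for K2 in (1, 4, 10):
        print(K2, simulate_iv(K2=K2))
```

Holding `mu2` fixed while raising `K2` isolates the effect of overidentification; varying `mu2` with `K2` fixed instead shows how instrument strength, as summarized by the first-stage F-statistic, relates to the size of the 2SLS bias.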
References
Anderson, T. W. (1974). An asymptotic expansion of the distribution of the limited information maximum likelihood estimate of a coefficient in a simultaneous equation system. Journal of the American Statistical Association 69, 565-73.
Anderson, T. W. (1977). Asymptotic expansions of the distributions of estimates in simultaneous equations for alternative parameter sequences. Econometrica 45, 509-18.
Anderson, T. W., and T. Sawa (1979). Evaluation of the distribution function of the two-stage least squares estimate. Econometrica 47, 163-82.
Angrist, J. (1998). Estimating the labor market impact of voluntary military service using social security data on military applicants. Econometrica 66, 249-88.
Angrist, J., and A. Krueger (1992). The effect of age of school entry on educational attainment: An application of instrumental variables with moments from two samples. Journal of the American Statistical Association 87, 328-36.
Basmann, R. L. (1961). A note on the exact finite sample frequency functions of GCL estimators in two leading over-identified cases. Journal of the American Statistical Association 56, 619-36.
Bekker, P. A. (1994). Alternative approximations to the distribution of instrumental variable estimators. Econometrica 62, 657-81.
Bekker, P. A., and T. K. Dijkstra (1990). On the nature and number of constraints on the reduced form as implied by the structural form. Econometrica 58, 507-14.
Bound, J., D. A. Jaeger, and R. M. Baker (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association 90, 443-50.
Buse, A. (1992). The bias of instrumental variable estimators. Econometrica 60, 173-80.
Donald, S., and W. Newey (1999). Choosing the number of instruments. M.I.T. Working Paper, Department of Economics.
Hahn, J., and J. Hausman (1999). A new specification test for the validity of instrumental variables. M.I.T. Working Paper, Department of Economics.
Hausman, J. (1983). Specification and estimation of simultaneous equation models. The Handbook of Econometrics, Volume I, pp. 393-448. North-Holland Publishing Company.
Hendry, D. F. (1976). The structure of simultaneous equations estimators. Journal of Econometrics 4, 51-88.
Hsiao, C. (1983). Identification. The Handbook of Econometrics, Volume I, pp. 223-83. North-Holland Publishing Company.
Kadane, J. (1971). Comparison of k-class estimators when the disturbances are small. Econometrica 39, 723-37.
Kunitomo, N. (1980). Asymptotic expansions of the distributions of estimators in a linear functional relationship and simultaneous equations. Journal of the American Statistical Association 75, 693-700.
Maddala, G. S., and J. Jeong (1992). On the exact small sample distribution of the instrumental variable estimator. Econometrica 60, 181-3.
Mariano, R. S. (1975). Some large-concentration-parameter asymptotics for the k-class estimators. Journal of Econometrics 3, 171-7.
Mariano, R. S. (1977). Finite-sample properties of instrumental variable estimators of structural coefficients. Econometrica 45, 487-96.
Mariano, R. S. (1982). Analytical small-sample distribution theory in econometrics: The simultaneous-equations case. International Economic Review 23, 503-34.
Morimune, K. (1978). Improving the limited information maximum likelihood estimator when the disturbances are small. Journal of the American Statistical Association 73, 867-71.
Morimune, K. (1983). Approximate distribution of k-class estimators when the degree of overidentifiability is large compared with the sample size. Econometrica 51, 821-41.
Morimune, K., and N. Kunitomo (1980). Improving the maximum likelihood estimate in linear functional relationships for alternative parameter sequences. Journal of the American Statistical Association 75, 230-7.
Nagar, A. L. (1959). The bias and moment matrix of the general k-class estimators of the parameters in structural equations. Econometrica 27, 575-95.
Nelson, C. R., and R. Startz (1990). Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58, 967-76.
Phillips, P. C. B. (1980). The exact finite-sample density of instrumental variable estimators in an equation with n + 1 endogenous variables. Econometrica 48, 861-78.
Phillips, P. C. B. (1983). Exact small sample theory in the simultaneous equations model. The Handbook of Econometrics, Volume I, pp. 449-516. North-Holland Publishing Company.
Rothenberg, T. (1984). Approximating the distributions of econometric estimators and test statistics. Handbook of Econometrics, Volume II, pp. 881-935. Elsevier Science Publishers.
Sargan, J. D. (1974). The validity of Nagar’s expansion for the moments of econometric estimators. Econometrica 42, 169-76.
Sargan, J. D. (1976). Econometric estimators and the Edgeworth approximation. Econometrica 44, 421-48.
Staiger, D., and J. Stock (1997). Instrumental variables regression with weak instruments. Econometrica 65, 557-86.
Wang, J., and E. Zivot (1998). Inference on structural parameters in instrumental variables regression with weak instruments. Econometrica 66, 1389-404.