# An Artificial Regression for Binary Response Models

For binary response models such as the logit and probit models, there exists a very simple artificial regression that can be derived as an extension of the Gauss-Newton regression. It was independently suggested by Engle (1984) and Davidson and MacKinnon (1984b).

The object of a binary response model is to predict the probability that the binary dependent variable, $y_t$, is equal to 1 conditional on some information set $\Omega_t$. A useful class of binary response models can be written as

$$E(y_t \mid \Omega_t) = \Pr(y_t = 1) = F(Z_t\beta). \tag{1.51}$$

Here $Z_t$ is a row vector of explanatory variables that belong to $\Omega_t$, $\beta$ is the vector of parameters to be estimated, and $F(x)$ is the differentiable cumulative distribution function (CDF) of some scalar probability distribution. For the probit model, $F(x)$ is the standard normal CDF. For the logit model, $F(x)$ is the logistic function

$$\frac{\exp(x)}{1 + \exp(x)}.$$
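As a small illustration, the two choices of $F(x)$ and their densities can be written with the Python standard library alone. This is only a sketch: the function names are made up for this example, and the logistic CDF is split into two branches purely to avoid overflow in `math.exp`.

```python
import math

def logistic_cdf(x):
    """Logistic function exp(x) / (1 + exp(x)), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logistic_pdf(x):
    """Density of the logistic distribution: F'(x) = F(x) * (1 - F(x))."""
    F = logistic_cdf(x)
    return F * (1.0 - F)

def normal_cdf(x):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Both CDFs equal one half at zero, since both distributions are symmetric.
print(logistic_cdf(0.0), normal_cdf(0.0))
```

Either pair of functions can play the role of $F$ and its derivative in what follows.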

The loglikelihood function for this class of binary response models is

$$\ell(\beta) = \sum_{t=1}^{n} \Bigl( (1 - y_t) \log\bigl(1 - F(Z_t\beta)\bigr) + y_t \log F(Z_t\beta) \Bigr). \tag{1.52}$$
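The loglikelihood is simple to evaluate directly. The sketch below assumes a logit model and uses a tiny made-up data set (a constant and one regressor); the names `loglik`, `Z`, and `y` are illustrative only.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def loglik(beta, Z, y, F=logistic_cdf):
    """Sum over t of (1 - y_t) log(1 - F(Z_t beta)) + y_t log F(Z_t beta)."""
    total = 0.0
    for z_t, y_t in zip(Z, y):
        index = sum(z * b for z, b in zip(z_t, beta))   # the index Z_t beta
        p = F(index)
        total += (1 - y_t) * math.log(1.0 - p) + y_t * math.log(p)
    return total

# Hypothetical toy data: each row of Z is (constant, regressor).
Z = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]
y = [0, 1, 0, 1]

# At beta = 0 every fitted probability is 0.5, so the value is n * log(0.5).
print(loglik([0.0, 0.0], Z, y))
```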

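If $f(x) \equiv F'(x)$ is the density corresponding to the CDF $F(x)$, the first-order conditions for maximizing (1.52) are

$$\sum_{t=1}^{n} \frac{(y_t - \hat F_t)\, \hat f_t\, Z_{ti}}{\hat F_t (1 - \hat F_t)} = 0, \quad i = 1, \ldots, k, \tag{1.53}$$

where $k$ is the number of parameters, $Z_{ti}$ is the $ti$th component of $Z_t$, $\hat f_t \equiv f(Z_t\hat\beta)$, and $\hat F_t \equiv F(Z_t\hat\beta)$.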
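The first-order conditions set the gradient of the loglikelihood to zero, and a typical element of that gradient is the sum over $t$ of $(y_t - F_t) f_t Z_{ti} / (F_t(1 - F_t))$. The following sketch checks this analytic gradient against a central finite difference, assuming a logit model and made-up data; all names are hypothetical.

```python
import math

def F(x):                       # logistic CDF, used only as an example
    return 1.0 / (1.0 + math.exp(-x))

def f(x):                       # logistic density F'(x)
    return F(x) * (1.0 - F(x))

def loglik(beta, Z, y):
    ll = 0.0
    for z_t, y_t in zip(Z, y):
        p = F(sum(z * b for z, b in zip(z_t, beta)))
        ll += (1 - y_t) * math.log(1.0 - p) + y_t * math.log(p)
    return ll

def score(beta, Z, y):
    """Analytic gradient: sum_t (y_t - F_t) f_t Z_ti / (F_t (1 - F_t))."""
    g = [0.0] * len(beta)
    for z_t, y_t in zip(Z, y):
        index = sum(z * b for z, b in zip(z_t, beta))
        F_t, f_t = F(index), f(index)
        weight = (y_t - F_t) * f_t / (F_t * (1.0 - F_t))
        for i, z in enumerate(z_t):
            g[i] += weight * z
    return g

def numerical_gradient(beta, Z, y, h=1e-6):
    """Central finite-difference approximation to the gradient."""
    grad = []
    for i in range(len(beta)):
        up = list(beta); up[i] += h
        dn = list(beta); dn[i] -= h
        grad.append((loglik(up, Z, y) - loglik(dn, Z, y)) / (2.0 * h))
    return grad

# Hypothetical toy data and an arbitrary evaluation point.
Z = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0], [1.0, 1.0]]
y = [0, 1, 0, 1, 1]
beta = [0.2, -0.3]
analytic = score(beta, Z, y)
numeric = numerical_gradient(beta, Z, y)
print(analytic, numeric)
```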

There is more than one way to derive the artificial regression that corresponds to the model (1.51). The easiest is to rewrite it in the form of the nonlinear regression model

$$y_t = F(Z_t\beta) + u_t. \tag{1.54}$$

The error term $u_t$ here is evidently nonnormal and heteroskedastic. Because $y_t$ is like a Bernoulli trial with probability $p$ given by $F(Z_t\beta)$, and the variance of a Bernoulli trial is $p(1 - p)$, the variance of $u_t$ is

$$V_t(\beta) \equiv F(Z_t\beta)\bigl(1 - F(Z_t\beta)\bigr). \tag{1.55}$$
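The variance can also be checked directly from the two values that the error term can take. Writing $F_t$ as shorthand for $F(Z_t\beta)$ (a notational assumption made only for this short derivation):

```latex
\begin{aligned}
u_t &=
\begin{cases}
1 - F_t & \text{with probability } F_t, \\
-F_t    & \text{with probability } 1 - F_t,
\end{cases} \\[4pt]
E(u_t)   &= F_t (1 - F_t) - (1 - F_t) F_t = 0, \\
E(u_t^2) &= F_t (1 - F_t)^2 + (1 - F_t) F_t^2
          = F_t (1 - F_t)\bigl((1 - F_t) + F_t\bigr)
          = F_t (1 - F_t).
\end{aligned}
```

Because the error term has mean zero, its second moment is its variance, which therefore changes with $Z_t$ whenever $F_t$ does.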

The ordinary GNR for (1.54) would be

$$y_t - F(Z_t\beta) = f(Z_t\beta) Z_t b + \text{residual},$$

but the ordinary GNR is not appropriate because of the heteroskedasticity of the $u_t$. Multiplying both sides by the square root of the inverse of (1.55) yields the artificial regression

$$V_t^{-1/2}(\beta)\bigl(y_t - F(Z_t\beta)\bigr) = V_t^{-1/2}(\beta) f(Z_t\beta) Z_t b + \text{residual}. \tag{1.56}$$

This regression has all the usual properties of artificial regressions. It can be seen from (1.53) that it satisfies condition (1′). Because a typical element of the information matrix corresponding to (1.52) is

$$\mathcal{I}_{ij}(\beta) = \operatorname*{plim}_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \frac{f^2(Z_t\beta)}{F(Z_t\beta)\bigl(1 - F(Z_t\beta)\bigr)} Z_{ti} Z_{tj},$$

it is not difficult to show that regression (1.56) satisfies condition (2). Finally, since (1.56) has the structure of a GNR, the arguments used in Section 3 show that it also satisfies condition (3), the one-step property.
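To make the construction concrete, the sketch below builds the regressand and regressors of (1.56) for a logit model, runs the regression by OLS, and repeatedly adds the coefficient estimates to the current parameter vector. Iterating the update is an illustrative choice rather than something the text prescribes, and the data, function names, and the small elimination-based OLS routine are all assumptions made for this example. Once the iteration has converged, the regressand is orthogonal to every regressor, which is the content of the first-order conditions.

```python
import math

def F(x):                       # logistic CDF (example choice of F)
    return 1.0 / (1.0 + math.exp(-x))

def f(x):                       # its density
    return F(x) * (1.0 - F(x))

def brmr_variables(beta, Z, y):
    """Regressand and regressor rows of regression (1.56) evaluated at beta."""
    r, R = [], []
    for z_t, y_t in zip(Z, y):
        index = sum(z * b for z, b in zip(z_t, beta))
        F_t, f_t = F(index), f(index)
        scale = 1.0 / math.sqrt(F_t * (1.0 - F_t))      # V_t^{-1/2}
        r.append(scale * (y_t - F_t))
        R.append([scale * f_t * z for z in z_t])
    return r, R

def ols(r, R):
    """OLS coefficients from the normal equations, by Gauss-Jordan elimination."""
    k = len(R[0])
    A = [[sum(row[i] * row[j] for row in R) for j in range(k)]
         + [sum(row[i] * r_t for row, r_t in zip(R, r))] for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda m: abs(A[m][i]))   # partial pivoting
        A[i], A[p] = A[p], A[i]
        for m in range(k):
            if m != i:
                factor = A[m][i] / A[i][i]
                A[m] = [a - factor * c for a, c in zip(A[m], A[i])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: a constant and one regressor, outcomes not perfectly separated.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
Z = [[1.0, x_t] for x_t in x]

beta = [0.0, 0.0]
for _ in range(25):
    r, R = brmr_variables(beta, Z, y)
    b = ols(r, R)                                   # artificial regression
    beta = [b_old + step for b_old, step in zip(beta, b)]

# At the converged estimates the regressand is orthogonal to each regressor.
r, R = brmr_variables(beta, Z, y)
cross_products = [sum(row[i] * r_t for row, r_t in zip(R, r)) for i in range(2)]
print(beta, cross_products)
```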

As an artificial regression, (1.56) can be used for all the things that other artificial regressions can be used for. In particular, when it is evaluated at restricted estimates $\tilde\beta$, the explained sum of squares is an LM test statistic for testing the restrictions. The normalization of the regressand by its standard error means that other test statistics, such as $nR^2$ and the ordinary $F$ statistic for the hypothesis that the coefficients on the regressors that correspond to the restricted parameters are zero, are also asymptotically valid. However, they seem to have slightly poorer finite-sample properties than the ESS (Davidson and MacKinnon, 1984b). It is, of course, possible to extend regression (1.56) in various ways. For example, it has been extended to tests of the functional form of $F(x)$ by Thomas (1993) and to tests of ordered logit models by Murphy (1996).
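A minimal sketch of the ESS form of the LM test follows, under several assumptions of this example only: the model is a logit with a constant and one regressor, the restriction is that the slope is zero, and the data are made up. With only a constant in the restricted model, the restricted fitted probability is the sample mean of the dependent variable, so no iterative estimation is needed. For the logit, the density equals $F(1 - F)$, so the regressors of (1.56) reduce to $\sqrt{F(1 - F)}$ times the explanatory variables.

```python
import math

def lm_slope_zero_logit(x, y):
    """ESS and n*R^2 from regression (1.56) at restricted logit estimates (slope = 0)."""
    n = len(y)
    ybar = sum(y) / n                    # restricted fit: F_t = ybar for every t
    v = ybar * (1.0 - ybar)              # V_t, which is also f_t for the logit
    s = math.sqrt(v)
    r = [(y_t - ybar) / s for y_t in y]  # regressand V^{-1/2} (y_t - F_t)
    R = [(s, s * x_t) for x_t in x]      # regressors V^{-1/2} f_t Z_t
    # OLS through the 2x2 normal equations.
    a11 = sum(c0 * c0 for c0, _ in R)
    a12 = sum(c0 * c1 for c0, c1 in R)
    a22 = sum(c1 * c1 for _, c1 in R)
    g1 = sum(c0 * r_t for (c0, _), r_t in zip(R, r))
    g2 = sum(c1 * r_t for (_, c1), r_t in zip(R, r))
    det = a11 * a22 - a12 * a12
    b1 = (a22 * g1 - a12 * g2) / det
    b2 = (a11 * g2 - a12 * g1) / det
    ess = sum((b1 * c0 + b2 * c1) ** 2 for c0, c1 in R)   # explained sum of squares
    tss = sum(r_t * r_t for r_t in r)                     # uncentered total sum of squares
    return ess, n * ess / tss

# Hypothetical data.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
ess, n_r2 = lm_slope_zero_logit(x, y)
print(ess, n_r2)
```

With one restriction, the statistic would be compared with the chi-squared distribution with one degree of freedom. In this constant-only restricted case the total sum of squares of the regressand equals $n$ exactly, so the ESS and $nR^2$ coincide; with other restricted models the two generally differ in finite samples.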

# 8 Conclusion

In this chapter, we have introduced the concept of an artificial regression and discussed several examples. We have seen that artificial regressions can be useful for minimizing criterion functions, computing one-step estimates, calculating covariance matrix estimates, and computing test statistics. The last of these is probably the most common application. There is a close connection between the artificial regression for a given model and the asymptotic theory for that model. Therefore, as we saw in Section 6, artificial regressions can also be very useful for obtaining theoretical results.

Most of the artificial regressions we have discussed are quite well known. This is true of the Gauss-Newton regression discussed in Sections 3 and 4, the OPG regression discussed in Section 6, the double-length regression discussed in Section 9, and the regression for binary response models discussed in Section 10. However, the artificial regression for GMM estimation discussed in Section 7 does not appear to have been treated previously in published work, and we believe that the heteroskedasticity-robust GNR discussed in Section 8 is new.

# References

Baltagi, B. (1999). Double length regressions for linear and log-linear regressions with AR(1) disturbances. Statistical Papers 40, 199-209.

Berndt, E. R., B. H. Hall, R. E. Hall, and J. A. Hausman (1974). Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement 3, 653-65.

Box, G. E. P., and D. R. Cox (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B 26, 211-52.

Chesher, A. (1983). The information matrix test: simplified calculation via a score test interpretation. Economics Letters 13, 45-8.

Chesher, A., and R. Spady (1991). Asymptotic expansions of the information matrix test statistic. Econometrica 59, 787-815.

Davidson, R., and J. G. MacKinnon (1981). Several tests for model specification in the presence of alternative hypotheses. Econometrica 49, 781-93.

Davidson, R., and J. G. MacKinnon (1984a). Model specification tests based on artificial linear regressions. International Economic Review 25, 485-502.

Davidson, R., and J. G. MacKinnon (1984b). Convenient specification tests for logit and probit models. Journal of Econometrics 25, 241-62.

Davidson, R., and J. G. MacKinnon (1985a). Testing linear and loglinear regressions against Box-Cox alternatives. Canadian Journal of Economics 18, 499-517.

Davidson, R., and J. G. MacKinnon (1985b). Heteroskedasticity-robust tests in regression directions. Annales de l'INSEE 59/60, 183-218.

Davidson, R., and J. G. MacKinnon (1988). Double-length artificial regressions. Oxford Bulletin of Economics and Statistics 50, 203-17.

Davidson, R., and J. G. MacKinnon (1990). Specification tests based on artificial regressions. Journal of the American Statistical Association 85, 220-7.

Davidson, R., and J. G. MacKinnon (1992). A new form of the information matrix test. Econometrica 60, 145-57.

Davidson, R., and J. G. MacKinnon (1993). Estimation and Inference in Econometrics. New York: Oxford University Press.

Davidson, R., and J. G. MacKinnon (1999). Bootstrap testing in nonlinear models. International Economic Review 40, 487-508.

Delgado, M. A., and T. Stengos (1994). Semiparametric specification testing of non-nested econometric models. Review of Economic Studies 61, 291-303.

Engle, R. F. (1984). Wald, likelihood ratio and Lagrange multiplier tests in econometrics. In Zvi Griliches and Michael D. Intriligator (eds.), Handbook of Econometrics, Vol. II. Amsterdam: North-Holland.

Godfrey, L. G. (1978). Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46, 1293-301.

Godfrey, L. G., and M. R. Wickens (1981). Testing linear and log-linear regressions for functional form. Review of Economic Studies 48, 487-96.

Godfrey, L. G., M. McAleer, and C. R. McKenzie (1988). Variable addition and Lagrange Multiplier tests for linear and logarithmic regression models. Review of Economics and Statistics 70, 492-503.

Lancaster, T. (1984). The covariance matrix of the information matrix test. Econometrica 52, 1051-3.

MacKinnon, J. G., and L. Magee (1990). Transforming the dependent variable in regression models. International Economic Review 31, 315-39.

McCullough, B. D. (1999). Econometric software reliability: EViews, LIMDEP, SHAZAM, and TSP. Journal of Applied Econometrics 14, 191-202.

Messer, K., and H. White (1984). A note on computing the heteroskedasticity consistent covariance matrix using instrumental variable techniques. Oxford Bulletin of Economics and Statistics 46, 181-4.

Murphy, A. (1996). Simple LM tests of mis-specification for ordered logit models. Economics Letters 52, 137-41.

Orme, C. (1995). On the use of artificial regressions in certain microeconometric models. Econometric Theory 11, 290-305.

Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992). Numerical Recipes in C, 2nd edn., Cambridge: Cambridge University Press.

Thomas, J. (1993). On testing the logistic assumption in binary dependent variable models. Empirical Economics 18, 381-92.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817-38.

Wooldridge, J. M. (1990). A unified approach to robust, regression-based specification tests. Econometric Theory 6, 17-43.

Wooldridge, J. M. (1991). On the application of robust, regression-based diagnostics to models of conditional means and conditional variances. Journal of Econometrics 47, 5-46.