Artificial Regressions and Heteroskedasticity
Covariance matrices and test statistics calculated via the GNR (1.7), or via artificial regressions such as (1.35) and (1.36), are not asymptotically valid when the assumption that the error terms are iid is violated. Consider a modified version of the nonlinear regression model (1.3), in which $E(uu^\top) = \Omega$, where $\Omega$ is an $n\times n$ diagonal matrix with $t$th diagonal element $\omega_t^2$. Let $\hat\Omega$ denote an $n\times n$ diagonal matrix with the squared residual $\hat u_t^2$ as the $t$th diagonal element. It has been known since the work of White (1980) that the matrix

$$(\hat X^\top\hat X)^{-1}\hat X^\top\hat\Omega\hat X(\hat X^\top\hat X)^{-1} \tag{1.37}$$
provides an estimator of $\operatorname{var}(\hat\beta)$, which can be used in place of the usual estimator, $s^2(\hat X^\top\hat X)^{-1}$. Like the latter, this heteroskedasticity-consistent covariance matrix estimator, or HCCME, can be computed by means of an artificial regression. We will refer to this regression as the heteroskedasticity-robust Gauss-Newton regression, or HRGNR.
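As a minimal sketch of (1.37) in the linear case, where $\hat X$ is simply $X$, the following NumPy code computes both the usual estimator $s^2(X^\top X)^{-1}$ and the HCCME. The simulated data-generating process (error standard deviation growing with one regressor) and the function name `ols_and_covariances` are illustrative assumptions, not taken from the text.

```python
import numpy as np


def ols_and_covariances(X, y):
    """OLS estimates, the usual covariance s^2 (X'X)^{-1}, and the HCCME (1.37)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat
    s2 = u_hat @ u_hat / (n - k)
    cov_usual = s2 * XtX_inv
    # X' Omega_hat X without forming the n x n matrix Omega_hat:
    # each row of X is weighted by its squared residual.
    meat = X.T @ (u_hat[:, None] ** 2 * X)
    cov_hccme = XtX_inv @ meat @ XtX_inv
    return beta_hat, cov_usual, cov_hccme


# Simulated heteroskedastic data (illustrative DGP)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
sigma = np.exp(0.5 * X[:, 1])
y = X @ np.array([1.0, 0.5, -0.25]) + sigma * rng.normal(size=n)

beta_hat, cov_usual, cov_hccme = ols_and_covariances(X, y)
print("usual standard errors:", np.sqrt(np.diag(cov_usual)))
print("HCCME standard errors:", np.sqrt(np.diag(cov_hccme)))
```

Weighting the rows of $X$ by $\hat u_t^2$ avoids building the $n\times n$ diagonal matrix $\hat\Omega$, which matters when $n$ is large.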
In order to derive the HRGNR, it is convenient to begin with a linear regression model $y = X\beta + u$, and to consider the criterion function

$$Q(\beta) = \tfrac{1}{2}(y - X\beta)^\top X(X^\top\Omega X)^{-1}X^\top(y - X\beta).$$
The negative of the gradient of this function with respect to $\beta$ is

$$X^\top X(X^\top\Omega X)^{-1}X^\top(y - X\beta), \tag{1.38}$$

and its Hessian is the matrix

$$X^\top X(X^\top\Omega X)^{-1}X^\top X, \tag{1.39}$$

of which the inverse is the HCCME if we replace $\Omega$ by $\hat\Omega$. Equating the gradient to zero just yields the OLS estimator, since $X^\top X$ and $X^\top\Omega X$ are $k\times k$ nonsingular matrices.
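The following sketch checks these three facts numerically: the gradient (1.38) vanishes at the OLS estimator, the inverse of the Hessian (1.39) has the sandwich form, and (1.38) really is minus the derivative of $Q(\beta)$. It assumes NumPy and treats the variances $\omega_t^2$ as known, which is only for illustration; the DGP and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
omega2 = np.exp(X[:, 1])          # omega_t^2, treated as known here (assumption)
y = X @ np.array([0.3, 1.0, -0.7]) + np.sqrt(omega2) * rng.normal(size=n)

XtX = X.T @ X
XtOmegaX = X.T @ (omega2[:, None] * X)     # X' Omega X for diagonal Omega
XtOmegaX_inv = np.linalg.inv(XtOmegaX)


def Q(beta):
    """Criterion Q(beta) = 1/2 (y - Xb)' X (X' Omega X)^{-1} X' (y - Xb)."""
    e = X.T @ (y - X @ beta)
    return 0.5 * e @ XtOmegaX_inv @ e


def neg_gradient(beta):
    """Negative gradient (1.38)."""
    return XtX @ XtOmegaX_inv @ X.T @ (y - X @ beta)


hessian = XtX @ XtOmegaX_inv @ XtX         # (1.39); constant in beta

beta_ols = np.linalg.solve(XtX, X.T @ y)
print(neg_gradient(beta_ols))              # zero up to rounding error

XtX_inv = np.linalg.inv(XtX)
sandwich = XtX_inv @ XtOmegaX @ XtX_inv    # inverse Hessian in sandwich form
```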
Let $V$ be an $n\times n$ diagonal matrix with $t$th diagonal element equal to $\omega_t$; thus $V^2 = \Omega$. Consider the $n\times k$ regressor matrix $R$ defined by

$$R = VX(X^\top V^2X)^{-1}X^\top X = P_{VX}V^{-1}X, \tag{1.40}$$

where $P_{VX}$ projects orthogonally onto the columns of $VX$. We have

$$R^\top R = X^\top X(X^\top\Omega X)^{-1}X^\top X, \tag{1.41}$$

which is just the Hessian (1.39). Let $U(\beta)$ be a diagonal matrix with $t$th diagonal element equal to $y_t - X_t\beta$. Then, if we define $R(\beta)$ as in (1.40) but with $V$ replaced by $U(\beta)$, we find that, at $\beta = \hat\beta$, the inverse of $\hat R^\top\hat R$ is the HCCME (1.37).
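A short numerical check of (1.40) and (1.41) with $V$ replaced by $U(\hat\beta)$: the two expressions for $R$ agree, and $(\hat R^\top\hat R)^{-1}$ reproduces the HCCME. This is a sketch assuming NumPy and simulated data; the helper `proj` is a hypothetical name.

```python
import numpy as np


def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)


rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -0.5, 2.0]) + np.exp(0.4 * X[:, 2]) * rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
U = np.diag(u_hat)                     # U(beta_hat)
U_inv = np.diag(1.0 / u_hat)           # assumes no residual is exactly zero

# The two expressions for R in (1.40), with V replaced by U(beta_hat)
R_a = U @ X @ np.linalg.inv(X.T @ U @ U @ X) @ X.T @ X
R_b = proj(U @ X) @ U_inv @ X

XtX_inv = np.linalg.inv(X.T @ X)
hccme = XtX_inv @ X.T @ U @ U @ X @ XtX_inv   # (1.37), with Omega_hat = U(beta_hat)^2
print(np.linalg.inv(R_a.T @ R_a) - hccme)     # zero up to rounding error
```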
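In order to derive the regressand $r(\beta)$, note that, for condition (1′) to be satisfied, we require

$$R^\top(\beta)r(\beta) = X^\top X\bigl(X^\top U^2(\beta)X\bigr)^{-1}X^\top(y - X\beta);$$

recall (1.38). Since $R^\top(\beta) = X^\top X\bigl(X^\top U^2(\beta)X\bigr)^{-1}X^\top U(\beta)$ and the $t$th diagonal element of $U(\beta)$ is $y_t - X_t\beta$, this implies that

$$r(\beta) = U^{-1}(\beta)(y - X\beta) = \iota,$$

where $\iota$ is an $n$-vector of ones.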
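To see that the choice $r(\beta) = \iota$ works at any $\beta$, not only at $\hat\beta$, the sketch below evaluates $R^\top(\beta)r(\beta)$ at an arbitrary trial value and compares it with (1.38) after replacing $\Omega$ by $U^2(\beta)$. NumPy, the simulated data, and the trial value are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + (1.0 + np.abs(X[:, 1])) * rng.normal(size=n)

beta = np.array([0.2, 1.0])            # arbitrary trial value, not the OLS estimate
resid = y - X @ beta
U = np.diag(resid)                     # U(beta)

r = np.linalg.solve(U, resid)          # r(beta) = U^{-1}(beta) (y - X beta)
UX = U @ X
R = UX @ np.linalg.solve(UX.T @ UX, X.T @ X)    # R(beta) = P_{UX} U^{-1} X, from (1.40)

lhs = R.T @ r
rhs = X.T @ X @ np.linalg.solve(UX.T @ UX, X.T @ resid)   # (1.38) with Omega -> U^2(beta)
print(lhs, rhs)
```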
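In the general nonlinear case, $X$ becomes $X(\beta)$, and the HRGNR has the form

$$\iota = P_{U(\beta)X(\beta)}U^{-1}(\beta)X(\beta)b + \text{residuals}, \tag{1.42}$$

where now the $t$th diagonal element of $U(\beta)$ is $y_t - x_t(\beta)$. When $\beta = \hat\beta$, the vector of NLS estimates,

$$\begin{aligned}
\hat r^\top\hat R &= \iota^\top P_{\hat U\hat X}\hat U^{-1}\hat X \\
&= \iota^\top\hat U\hat X(\hat X^\top\hat U\hat U\hat X)^{-1}\hat X^\top\hat U\hat U^{-1}\hat X = \hat u^\top\hat X(\hat X^\top\hat\Omega\hat X)^{-1}\hat X^\top\hat X = 0,
\end{aligned} \tag{1.43}$$

because the NLS first-order conditions give $\hat X^\top\hat u = 0$. Thus condition (1) is satisfied for the nonlinear case. Condition (2) is satisfied by construction, as can be seen by putting hats on everything in (1.41).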
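A sketch of (1.42) and (1.43) for a nonlinear model. The regression function $x_t(\beta) = \beta_1\exp(\beta_2 z_t)$, the simulated data, and the use of plain Gauss-Newton iterations (repeated ordinary GNRs, with no step-length control) to obtain $\hat\beta$ are all illustrative assumptions; the point is only that, at the NLS estimates, the regressand $\iota$ is orthogonal to the HRGNR regressors.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
z = rng.uniform(0.0, 2.0, size=n)
y = 2.0 * np.exp(0.7 * z) + (0.2 + 0.3 * z) * rng.normal(size=n)


def x_fn(b):
    """Illustrative regression function x_t(beta) = beta_1 * exp(beta_2 * z_t)."""
    return b[0] * np.exp(b[1] * z)


def jac(b):
    """X(beta): the n x 2 matrix of derivatives of x_t(beta)."""
    e = np.exp(b[1] * z)
    return np.column_stack([e, b[0] * z * e])


# NLS via Gauss-Newton iterations (repeated ordinary GNRs) from a nearby start
b = np.array([1.5, 0.5])
for _ in range(100):
    step = np.linalg.lstsq(jac(b), y - x_fn(b), rcond=None)[0]
    b = b + step
    if np.max(np.abs(step)) < 1e-12:
        break
beta_hat = b

# HRGNR (1.42) evaluated at beta_hat
X_hat = jac(beta_hat)
u_hat = y - x_fn(beta_hat)
UX = u_hat[:, None] * X_hat                                 # U_hat X_hat, no n x n matrix
R_hat = UX @ np.linalg.solve(UX.T @ UX, X_hat.T @ X_hat)    # P_{UX} U^{-1} X
r_hat = np.ones(n)                                          # regressand iota
print(r_hat @ R_hat)                                        # zero up to rounding, as in (1.43)
```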
For condition (3) to hold, regression (1.42) must satisfy the one-step property. We will only show that this property holds for linear models. Extending the argument to nonlinear models would be tedious but not difficult. In the linear case, evaluating (1.42) at an arbitrary $\acute\beta$, with $\acute U \equiv U(\acute\beta)$, gives

$$b = (X^\top\acute U^{-1}P_{\acute UX}\acute U^{-1}X)^{-1}X^\top\acute U^{-1}P_{\acute UX}\iota.$$

With a little algebra, it can be shown that this reduces to

$$b = (X^\top X)^{-1}X^\top\acute u = (X^\top X)^{-1}X^\top(y - X\acute\beta) = \hat\beta - \acute\beta, \tag{1.44}$$

where $\hat\beta$ is the OLS estimator. It follows that the one-step estimator $\acute\beta + b$ is equal to $\hat\beta$, as we wished to show. In the nonlinear case, of course, we obtain an asymptotic equality rather than an exact equality.
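The one-step property (1.44) can be checked directly: run the HRGNR once at an arbitrary starting value and add the estimated $b$. This sketch assumes NumPy and simulated data; `hrgnr_step` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + np.exp(0.3 * X[:, 1]) * rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)


def hrgnr_step(beta_acute):
    """Regress iota on P_{UX} U^{-1} X at beta_acute (linear case); return b."""
    u = y - X @ beta_acute
    UX = u[:, None] * X                              # U(beta_acute) X
    R = UX @ np.linalg.solve(UX.T @ UX, X.T @ X)     # = P_{UX} U^{-1} X
    b, *_ = np.linalg.lstsq(R, np.ones(n), rcond=None)
    return b


beta_acute = np.zeros(3)                             # an arbitrary starting point
b = hrgnr_step(beta_acute)
print(beta_acute + b, beta_ols)                      # identical up to rounding, as in (1.44)
```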
As with the ordinary GNR, the HRGNR is particularly useful for hypothesis testing. If we partition $\beta$ as $[\beta_1 \;\vdots\; \beta_2]$ and wish to test the $r$ zero restrictions $\beta_2 = 0$, we need to run two versions of the regression and compute the difference between the two SSRs or ESSs. The two regressions are:
$$\iota = P_{\tilde U\tilde X}\tilde U^{-1}\tilde X_1b_1 + \text{residuals}, \quad\text{and} \tag{1.45}$$

$$\iota = P_{\tilde U\tilde X}\tilde U^{-1}\tilde X_1b_1 + P_{\tilde U\tilde X}\tilde U^{-1}\tilde X_2b_2 + \text{residuals}, \tag{1.46}$$

where tildes denote quantities evaluated at the restricted estimates $\tilde\beta$. It is important to note that the first regression is not the HRGNR for the restricted model, because it uses the matrix $P_{\tilde U\tilde X}$ rather than the matrix $P_{\tilde U\tilde X_1}$. In consequence, the regressand in (1.45) will not be orthogonal to the regressors. This is why we need to run two artificial regressions. We could compute an ordinary $F$-statistic instead of the difference between the SSRs from (1.45) and (1.46), but there would be no advantage to doing so, since the $F$-form of the test merely divides by a stochastic quantity that tends to 1 asymptotically.
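A minimal sketch of the test based on (1.45) and (1.46) for a linear model, assuming NumPy and simulated data in which the null hypothesis is true. Both regressions use the same projection onto the span of $\tilde U X$ for the full regressor matrix, as the text requires; the helper `ssr` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors kept under H0
X2 = rng.normal(size=(n, 2))                             # r = 2 regressors being tested
X = np.hstack([X1, X2])
y = X1 @ np.array([1.0, 0.5]) + np.exp(0.5 * X1[:, 1]) * rng.normal(size=n)  # H0 true

# Restricted estimates (beta_2 = 0); the model is linear, so X(beta) = X
beta1_tilde = np.linalg.solve(X1.T @ X1, X1.T @ y)
u_tilde = y - X1 @ beta1_tilde
UX = u_tilde[:, None] * X                                # U_tilde X
Z = UX @ np.linalg.solve(UX.T @ UX, X.T @ X)             # P_{UX} U^{-1} X = [Z1 Z2]
Z1 = Z[:, :X1.shape[1]]
iota = np.ones(n)


def ssr(regressors, regressand):
    """Sum of squared residuals from an OLS regression."""
    coef, *_ = np.linalg.lstsq(regressors, regressand, rcond=None)
    e = regressand - regressors @ coef
    return e @ e


stat = ssr(Z1, iota) - ssr(Z, iota)   # SSR from (1.45) minus SSR from (1.46)
print(stat)
```

In this linear case, the same algebra that gives (1.44) shows that `stat` equals $\hat\beta_2^\top[\hat V_{22}]^{-1}\hat\beta_2$, where $\hat\beta_2$ comes from unrestricted OLS and $\hat V$ is the sandwich estimator built from the restricted residuals; it is asymptotically $\chi^2(r)$ under the null.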
The HRGNR appears to be new. The trick of multiplying $X(\beta)$ by $U^{-1}(\beta)$ in order to obtain an HCCME by means of an OLS regression was used, in a different context, by Messer and White (1984). This trick can, however, cause a problem: if any element on the diagonal of $U(\beta)$ is equal to 0, its inverse cannot be computed. Therefore, it is necessary to replace any such element by a small, positive number before computing $U^{-1}(\beta)$.
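One way to implement this safeguard is sketched below, assuming NumPy. The function name `safe_diag_inverse` and the tolerance `eps` are hypothetical choices; the sketch also treats residuals that are merely tiny (not exactly zero) as zero, which goes slightly beyond the text and is an assumption.

```python
import numpy as np


def safe_diag_inverse(u, eps=1e-8):
    """Diagonal of U^{-1}, after replacing (near-)zero diagonal elements of U.

    eps is an illustrative tolerance, not a value from the text: any element
    with |u_t| < eps is replaced by the small positive number eps.
    """
    u = np.asarray(u, dtype=float)
    u_safe = np.where(np.abs(u) < eps, eps, u)
    return 1.0 / u_safe


u = np.array([0.5, -2.0, 0.0, 1e-12])
print(safe_diag_inverse(u))   # the last two entries become 1/eps
```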
A different, and considerably more limited, type of heteroskedasticity-robust GNR, which is applicable only to hypothesis testing, was first proposed by Davidson and MacKinnon (1985b). It was later rediscovered by Wooldridge (1990, 1991) and extended to handle other cases, including regression models with error terms that have autocorrelation as well as heteroskedasticity of unknown form.
It is possible to construct a variety of artificial regressions that provide different covariance matrix estimators for regression models. From (1.43) and (1.44), it follows that any artificial regression with regressand
$$r(\beta) = U^{-1}(\beta)\bigl(y - x(\beta)\bigr)$$

and regressors

$$R(\beta) = P_{U(\beta)X(\beta)}U^{-1}(\beta)X(\beta)$$
satisfies properties (1) and (3) for the least-squares estimator, for any nonsingular matrix U(P). Thus any sandwich covariance matrix estimator can be computed by choosing U(P) appropriately; the estimator (1.37) is just one example. In fact, it is possible to develop artificial regressions that allow testing not only with a variety of different HCCMEs, but also with some sorts of heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators. It is also a simple matter to use such estimators with modified versions of the artificial regression (1.35) used with models estimated by GMM.
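As one illustration of choosing $U(\beta)$ differently, the sketch below takes $U = \operatorname{diag}\bigl(\hat u_t/\sqrt{1-h_t}\bigr)$, where $h_t$ is the $t$th diagonal element of the hat matrix. With this choice, $(R^\top R)^{-1}$ is the variant of the HCCME usually called HC2, and the regressand is still orthogonal to the regressors at $\hat\beta$. NumPy, the simulated data, and the helper name `artificial_regression` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -1.0, 0.5]) + np.exp(0.5 * X[:, 1]) * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_t = X_t (X'X)^{-1} X_t'


def artificial_regression(u_diag):
    """Regressand r = U^{-1} u_hat and regressors R = P_{UX} U^{-1} X for diagonal U."""
    UX = u_diag[:, None] * X
    r = u_hat / u_diag
    R = UX @ np.linalg.solve(UX.T @ UX, X.T @ X)
    return r, R


# HC2-type choice of U(beta_hat)
r, R = artificial_regression(u_hat / np.sqrt(1.0 - h))
cov_from_regression = np.linalg.inv(R.T @ R)

# Direct HC2 sandwich, for comparison
omega_hc2 = u_hat ** 2 / (1.0 - h)
cov_hc2 = XtX_inv @ X.T @ (omega_hc2[:, None] * X) @ XtX_inv
print(np.sqrt(np.diag(cov_from_regression)))
```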