# Specification Tests

Specification tests are an important part of model specification in econometrics. In this section, we only study a few of these diagnostic tests. For an excellent summary on this topic, see Wooldridge (2001).

(1) Ramsey’s (1969) RESET (Regression Specification Error Test)

Ramsey suggests testing the specification of the linear regression model $y_t = X_t'\beta + u_t$ by augmenting it with a set of regressors $Z_t$ so that the augmented model is

$$y_t = X_t'\beta + Z_t'\gamma + u_t \qquad (8.48)$$

If the $Z_t$'s are available, then the specification test reduces to the F-test for $H_0$: $\gamma = 0$. The crucial issue is the choice of the $Z_t$ variables. This depends upon the true functional form under the alternative, which is usually unknown. However, this can often be well approximated by higher powers of the initial regressors, as in the case where the true form is quadratic or cubic. Alternatively, one might approximate it with higher moments of $\hat y_t = X_t'\hat\beta_{OLS}$. The popular Ramsey RESET test is carried out as follows:

(1) Regress $y_t$ on $X_t$ and get $\hat y_t$.

(2) Regress $y_t$ on $X_t$, $\hat y_t^2$, $\hat y_t^3$ and $\hat y_t^4$ and test that the coefficients of all the powers of $\hat y_t$ are zero. This is an $F_{3,T-k-3}$ under the null.

Note that $\hat y_t$ is not included among the regressors because it would be perfectly multicollinear with $X_t$. Different choices of $Z_t$'s may result in more powerful tests when $H_0$ is not true. Thursby and Schmidt (1977) carried out an extensive Monte Carlo study and concluded that the test based on $Z_t = [X_t^2, X_t^3, X_t^4]$ seems to be generally the best choice.
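The two RESET steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the simulated data, the helper names `ols_fit` and `reset_f`, and the choice of a quadratic alternative are hypothetical and not part of the text.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def reset_f(X, y, max_power=4):
    """RESET F-statistic for adding powers 2..max_power of the fitted values.

    X must already contain the constant. Returns (F, df_num, df_den).
    """
    T, k = X.shape
    beta, rss_restricted = ols_fit(X, y)                 # step (1)
    y_hat = X @ beta
    powers = np.column_stack([y_hat ** p for p in range(2, max_power + 1)])
    _, rss_unrestricted = ols_fit(np.column_stack([X, powers]), y)  # step (2)
    q = powers.shape[1]
    df_den = T - k - q
    F = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_den)
    return F, q, df_den

# Hypothetical data: one series is truly quadratic, the other truly linear.
rng = np.random.default_rng(0)
T = 200
x = rng.uniform(0.0, 4.0, T)
X = np.column_stack([np.ones(T), x])
y_quadratic = 1.0 + x + 0.5 * x**2 + rng.normal(0.0, 0.5, T)
y_linear = 1.0 + x + rng.normal(0.0, 0.5, T)

F_quad, q, df_den = reset_f(X, y_quadratic)
F_lin, _, _ = reset_f(X, y_linear)
print(f"linear fit to quadratic data: F({q},{df_den}) = {F_quad:.2f}")
print(f"linear fit to linear data:    F({q},{df_den}) = {F_lin:.2f}")
```

The statistic is compared with the critical value of $F_{3,T-k-3}$. Passing `max_power=3` gives the two-term version reported by EViews in the example later in this section.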

(2) Utts' (1982) Rainbow Test

The basic idea behind the Rainbow test is that even when the true relationship is nonlinear, a good linear fit can still be obtained over subsets of the sample. The test therefore rejects the null hypothesis of linearity whenever the overall fit is markedly inferior to the fit over a properly selected sub-sample of the data, see Figure 8.3. Let $e'e$ be the OLS residual sum of squares from all available $T$ observations and let $\tilde e'\tilde e$ be the OLS residual sum of squares from the middle half of the observations ($T/2$). Then

$$F = \frac{(e'e - \tilde e'\tilde e)/(T/2)}{\tilde e'\tilde e/(T/2 - k)} \quad \text{is distributed as } F_{T/2,\,T/2-k} \text{ under } H_0 \qquad (8.49)$$

Under $H_0$: $E(e'e/(T-k)) = \sigma^2 = E[\tilde e'\tilde e/(T/2 - k)]$, while in general under $H_A$: $E(e'e/(T-k)) > E[\tilde e'\tilde e/(T/2 - k)] > \sigma^2$. The RRSS is $e'e$ because all the observations are forced to fit the straight line, whereas the URSS is $\tilde e'\tilde e$ because only a part of the observations are forced to fit a straight line. The crucial issue of the Rainbow test is the proper choice of the subsample (the middle $T/2$ observations in case of one regressor). This affects the power of the test and not the distribution of the test statistic under the null. Utts (1982) recommends points close to $\bar X$, since an incorrect linear fit will in general not be as far off there as it is in the outer region. Closeness to $\bar X$ is measured by the magnitude of the corresponding diagonal elements of $P_X$. Close points are those with low leverage $h_{ii}$, see section 8.1. The optimal size of the subset depends upon the alternative. Utts recommends about 1/2 of the data points in order to obtain some robustness to outliers.

The F-test in (8.49) looks like a Chow test, but differs in the selection of the sub-sample. For example, using the post-sample predictive Chow test, the data are arranged according to time and the first $T$ observations are selected. The Rainbow test arranges the data according to their distance from $\bar X$ and selects the first $T/2$ of them.

(3) Plosser, Schwert and White (1982) (PSW) Differencing Test

The first-difference (FD) estimator, obtained by applying OLS to the differenced data $\dot y = Dy$ and $\dot X = DX$, where $D$ is the first-differencing matrix, is

$$\hat\beta_{FD} = (\dot X'\dot X)^{-1}\dot X'\dot y \qquad (8.51)$$

with $\text{var}(\hat\beta_{FD}) = \sigma^2(\dot X'\dot X)^{-1}\dot X'DD'\dot X(\dot X'\dot X)^{-1}$ since $\text{var}(\dot U) = \sigma^2 DD'$ and

$$DD' = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \end{bmatrix}$$

The differencing test is based on $\hat q = \hat\beta_{FD} - \hat\beta_{OLS}$ with $V(\hat q) = V(\hat\beta_{FD}) - V(\hat\beta_{OLS})$. A consistent estimate of the asymptotic variance of $\sqrt{T}\hat q$, denoted by $\hat V(\hat q)$, is obtained by replacing $\sigma^2$ with a consistent estimate $\hat\sigma^2$ and scaling the cross-product matrices by $T$. Therefore,

$$\Delta = T\hat q'[\hat V(\hat q)]^{-1}\hat q \sim \chi^2_k \quad \text{under } H_0$$

where $k$ is the number of slope parameters if $\hat V(\hat q)$ is nonsingular. $\hat V(\hat q)$ could be singular, in which case we use a generalized inverse $\hat V^{-}(\hat q)$ of $\hat V(\hat q)$, and in this case $\Delta$ is distributed as $\chi^2$ with degrees of freedom equal to the rank of $\hat V(\hat q)$. This is a special case of the general Hausman (1978) test which will be studied extensively in Chapter 11.

Davidson, Godfrey, and MacKinnon (1985) show that, like the Hausman test, the PSW test is equivalent to a much simpler omitted variables test, the omitted variables being the sum of the lagged and one-period ahead values of the regressors.

Thus if the regression equation we are considering is

$$y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$$

the PSW test involves estimating the expanded regression equation

$$y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \gamma_1 z_{1t} + \gamma_2 z_{2t} + u_t \qquad (8.56)$$

where $z_{1t} = x_{1,t+1} + x_{1,t-1}$ and $z_{2t} = x_{2,t+1} + x_{2,t-1}$, and testing the hypothesis $\gamma_1 = \gamma_2 = 0$ by the usual F-test.
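A minimal sketch of this omitted-variables form of the PSW test follows. The function name, the simulated data and the inclusion of a constant are assumptions made for illustration. The first and last observations are dropped because $z_t$ needs one lead and one lag.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def psw_omitted_variables_f(y, X):
    """F-test of gamma = 0 in the expanded PSW regression.

    X holds the k non-constant regressors (T x k); a constant is added here.
    """
    T, k = X.shape
    Z = X[2:] + X[:-2]                 # z_t = x_{t+1} + x_{t-1}
    y_mid, X_mid = y[1:-1], X[1:-1]    # align y_t and x_t with z_t
    const = np.ones((T - 2, 1))
    restricted = np.hstack([const, X_mid])
    expanded = np.hstack([restricted, Z])
    rss_r, rss_u = rss(restricted, y_mid), rss(expanded, y_mid)
    df_den = (T - 2) - expanded.shape[1]
    F = ((rss_r - rss_u) / k) / (rss_u / df_den)
    return F, k, df_den

# Hypothetical data from a correctly specified linear model.
rng = np.random.default_rng(1)
T = 100
X = rng.normal(size=(T, 2))
y = 1.0 + X @ np.array([0.5, -1.0]) + rng.normal(size=T)

F, df_num, df_den = psw_omitted_variables_f(y, X)
print(f"PSW F({df_num},{df_den}) = {F:.3f}")
```

With a single regressor the F-test reduces to the t-test on the one $z_t$ variable.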

If there are lagged dependent variables in the equation, the test needs a minor modification. Suppose that the model is

$$y_t = \beta_1 y_{t-1} + \beta_2 x_t + u_t \qquad (8.57)$$

Now the omitted variables would be defined as $z_{1t} = y_t + y_{t-2}$ and $z_{2t} = x_{t+1} + x_{t-1}$. There is no problem with $z_{2t}$, but $z_{1t}$ would be correlated with the error term $u_t$ because of the presence of $y_t$ in it. The solution would be simply to transfer it to the left hand side and write the expanded regression equation in (8.56) as

$$(1 - \gamma_1)y_t = \beta_1 y_{t-1} + \beta_2 x_t + \gamma_1 y_{t-2} + \gamma_2 z_{2t} + u_t \qquad (8.58)$$

This equation can be written as

$$y_t = \beta_1^* y_{t-1} + \beta_2^* x_t + \gamma_1^* y_{t-2} + \gamma_2^* z_{2t} + u_t^* \qquad (8.59)$$

where all the starred parameters are the corresponding unstarred ones divided by $(1 - \gamma_1)$.

The PSW now tests the hypothesis $\gamma_1^* = \gamma_2^* = 0$. Thus, in the case where the model involves the lagged dependent variable $y_{t-1}$ as an explanatory variable, the only modification needed is that we should use $y_{t-2}$ as the omitted variable, not $(y_t + y_{t-2})$. Note that it is only $y_{t-1}$ that creates a problem, not higher-order lags of $y_t$, like $y_{t-2}$, $y_{t-3}$, and so on. For $y_{t-2}$, the corresponding $z_t$ will be obtained by adding $y_{t-1}$ to $y_{t-3}$. This $z_t$ is not correlated with $u_t$ as long as the disturbances are not serially correlated.

(4) Tests for Non-nested Hypothesis

Consider the following two competing non-nested models:

$$H_1: y = X_1\beta_1 + \epsilon_1 \qquad (8.60)$$

$$H_2: y = X_2\beta_2 + \epsilon_2 \qquad (8.61)$$

These are non-nested because the explanatory variables under one model are not a subset of the other model even though $X_1$ and $X_2$ may share some common variables. In order to test $H_1$ versus $H_2$, Cox (1961) modified the LR-test to allow for the non-nested case. The idea behind Cox's approach is to consider to what extent Model I under $H_1$ is capable of predicting the performance of Model II under $H_2$.

Alternatively, one can artificially nest the two models

$$H_3: y = X_1\beta_1 + \tilde X_2\beta_2^* + \epsilon_3 \qquad (8.62)$$

where $\tilde X_2$ excludes from $X_2$ the common variables with $X_1$. A test for $H_1$ is simply the F-test for $H_0$: $\beta_2^* = 0$.

Criticism: This tests $H_1$ versus $H_3$, which is a hybrid of $H_1$ and $H_2$, and not $H_1$ versus $H_2$. Davidson and MacKinnon (1981) proposed testing $\alpha = 0$ in the linear combination of $H_1$ and $H_2$:

$$y = (1 - \alpha)X_1\beta_1 + \alpha X_2\beta_2 + \epsilon \qquad (8.63)$$

where $\alpha$ is an unknown scalar. Since $\alpha$ is not identified, we replace $\beta_2$ by $\hat\beta_{2,OLS} = (X_2'X_2/T)^{-1}(X_2'y/T)$, the regression coefficient estimate obtained from running $y$ on $X_2$ under $H_2$, i.e., (1) run $y$ on $X_2$ and get $\hat y_2 = X_2\hat\beta_{2,OLS}$; (2) run $y$ on $X_1$ and $\hat y_2$ and test that the coefficient of $\hat y_2$ is zero. This is known as the J-test and this is asymptotically $N(0,1)$ under $H_1$.

Fisher and McAleer (1981) suggested a modification of the J-test known as the JA test.

$$\text{Under } H_1: \quad \text{plim}\,\hat\beta_2 = \text{plim}(X_2'X_2/T)^{-1}\,\text{plim}(X_2'X_1/T)\beta_1 + 0 \qquad (8.64)$$

Therefore, they propose replacing $\hat\beta_2$ by $\tilde\beta_2 = (X_2'X_2)^{-1}(X_2'X_1)\hat\beta_{1,OLS}$ where $\hat\beta_{1,OLS} = (X_1'X_1)^{-1}X_1'y$. The steps for the JA-test are as follows:

1. Run $y$ on $X_1$ and get $\hat y_1 = X_1\hat\beta_{1,OLS}$.

2. Run $\hat y_1$ on $X_2$ and get $\tilde y_2 = X_2(X_2'X_2)^{-1}X_2'\hat y_1$.

3. Run $y$ on $X_1$ and $\tilde y_2$ and test that the coefficient of $\tilde y_2$ is zero. This is the simple t-statistic on the coefficient of $\tilde y_2$. The J and JA tests are asymptotically equivalent.
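The J and JA steps translate directly into a sequence of OLS regressions. The sketch below uses simulated data in which $H_1$ is true. The helper names and the data-generating process are assumptions for illustration only.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional t-statistics."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - k)
    return b, b / np.sqrt(s2 * np.diag(xtx_inv))

def j_test(y, X_null, X_alt):
    """t-statistic on the fitted values of the alternative model (J-test)."""
    b_alt, _ = ols(X_alt, y)
    y_alt_hat = X_alt @ b_alt                        # step (1)
    _, t = ols(np.column_stack([X_null, y_alt_hat]), y)  # step (2)
    return t[-1]

def ja_test(y, X_null, X_alt):
    """t-statistic for the JA-test (steps 1 to 3)."""
    b_null, _ = ols(X_null, y)
    y_null_hat = X_null @ b_null                     # step 1
    g, _ = ols(X_alt, y_null_hat)
    y_alt_tilde = X_alt @ g                          # step 2
    _, t = ols(np.column_stack([X_null, y_alt_tilde]), y)  # step 3
    return t[-1]

# Hypothetical data generated under H1: y depends on x1, not on x2.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x2])

t_j_h1, t_ja_h1 = j_test(y, X1, X2), ja_test(y, X1, X2)   # null is H1
t_j_h2, t_ja_h2 = j_test(y, X2, X1), ja_test(y, X2, X1)   # roles reversed
print(f"H1 as null: J = {t_j_h1:.2f}, JA = {t_ja_h1:.2f}")
print(f"H2 as null: J = {t_j_h2:.2f}, JA = {t_ja_h2:.2f}")
```

When each model contains a constant and only one non-overlapping regressor, as in this sketch, the J and JA t-statistics coincide up to sign, because both added variables are affine functions of the same regressor. Calling each function twice with the arguments swapped reverses the roles of the two hypotheses.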

Criticism: Note the asymmetry of H1 and H2. Therefore one should reverse the role of these hypotheses and test again.

In this case one can get the four scenarios depicted in Table 8.6. In case both hypotheses are not rejected, the data are not rich enough to discriminate between the two hypotheses. In case both hypotheses are rejected neither model is useful in explaining the variation in y. In case one hypothesis is rejected while the other is not, one should remember that the non-rejected hypothesis may still be brought down by another challenger hypothesis.

Small Sample Properties: (i) The J-test tends to reject the null more frequently than it should. Also, the JA test has relatively low power when $K_1$, the number of parameters in $H_1$, is larger than $K_2$, the number of parameters in $H_2$. Therefore, one should use the JA test when $K_1$ is about the same size as $K_2$, i.e., the same number of non-overlapping variables. (ii) If both $H_1$ and $H_2$ are false, these tests are inferior to the standard diagnostic tests. In practice, use higher significance levels for the J-test, and supplement it with the artificially nested F-test and standard diagnostic tests.

Note: J and JA tests are one degree of freedom tests, whereas the artificially nested F-test is not.

For a recent summary of non-nested hypothesis testing, see Pesaran and Weeks (2001). Examples of non-nested hypotheses encountered in empirical economic research include linear versus log-linear models, see section 8.5. Also, logit versus probit models in discrete choice, see Chapter 13, and exponential versus Weibull distributions in the analysis of duration data. In the logit versus probit specification, the set of regressors is most likely to be the same. It is only the form of the distribution functions that separates the two models. Pesaran and Weeks (2001, p. 287) emphasize the differences between hypothesis testing and model selection:

The model selection process treats all models under consideration symmetrically, while hypothesis testing attributes a different status to the null and to the alternative hypotheses and by design treats the models asymmetrically. Model selection always ends in a definite outcome, namely one of the models under consideration is selected for use in decision making. Hypothesis testing on the other hand asks whether there is any statistically significant evidence (in the Neyman-Pearson sense) of departure from the null hypothesis in the direction of one or more alternative hypotheses. Rejection of the null hypothesis does not necessarily imply acceptance of any one of the alternative hypotheses; it only warns the investigator of possible shortcomings of the null that is being advocated. Hypothesis testing does not seek a definite outcome and if carried out with due care need not lead to a favorite model. For example, in the case of nonnested hypothesis testing it is possible for all models under consideration to be rejected, or all models to be deemed as observationally equivalent.

They conclude that the choice between hypothesis testing and model selection depends on the primary objective of one’s study. Model selection may be more appropriate when the objective is decision making, while hypothesis testing is better suited to inferential problems.

A model may be empirically adequate for a particular purpose, but of little relevance for another use… In the real world where the truth is elusive and unknowable both approaches to model evaluation are worth pursuing.

(5) White’s (1982) Information-Matrix (IM) Test

This is a general specification test much like the Hausman (1978) specification test which will be considered in detail in Chapter 11. The latter is based on two different estimates of the regression coefficients, while the former is based on two different estimates of the Information Matrix $I(\theta)$ where $\theta' = (\beta', \sigma^2)$ in the case of the linear regression studied in Chapter 7. The first estimate of $I(\theta)$ evaluates the expectation of the second derivatives of the log-likelihood at the MLE, i.e., $-E(\partial^2\log L/\partial\theta\partial\theta')$ at $\hat\theta_{MLE}$, while the second sums up the outer products of the score vectors $\sum_{i=1}^n(\partial\log L_i(\theta)/\partial\theta)(\partial\log L_i(\theta)/\partial\theta)'$ evaluated at $\hat\theta_{MLE}$. This is based on the fundamental identity that

$$I(\theta) = -E(\partial^2\log L/\partial\theta\partial\theta') = E(\partial\log L/\partial\theta)(\partial\log L/\partial\theta)'$$

If the model estimated by MLE is not correctly specified, this equality will not hold. From Chapter 7, equation (7.19), we know that for the linear regression model with normal disturbances, the first estimate of $I(\theta)$, denoted by $I_1(\hat\theta_{MLE})$, is block-diagonal, with $X'X/\hat\sigma^2$ in the block for $\beta$ and $n/2\hat\sigma^4$ in the position for $\sigma^2$, where $\hat\sigma^2 = e'e/n$ and $e$ denotes the OLS residuals. The second estimate, $I_2(\hat\theta_{MLE})$, sums the outer products of the individual scores. Its blocks are $\sum_{i=1}^n e_i^2x_ix_i'/\hat\sigma^4$ for $\beta$, $\sum_{i=1}^n e_i^3x_i/2\hat\sigma^6$ off the diagonal, and $\sum_{i=1}^n e_i^4/4\hat\sigma^8 - n/4\hat\sigma^4$ for $\sigma^2$, where we used the fact that $\sum_{i=1}^n e_ix_i = 0$. If the model is correctly specified and the disturbances are normal then

$$\text{plim}\,I_1(\hat\theta_{MLE})/n = \text{plim}\,I_2(\hat\theta_{MLE})/n = I(\theta)$$

Therefore, the Information Matrix (IM) test rejects the model when $[I_2(\hat\theta_{MLE}) - I_1(\hat\theta_{MLE})]/n$ is too large. These are two matrices with $(k+1)$ by $(k+1)$ elements since $\beta$ is $k \times 1$ and $\sigma^2$ is a scalar. However, due to symmetry, this reduces to $(k+2)(k+1)/2$ unique elements. Hall (1987) noted that the first $k(k+1)/2$ unique elements obtained from the first $k \times k$ block of (8.68) have a typical element $\sum_{i=1}^n(e_i^2 - \hat\sigma^2)x_{ir}x_{is}/n\hat\sigma^4$ where $r$ and $s$ denote the $r$-th and $s$-th explanatory variables with $r, s = 1, 2, \ldots, k$. This term measures the discrepancy between the OLS estimates of the variance-covariance matrix of $\hat\beta_{OLS}$ and its robust counterpart suggested by White (1980), see Chapter 5. The next $k$ unique elements correspond to the off-diagonal block $\sum_{i=1}^n e_i^3x_i/2n\hat\sigma^6$, and this measures the discrepancy between the estimates of the cov$(\hat\beta, \hat\sigma^2)$. The last element corresponds to the difference in the bottom right elements, i.e., the two estimates of $\hat\sigma^2$. This is given by

$$\frac{1}{n}\sum_{i=1}^n\frac{e_i^4}{4\hat\sigma^8} - \frac{3}{4\hat\sigma^4}$$

These $(k+1)(k+2)/2$ unique elements can be arranged in vector form $D(\hat\theta)$ which has a limiting normal distribution with zero mean and some covariance matrix $V(\theta)$ under the null. One can show, see Hall (1987) or Kramer and Sonnberger (1986), that if $V(\theta)$ is estimated from the sample moments of these terms, the IM test statistic is given by

$$m = nD'(\hat\theta)[\hat V(\hat\theta)]^{-1}D(\hat\theta) \xrightarrow{d} \chi^2_{(k+1)(k+2)/2} \qquad (8.69)$$
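To make the two estimates of the information matrix concrete, the following sketch builds both for a linear regression with normal likelihood and compares the blocks of their scaled difference with the closed-form elements described above. The simulated data and the function name `im_matrices` are hypothetical. The sketch computes only the indicator matrix, not the test statistic $m$, which also requires an estimate of $V(\theta)$.

```python
import numpy as np

def im_matrices(y, X):
    """Return I1, I2, OLS residuals and the MLE of sigma^2."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / n                                   # MLE of sigma^2
    # I1: minus the expected Hessian, evaluated at the MLE
    I1 = np.zeros((k + 1, k + 1))
    I1[:k, :k] = X.T @ X / s2
    I1[k, k] = n / (2.0 * s2**2)
    # I2: sum of outer products of the per-observation scores
    score_beta = X * (e / s2)[:, None]
    score_s2 = -1.0 / (2.0 * s2) + e**2 / (2.0 * s2**2)
    S = np.column_stack([score_beta, score_s2])
    I2 = S.T @ S
    return I1, I2, e, s2

# Hypothetical data with normal, homoskedastic disturbances.
rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

I1, I2, e, s2 = im_matrices(y, X)
D = (I2 - I1) / n                                    # matrix behind the IM test

top_left = (X * ((e**2 - s2) / (n * s2**2))[:, None]).T @ X
off_diag = X.T @ e**3 / (2.0 * n * s2**3)
bottom_right = np.sum(e**4) / (4.0 * n * s2**4) - 3.0 / (4.0 * s2**2)
print(np.round(D, 4))
```

The three closed-form pieces agree with the corresponding blocks of `D` up to rounding error, because the OLS residuals are orthogonal to the regressors.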

In fact, Hall (1987) shows that this statistic is the sum of three asymptotically independent terms

$$m = m_1 + m_2 + m_3 \qquad (8.70)$$

where $m_1$ = a particular version of White's heteroskedasticity test; $m_2$ = $n$ times the explained sum of squares from the regression of $e_i^3$ on $x_i$ divided by $6\hat\sigma^6$; and

$$m_3 = \frac{n}{24\hat\sigma^8}\left(\sum_{i=1}^n e_i^4/n - 3\hat\sigma^4\right)^2$$

which is similar to the Jarque-Bera test for normality of the disturbances given in Chapter 5.

It is clear that the IM test will have power whenever the disturbances are non-normal or heteroskedastic. However, Davidson and MacKinnon (1992) demonstrated that the IM test considered above will tend to reject the model when true, much too often, in finite samples. This problem gets worse as the number of degrees of freedom gets large. In Monte Carlo experiments, Davidson and MacKinnon (1992) showed that for a linear regression model with ten regressors, the IM test rejected the null at the 5% level, 99.9% of the time for n = 200. This problem did not disappear when n increased. In fact, for n = 1000, the IM test still rejected the null 92.7% of the time at the 5% level.

These results suggest that it may be more useful to run individual tests for non-normality, heteroskedasticity and other misspecification tests considered above rather than run the IM test. These tests may be more powerful and more informative than the IM test. Alternative methods of calculating the IM test with better finite-sample properties are suggested in Orme (1990), Chesher and Spady (1991) and Davidson and MacKinnon (1992).

Example 3: For the consumption-income data given in Table 5.3, we first compute the RESET test from the consumption-income regression given in Chapter 5. Using EViews, one clicks on stability tests and then selects RESET. You will be prompted with the option of the number of fitted terms to include (i.e., powers of $\hat y$). Table 8.7 shows the RESET test including $\hat y^2$ and $\hat y^3$. The F-statistic for their joint significance is equal to 94.94. This is significant and indicates misspecification.

Table 8.7 Ramsey RESET Test

| Test | Statistic | Distribution | Prob. |
|---|---|---|---|
| F-statistic | 94.93796 | F(2,45) | 0.00000 |
| Log likelihood ratio | 80.96735 | Chi-Square(2) | 0.00000 |

Test Equation: Dependent Variable: CONSUM. Method: Least Squares. Sample: 1959 2007. Included observations: 49.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 3519.599 | 1141.261 | 3.083956 | 0.0035 |
| Y | 0.421587 | 0.173597 | 2.428540 | 0.0192 |
| FITTED^2 | 1.99E-05 | 1.09E-05 | 1.834317 | 0.0732 |
| FITTED^3 | -1.18E-10 | 2.10E-10 | -0.560377 | 0.5780 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.998789 | Mean dependent var | 16749.10 |
| Adjusted R-squared | 0.998708 | S.D. dependent var | 5447.060 |
| S.E. of regression | 195.7648 | Akaike info criterion | 13.46981 |
| Sum squared resid | 1724573 | Schwarz criterion | 13.62425 |
| Log likelihood | -326.0104 | Hannan-Quinn criter. | 13.52840 |
| F-statistic | 12372.26 | Durbin-Watson stat | 1.001605 |
| Prob(F-statistic) | 0.000000 | | |

Table 8.8 Consumption Regression 1971-1995

Dependent Variable: CONSUM. Method: Least Squares. Sample: 1971 1995. Included observations: 25.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -1410.425 | 371.3812 | -3.797783 | 0.0009 |
| Y | 0.963780 | 0.020036 | 48.10199 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.990157 | Mean dependent var | 16279.48 |
| Adjusted R-squared | 0.989730 | S.D. dependent var | 2553.097 |
| S.E. of regression | 258.7391 | Akaike info criterion | 14.02614 |
| Sum squared resid | 1539756 | Schwarz criterion | 14.12365 |
| Log likelihood | -173.3267 | Hannan-Quinn criter. | 14.05318 |
| F-statistic | 2313.802 | Durbin-Watson stat | 0.613064 |
| Prob(F-statistic) | 0.000000 | | |

Next, we compute the Utts (1982) Rainbow test. Table 8.8 gives the EViews 6 regression using the middle 25 observations of our data, i.e., 1971-1995. The RSS of these middle observations is given by $\tilde e'\tilde e = 1539756.14$, while the RSS for the entire sample is given by $e'e = 9001347.76$, so that the observed F-statistic given in (8.49) can be computed as follows:

$$F = \frac{(9001347.76 - 1539756.14)/25}{1539756.14/23} = 4.46$$

This is distributed as $F_{25,23}$ under the null hypothesis and rejects the hypothesis of linearity.
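The arithmetic of the Rainbow statistic can be checked with a few lines of Python. The two residual sums of squares and the degrees of freedom are the ones quoted in the text. The function name `rainbow_f` is only illustrative.

```python
def rainbow_f(rss_full, rss_middle, df_num, df_den):
    """F-statistic comparing the full-sample RSS with the sub-sample RSS."""
    return ((rss_full - rss_middle) / df_num) / (rss_middle / df_den)

rss_full = 9001347.76      # RSS from all 49 observations
rss_middle = 1539756.14    # RSS from the middle 25 observations (Table 8.8)
k = 2                      # constant and income
n_middle = 25

F = rainbow_f(rss_full, rss_middle, df_num=25, df_den=n_middle - k)
print(f"F(25,{n_middle - k}) = {F:.2f}")
```

The numerator is 7461591.62/25 = 298463.66 and the denominator is 1539756.14/23 = 66945.92, giving a ratio of about 4.46.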

The PSW differencing test is computed using the artificial regression given in (8.56) with $Z_t = Y_{t+1} + Y_{t-1}$. The results are given in Table 8.9 using EViews 6. The t-statistic for $Z_t$ is 1.19 and has a p-value of 0.24 which is insignificant.

Now consider the two competing non-nested models:

$$H_1: C_t = \beta_0 + \beta_1Y_t + \beta_2Y_{t-1} + u_t$$

$$H_2: C_t = \gamma_0 + \gamma_1Y_t + \gamma_2C_{t-1} + v_t$$

The two non-nested models share $Y_t$ as a common variable. The artificial model that nests these two models is given by:

$$H_3: C_t = \delta_0 + \delta_1Y_t + \delta_2Y_{t-1} + \delta_3C_{t-1} + \epsilon_t$$

In Table 8.10, regression (1) estimates the model given by $H_2$ and obtains the predicted values $\hat C_2$ (C2HAT). Regression (2) runs consumption on a constant, income, lagged income and C2HAT. The coefficient of this last variable is 1.18 and is statistically significant with a t-value of 16.99. This is the Davidson and MacKinnon (1981) J-test. In this case, $H_1$ is rejected but $H_2$ is not rejected. The JA-test, given by Fisher and McAleer (1981), runs the regression in $H_1$ and keeps the predicted values $\hat C_1$ (C1HAT). This is done in regression (3). Then C1HAT is run on a constant, income and lagged consumption and the predicted values are stored as $\tilde C_2$ (C2TILDE). This is done in regression (5). The last step runs consumption on a constant, income, lagged income and C2TILDE, see regression (6). The coefficient of this last variable is 97.43 and is statistically significant with a t-value of 16.99. Again $H_1$ is rejected but $H_2$ is not rejected.

Table 8.9 Artificial Regression to compute the PSW Differencing Test

Dependent Variable: CONSUM. Method: Least Squares. Sample (adjusted): 1960 2006. Included observations: 47 after adjustments.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -1373.390 | 226.1376 | -6.073251 | 0.0000 |
| Y | 0.596293 | 0.321464 | 1.854930 | 0.0703 |
| Z | 0.191494 | 0.160960 | 1.189700 | 0.2405 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.993678 | Mean dependent var | 16693.85 |
| Adjusted R-squared | 0.993390 | S.D. dependent var | 5210.244 |
| S.E. of regression | 423.5942 | Akaike info criterion | 14.99713 |
| Sum squared resid | 7895011 | Schwarz criterion | 15.11523 |
| Log likelihood | -349.4326 | Hannan-Quinn criter. | 15.04157 |
| F-statistic | 3457.717 | Durbin-Watson stat | 0.119325 |
| Prob(F-statistic) | 0.000000 | | |

Reversing the roles of $H_1$ and $H_2$, the J and JA-tests are repeated. In fact, regression (4) runs consumption on a constant, income, lagged consumption and $\hat C_1$ (which was obtained from regression (3)). The coefficient on $\hat C_1$ is -15.20 and is statistically significant with a t-value of -6.5. This J-test rejects $H_2$ but does not reject $H_1$. Regression (7) runs $\hat C_2$ on a constant, income and lagged income and the predicted values are stored as $\tilde C_1$ (C1TILDE). The last step of the JA test runs consumption on a constant, income, lagged consumption and $\tilde C_1$, see regression (8). The coefficient of this last variable is -1.11 and is statistically significant with a t-value of -6.5. This JA test rejects $H_2$ but not $H_1$. The artificial model, given in $H_3$, is also estimated, see regression (9). One can easily check that the corresponding F-tests reject $H_1$ against $H_3$ and also $H_2$ against $H_3$. In sum, all evidence indicates that both $C_{t-1}$ and $Y_{t-1}$ are important to include along with $Y_t$. Of course, the true model is not known and could include higher lags of both $Y_t$ and $C_t$.

Stata 11 performs White’s (1982) Information matrix test by issuing the command estat imtest after running the regression of consumption on income. The results yield:

. estat imtest

Cameron & Trivedi’s decomposition of IM-test

| Source | chi2 | df | p |
|---|---|---|---|
| Heteroskedasticity | 2.64 | 2 | 0.2677 |
| Skewness | 0.45 | 1 | 0.5030 |
| Kurtosis | 4.40 | 1 | 0.0359 |
| Total | 7.48 | 4 | 0.1124 |

This does not reject the null even though Kurtosis seems to be a problem. Note that the IM test is split into its components following Hall (1987) as described above.

Table 8.10 Non-nested J and JA Tests for the Consumption Regression

All nine regressions use Method: Least Squares, Sample (adjusted): 1960 2007, Included observations: 48 after adjustments.

Regression 1. Dependent Variable: CONSUM

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -254.5241 | 155.2906 | -1.639019 | 0.1082 |
| Y | 0.211505 | 0.068310 | 3.096256 | 0.0034 |
| CONSUM(-1) | 0.800004 | 0.070537 | 11.34159 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.998367 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.998294 | S.D. dependent var | 5377.825 |
| S.E. of regression | 222.1108 | Akaike info criterion | 13.70469 |
| Sum squared resid | 2219995 | Schwarz criterion | 13.82164 |
| Log likelihood | -325.9126 | Hannan-Quinn criter. | 13.74889 |
| F-statistic | 13754.09 | Durbin-Watson stat | 0.969327 |
| Prob(F-statistic) | 0.000000 | | |

Regression 2. Dependent Variable: CONSUM

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 144.3306 | 125.5929 | 1.149194 | 0.2567 |
| Y | 0.425354 | 0.090692 | 4.690091 | 0.0000 |
| Y(-1) | -0.613631 | 0.094424 | -6.498678 | 0.0000 |
| C2HAT | 1.184853 | 0.069757 | 16.98553 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.999167 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.999110 | S.D. dependent var | 5377.825 |
| S.E. of regression | 160.4500 | Akaike info criterion | 13.07350 |
| Sum squared resid | 1132745 | Schwarz criterion | 13.22943 |
| Log likelihood | -309.7639 | Hannan-Quinn criter. | 13.13242 |
| F-statistic | 17585.25 | Durbin-Watson stat | 1.971939 |
| Prob(F-statistic) | 0.000000 | | |

Regression 3. Dependent Variable: CONSUM

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -1424.802 | 231.2843 | -6.160393 | 0.0000 |
| Y | 0.943371 | 0.232170 | 4.063283 | 0.0002 |
| Y(-1) | 0.040368 | 0.234363 | 0.172244 | 0.8640 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.993702 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.993423 | S.D. dependent var | 5377.825 |
| S.E. of regression | 436.1488 | Akaike info criterion | 15.05431 |
| Sum squared resid | 8560159 | Schwarz criterion | 15.17126 |
| Log likelihood | -358.3033 | Hannan-Quinn criter. | 15.09850 |
| F-statistic | 3550.327 | Durbin-Watson stat | 0.174411 |
| Prob(F-statistic) | 0.000000 | | |

Regression 4. Dependent Variable: CONSUM

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -21815.80 | 3319.691 | -6.571637 | 0.0000 |
| Y | 15.01623 | 2.278648 | 6.589974 | 0.0000 |
| CONSUM(-1) | 0.947887 | 0.055806 | 16.98553 | 0.0000 |
| C1HAT | -15.20110 | 2.339106 | -6.498678 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.999167 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.999110 | S.D. dependent var | 5377.825 |
| S.E. of regression | 160.4500 | Akaike info criterion | 13.07350 |
| Sum squared resid | 1132745 | Schwarz criterion | 13.22943 |
| Log likelihood | -309.7639 | Hannan-Quinn criter. | 13.13242 |
| F-statistic | 17585.25 | Durbin-Watson stat | 1.971939 |
| Prob(F-statistic) | 0.000000 | | |

Regression 5. Dependent Variable: C1HAT

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -1418.403 | 7.149223 | -198.3996 | 0.0000 |
| Y | 0.973925 | 0.003145 | 309.6905 | 0.0000 |
| CONSUM(-1) | 0.009728 | 0.003247 | 2.995785 | 0.0044 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.999997 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.999996 | S.D. dependent var | 5360.865 |
| S.E. of regression | 10.22548 | Akaike info criterion | 7.548103 |
| Sum squared resid | 4705.215 | Schwarz criterion | 7.665053 |
| Log likelihood | -178.1545 | Hannan-Quinn criter. | 7.592298 |
| F-statistic | 6459057 | Durbin-Watson stat | 1.678118 |
| Prob(F-statistic) | 0.000000 | | |

Regression 6. Dependent Variable: CONSUM

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 138044.4 | 8211.501 | 16.81111 | 0.0000 |
| Y | -94.21814 | 5.603155 | -16.81519 | 0.0000 |
| Y(-1) | -0.613631 | 0.094424 | -6.498678 | 0.0000 |
| C2TILDE | 97.43471 | 5.736336 | 16.98553 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.999167 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.999110 | S.D. dependent var | 5377.825 |
| S.E. of regression | 160.4500 | Akaike info criterion | 13.07350 |
| Sum squared resid | 1132745 | Schwarz criterion | 13.22943 |
| Log likelihood | -309.7639 | Hannan-Quinn criter. | 13.13242 |
| F-statistic | 17585.25 | Durbin-Watson stat | 1.971939 |
| Prob(F-statistic) | 0.000000 | | |

Regression 7. Dependent Variable: C2HAT

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -1324.328 | 181.8276 | -7.283424 | 0.0000 |
| Y | 0.437200 | 0.182524 | 2.395306 | 0.0208 |
| Y(-1) | 0.551966 | 0.184248 | 2.995785 | 0.0044 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.996101 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.995928 | S.D. dependent var | 5373.432 |
| S.E. of regression | 342.8848 | Akaike info criterion | 14.57313 |
| Sum squared resid | 5290650 | Schwarz criterion | 14.69008 |
| Log likelihood | -346.7551 | Hannan-Quinn criter. | 14.61732 |
| F-statistic | 5748.817 | Durbin-Watson stat | 0.127201 |
| Prob(F-statistic) | 0.000000 | | |

Regression 8. Dependent Variable: CONSUM

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -1629.522 | 239.4806 | -6.804403 | 0.0000 |
| Y | 1.161999 | 0.154360 | 7.527865 | 0.0000 |
| CONSUM(-1) | 0.947887 | 0.055806 | 16.98553 | 0.0000 |
| C1TILDE | -1.111718 | 0.171068 | -6.498678 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.999167 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.999110 | S.D. dependent var | 5377.825 |
| S.E. of regression | 160.4500 | Akaike info criterion | 13.07350 |
| Sum squared resid | 1132745 | Schwarz criterion | 13.22943 |
| Log likelihood | -309.7639 | Hannan-Quinn criter. | 13.13242 |
| F-statistic | 17585.25 | Durbin-Watson stat | 1.971939 |
| Prob(F-statistic) | 0.000000 | | |

Regression 9. Dependent Variable: CONSUM

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -157.2430 | 113.1743 | -1.389389 | 0.1717 |
| Y | 0.675956 | 0.086849 | 7.783091 | 0.0000 |
| Y(-1) | -0.613631 | 0.094424 | -6.498678 | 0.0000 |
| CONSUM(-1) | 0.947887 | 0.055806 | 16.98553 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.999167 | Mean dependent var | 16915.21 |
| Adjusted R-squared | 0.999110 | S.D. dependent var | 5377.825 |
| S.E. of regression | 160.4500 | Akaike info criterion | 13.07350 |
| Sum squared resid | 1132745 | Schwarz criterion | 13.22943 |
| Log likelihood | -309.7639 | Hannan-Quinn criter. | 13.13242 |
| F-statistic | 17585.25 | Durbin-Watson stat | 1.971939 |
| Prob(F-statistic) | 0.000000 | | |

8.2 Nonlinear Least Squares and the Gauss-Newton Regression

So far we have been dealing with linear regressions. But, in reality, one might face a nonlinear regression of the form:

$$y_t = x_t(\beta) + u_t \quad \text{for } t = 1, 2, \ldots, T \qquad (8.71)$$

where $u_t \sim \text{IID}(0, \sigma^2)$ and $x_t(\beta)$ is a scalar nonlinear regression function of $k$ unknown parameters $\beta$. It can be interpreted as the expected value of $y_t$ conditional on the values of the independent variables. Nonlinear least squares minimizes $\sum_{t=1}^T(y_t - x_t(\beta))^2 = (y - x(\beta))'(y - x(\beta))$. The first-order conditions for minimization yield

$$X'(\hat\beta)(y - x(\hat\beta)) = 0 \qquad (8.72)$$

where $X(\beta)$ is a $T \times k$ matrix with typical element $X_{tj}(\beta) = \partial x_t(\beta)/\partial\beta_j$ for $j = 1, \ldots, k$. The solution to these $k$ equations yields the Nonlinear Least Squares (NLS) estimates of $\beta$ denoted by $\hat\beta_{NLS}$. These normal equations given in (8.72) are similar to those in the linear case in that they require the vector of residuals $y - x(\hat\beta)$ to be orthogonal to the matrix of derivatives $X(\hat\beta)$. In the linear case, $x(\hat\beta) = X\hat\beta_{OLS}$ and $X(\hat\beta) = X$ where the latter is independent of $\hat\beta$. Because of this dependence of the fitted values $x(\hat\beta)$ as well as the matrix of derivatives $X(\hat\beta)$ on $\hat\beta$, one in general cannot get an explicit analytical solution to these NLS first-order equations. Under fairly general conditions, see Davidson and MacKinnon (1993), one can show that $\hat\beta_{NLS}$ has asymptotically a normal distribution with mean $\beta_0$ and asymptotic variance $\sigma_0^2(X'(\beta_0)X(\beta_0))^{-1}$, where $\beta_0$ and $\sigma_0$ are the true values of the parameters generating the data. Similarly, defining

$$s^2 = (y - x(\hat\beta_{NLS}))'(y - x(\hat\beta_{NLS}))/(T - k)$$

we get a feasible estimate of this covariance matrix as $s^2(X'(\hat\beta)X(\hat\beta))^{-1}$. If the disturbances are normally distributed then NLS is MLE and therefore asymptotically efficient as long as the model is correctly specified, see Chapter 7.

Taking the first-order Taylor series approximation around some arbitrary parameter vector $\beta^*$, we get

$$y = x(\beta^*) + X(\beta^*)(\beta - \beta^*) + \text{higher-order terms} + u \qquad (8.73)$$

or

$$y - x(\beta^*) = X(\beta^*)b + \text{residuals} \qquad (8.74)$$

This is the simplest version of the Gauss-Newton Regression (GNR), see Davidson and MacKinnon (1993). In this case the higher-order terms and the error term are combined in the residuals and $(\beta - \beta^*)$ is replaced by $b$, a parameter vector that can be estimated. If the model is linear, $X(\beta^*)$ is the matrix of regressors $X$ and the GNR regresses a residual on $X$. If $\beta^* = \hat\beta_{NLS}$, the unrestricted NLS estimator of $\beta$, then the GNR becomes

$$y - \hat x = \hat Xb + \text{residuals} \qquad (8.75)$$

where $\hat x = x(\hat\beta_{NLS})$ and $\hat X = X(\hat\beta_{NLS})$. From the first-order conditions of NLS we get $(y - \hat x)'\hat X = 0$. In this case, OLS on this GNR yields $\hat b_{OLS} = (\hat X'\hat X)^{-1}\hat X'(y - \hat x) = 0$ and this GNR has no explanatory power. However, this regression can be used to (i) check that the first-order conditions given in (8.72) are satisfied. For example, one could check that the t-statistics are of the order $10^{-3}$, and that $R^2$ is zero up to several decimal places; (ii) compute estimated covariance matrices. In fact, this GNR prints out $s^2(\hat X'\hat X)^{-1}$, where $s^2 = (y - \hat x)'(y - \hat x)/(T - k)$ is the OLS estimate of the regression variance. This can be verified easily using the fact that this GNR has no explanatory power. This method of computing the estimated variance-covariance matrix is useful especially in cases where $\hat\beta$ has been obtained by some method other than NLS.
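The two uses of the GNR at the NLS estimates can be illustrated with a small simulation. The regression function $x_t(\beta) = \beta_1\exp(\beta_2 z_t)$, the starting values, and the step-halving safeguard in the iteration are all assumptions chosen for this sketch, not something prescribed by the text.

```python
import numpy as np

def f(beta, z):
    """Hypothetical regression function x_t(beta) = beta1 * exp(beta2 * z_t)."""
    return beta[0] * np.exp(beta[1] * z)

def jac(beta, z):
    """T x k matrix of derivatives X(beta)."""
    ez = np.exp(beta[1] * z)
    return np.column_stack([ez, beta[0] * z * ez])

def gnr(y, z, beta):
    """Regress y - x(beta) on X(beta); return b, GNR residuals and X(beta)."""
    r = y - f(beta, z)
    Xb = jac(beta, z)
    b, *_ = np.linalg.lstsq(Xb, r, rcond=None)
    return b, r - Xb @ b, Xb

def nls_gauss_newton(y, z, beta, max_iter=100, tol=1e-10):
    """Iterate beta <- beta + b, halving the step if the RSS would rise."""
    for _ in range(max_iter):
        b, _, _ = gnr(y, z, beta)
        rss_old = np.sum((y - f(beta, z)) ** 2)
        step = 1.0
        while np.sum((y - f(beta + step * b, z)) ** 2) > rss_old and step > 1e-8:
            step /= 2.0
        beta = beta + step * b
        if np.max(np.abs(b)) < tol:
            break
    return beta

# Hypothetical data with true beta = (2, 0.5).
rng = np.random.default_rng(3)
T = 200
z = rng.uniform(0.0, 2.0, T)
y = 2.0 * np.exp(0.5 * z) + rng.normal(0.0, 0.1, T)

beta_nls = nls_gauss_newton(y, z, np.array([1.0, 0.1]))
b, resid, X_hat = gnr(y, z, beta_nls)            # GNR at the NLS estimates
k = X_hat.shape[1]
s2 = resid @ resid / (T - k)
cov = s2 * np.linalg.inv(X_hat.T @ X_hat)        # s^2 (X'X)^{-1}
r2_uncentered = 1.0 - (resid @ resid) / np.sum((y - f(beta_nls, z)) ** 2)
print("NLS estimates:", beta_nls)
print("GNR coefficients (should be ~0):", b)
print("standard errors:", np.sqrt(np.diag(cov)))
```

At convergence the GNR coefficients and the uncentered $R^2$ are numerically zero, which is check (i). The matrix `cov` is the covariance estimate described in (ii).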

For example, sometimes the model is nonlinear only in one or two parameters which are known to be in a finite range, say between zero and one. One can then search over this range, running OLS regressions and minimizing the residual sum of squares. This search procedure can be repeated over finer grids to get more accuracy. Once the final parameter estimate is found, one can run the GNR to get estimates of the variance-covariance matrix.
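A sketch of this search-then-GNR procedure is given below for a hypothetical model $y_t = \beta_0 + \beta_1 z_t^{\lambda} + u_t$ with $\lambda$ between zero and one. The model, the grid sizes and the number of refinement rounds are assumptions made only for illustration.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def rss_given_lambda(lam, y, z):
    """RSS from OLS of y on a constant and z**lam, for a fixed lam."""
    _, resid = ols(np.column_stack([np.ones_like(z), z ** lam]), y)
    return float(resid @ resid)

def grid_search(y, z, lo=0.0, hi=1.0, rounds=3, points=11):
    """Minimize the RSS over [lo, hi], refining the grid around the best point."""
    lo0, hi0 = lo, hi
    for _ in range(rounds):
        grid = np.linspace(lo, hi, points)
        values = [rss_given_lambda(g, y, z) for g in grid]
        best = int(np.argmin(values))
        width = grid[1] - grid[0]
        lo, hi = max(grid[best] - width, lo0), min(grid[best] + width, hi0)
    return float(grid[best])

# Hypothetical data with true lambda = 0.5 and strictly positive z.
rng = np.random.default_rng(6)
T = 300
z = rng.uniform(0.5, 5.0, T)
y = 1.0 + 2.0 * z ** 0.5 + rng.normal(0.0, 0.1, T)

lam_hat = grid_search(y, z)
(b0, b1), resid = ols(np.column_stack([np.ones(T), z ** lam_hat]), y)

# GNR at the final estimates: derivatives w.r.t. (beta0, beta1, lambda).
X_hat = np.column_stack([np.ones(T), z ** lam_hat, b1 * z ** lam_hat * np.log(z)])
b_gnr, gnr_resid = ols(X_hat, resid)
s2 = gnr_resid @ gnr_resid / (T - 3)
cov = s2 * np.linalg.inv(X_hat.T @ X_hat)
se = np.sqrt(np.diag(cov))
print(f"lambda_hat = {lam_hat:.3f}, standard errors = {se}")
```

Because the grid only approximates the minimizer, the GNR coefficient on the third column is small but not exactly zero. A finer grid shrinks it further.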

Testing Restrictions (GNR Based on the Restricted NLS Estimates)

The best known use for the GNR is to test restrictions. These are based on the LM principle which requires only the restricted estimator. In particular, consider the following competing hypotheses:

$$H_0: y = x(\beta_1, 0) + u \qquad H_1: y = x(\beta_1, \beta_2) + u$$

where $u \sim \text{IID}(0, \sigma^2 I)$ and $\beta_1$ and $\beta_2$ are $k \times 1$ and $r \times 1$, respectively. Denote by $\tilde\beta$ the restricted NLS estimator of $\beta$, in this case $\tilde\beta' = (\tilde\beta_1', 0)$.

The GNR evaluated at this restricted NLS estimator of $\beta$ is

$$(y - \tilde x) = \tilde X_1b_1 + \tilde X_2b_2 + \text{residuals} \qquad (8.76)$$

where $\tilde x = x(\tilde\beta)$ and $\tilde X_i = X_i(\tilde\beta)$ with $X_i(\beta) = \partial x/\partial\beta_i$ for $i = 1, 2$.

By the FWL Theorem this yields the same estimate of $b_2$ as

$$\bar P_{\tilde X_1}(y - \tilde x) = \bar P_{\tilde X_1}\tilde X_2b_2 + \text{residuals} \qquad (8.77)$$

But $\bar P_{\tilde X_1}(y - \tilde x) = (y - \tilde x) - P_{\tilde X_1}(y - \tilde x) = (y - \tilde x)$ since $\tilde X_1'(y - \tilde x) = 0$ from the first-order conditions of restricted NLS. Hence, (8.77) reduces to

$$(y - \tilde x) = \bar P_{\tilde X_1}\tilde X_2b_2 + \text{residuals}$$

Therefore,

$$\hat b_{2,OLS} = (\tilde X_2'\bar P_{\tilde X_1}\tilde X_2)^{-1}\tilde X_2'\bar P_{\tilde X_1}(y - \tilde x) = (\tilde X_2'\bar P_{\tilde X_1}\tilde X_2)^{-1}\tilde X_2'(y - \tilde x)$$

and the residual sum of squares is $(y - \tilde x)'(y - \tilde x) - (y - \tilde x)'\tilde X_2(\tilde X_2'\bar P_{\tilde X_1}\tilde X_2)^{-1}\tilde X_2'(y - \tilde x)$.

If $\tilde X_2$ was excluded from the regression in (8.76), $(y - \tilde x)'(y - \tilde x)$ would be the residual sum of squares. Therefore, the reduction in the residual sum of squares brought about by the inclusion of $\tilde X_2$ is

$$(y - \tilde x)'\tilde X_2(\tilde X_2'\bar P_{\tilde X_1}\tilde X_2)^{-1}\tilde X_2'(y - \tilde x)$$

This is also equal to the explained sum of squares from (8.76) since $\tilde X_1$ has no explanatory power. This sum of squares divided by a consistent estimate of $\sigma^2$ is asymptotically distributed as $\chi^2_r$ under the null.

Different consistent estimates of $\sigma^2$ yield different test statistics. The two most common test statistics for $H_0$ based on this regression are the following: (1) $TR_u^2$ where $R_u^2$ is the uncentered $R^2$ of (8.76) and (2) the F-statistic for $b_2 = 0$. The first statistic is given by $TR_u^2 = T(y - \tilde x)'\tilde X_2(\tilde X_2'\bar P_{\tilde X_1}\tilde X_2)^{-1}\tilde X_2'(y - \tilde x)/(y - \tilde x)'(y - \tilde x)$ where the uncentered $R^2$ was defined in the Appendix to Chapter 3. This statistic implicitly divides the explained sum of squares term by $\tilde\sigma^2$ = (restricted residual sum of squares)/$T$. This is equivalent to the LM-statistic obtained by running the artificial regression $(y - \tilde x)/\tilde\sigma$ on $\tilde X$ and getting the explained sum of squares. Regression packages print the centered $R^2$. This is equal to the uncentered $R_u^2$ as long as there is a constant in the restricted regression so that $(y - \tilde x)$ sums to zero.
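The $TR_u^2$ statistic can be sketched for a hypothetical model $x_t = a + bz_t\exp(\beta_2 z_t)$ with $H_0$: $\beta_2 = 0$. Under the null the model is linear, so restricted NLS is just OLS of $y$ on a constant and $z$, and the derivative with respect to $\beta_2$ evaluated at the restricted estimates is $\tilde b z_t^2$. The model and the simulated data are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def lm_gnr(y, z):
    """T times the uncentered R^2 of the GNR at the restricted estimates."""
    T = len(y)
    X1 = np.column_stack([np.ones(T), z])        # derivatives w.r.t. (a, b) at beta2 = 0
    (a, b), u = ols(X1, y)                       # restricted NLS (OLS here)
    X2 = (b * z**2)[:, None]                     # derivative w.r.t. beta2 at beta2 = 0
    _, e = ols(np.column_stack([X1, X2]), u)     # GNR: restricted residuals on X1, X2
    r2_uncentered = 1.0 - (e @ e) / (u @ u)
    return T * r2_uncentered

# Hypothetical data: one sample generated under H0, one under the alternative.
rng = np.random.default_rng(5)
T = 200
z = rng.uniform(0.0, 2.0, T)
noise = rng.normal(0.0, 0.3, T)
y_null = 1.0 + 2.0 * z + noise
y_alt = 1.0 + 2.0 * z * np.exp(0.3 * z) + noise

lm_null, lm_alt = lm_gnr(y_null, z), lm_gnr(y_alt, z)
print(f"LM under H0: {lm_null:.2f}, LM under the alternative: {lm_alt:.2f}")
```

Each statistic is compared with the critical value of $\chi^2_1$, since one restriction is tested here. The restricted regression contains a constant, so the centered and uncentered $R^2$ of the GNR coincide.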

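The F-statistic for $b_2 = 0$ from (8.76) is the usual F-statistic comparing the residual sums of squares of (8.76) with and without $\tilde X_2$.

As an illustration, consider testing linearity against a nonlinear model such as $y_t = X_t'\beta(1 + \delta X_t'\beta) + u_t$. At $\delta = 0$ and $\beta = \hat\beta_{OLS}$, the GNR becomes $(y_t - X_t'\hat\beta_{OLS}) = X_t'b + (X_t'\hat\beta_{OLS})^2c + \text{residual}$. The t-statistic on $c = 0$ is equivalent to that from the RESET regression given in section 8.3, see problem 25.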