Specification Tests
(1) Ramsey’s (1969) RESET (Regression Specification Error Test)
Ramsey suggests testing the specification of the linear regression model yt = Xt'β + ut by augmenting it with a set of regressors Zt so that the augmented model is

yt = Xt'β + Zt'γ + ut    (8.48)

If the Zt's are available, then the specification test reduces to the F-test for H0: γ = 0. The crucial issue is the choice of the Zt variables. This depends upon the true functional form under the alternative, which is usually unknown. However, it can often be well approximated by higher powers of the initial regressors, as in the case where the true form is quadratic or cubic. Alternatively, one might approximate it with higher powers of the fitted values ŷt = Xt'β̂OLS. The popular Ramsey RESET test is carried out as follows:
(1) Regress yt on Xt and get the fitted values ŷt.
(2) Regress yt on Xt, ŷt², ŷt³ and ŷt⁴ and test that the coefficients of all the powers of ŷt are zero. This is an F(3, T−k−3) test under the null.
Note that ŷt itself is not included among the regressors because it would be perfectly multicollinear with Xt.[12] Different choices of Zt may result in more powerful tests when H0 is not true. Thursby and Schmidt (1977) carried out an extensive Monte Carlo study and concluded that the test based on Zt = [Xt², Xt³, Xt⁴] seems to be generally the best choice.
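The two-step procedure can be sketched in a few lines of numpy; the data-generating process, function name and sample size below are invented for illustration (under a quadratic alternative the test should reject, under a correctly specified linear model it should not):

```python
import numpy as np

def reset_test(y, X, powers=(2, 3, 4)):
    """Ramsey RESET: augment X with powers of the fitted values, F-test their coefficients."""
    T, k = X.shape
    # Step 1: restricted OLS, keep the fitted values yhat
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ b
    rrss = np.sum((y - yhat) ** 2)
    # Step 2: augment with yhat^2, yhat^3, yhat^4 (yhat itself would be collinear with X)
    Z = np.column_stack([yhat ** p for p in powers])
    Xa = np.hstack([X, Z])
    ba = np.linalg.lstsq(Xa, y, rcond=None)[0]
    urss = np.sum((y - Xa @ ba) ** 2)
    q = len(powers)
    return ((rrss - urss) / q) / (urss / (T - k - q))  # ~ F(q, T-k-q) under the null

rng = np.random.default_rng(0)
x = rng.uniform(1, 5, 200)
X = np.column_stack([np.ones(200), x])
y_lin = 2 + 3 * x + rng.normal(size=200)                    # correctly specified
y_quad = 2 + 3 * x + 1.5 * x ** 2 + rng.normal(size=200)    # omitted quadratic term
print(reset_test(y_lin, X), reset_test(y_quad, X))
```

Only the second F-statistic should exceed the F(3, 193) critical value.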
(2) Utts’ (1982) Rainbow Test
The basic idea behind the Rainbow test is that even when the true relationship is nonlinear, a good linear fit can still be obtained over subsets of the sample. The test therefore rejects the null hypothesis of linearity whenever the overall fit is markedly inferior to the fit over a properly selected subsample of the data, see Figure 8.3.
Let ẽ'ẽ be the OLS residual sum of squares from all available T observations and let e'e be the OLS residual sum of squares from the middle half of the observations (T/2). Then

F = [(ẽ'ẽ − e'e)/(T/2)] / [e'e/((T/2) − k)]    (8.49)

Under H0: E[ẽ'ẽ/(T − k)] = σ² = E[e'e/((T/2) − k)], while in general under HA: E[ẽ'ẽ/(T − k)] > E[e'e/((T/2) − k)] > σ². The RRSS is ẽ'ẽ because all the observations are forced to fit the straight line, whereas the URSS is e'e because only part of the observations are forced to fit a straight line. The crucial issue of the Rainbow test is the proper choice of the subsample (the middle T/2 observations in the case of one regressor). This affects the power of the test but not the distribution of the test statistic under the null. Utts (1982) recommends points close to X̄, since an incorrect linear fit will in general not be as far off there as it is in the outer region. Closeness to X̄ is measured by the magnitude of the corresponding diagonal elements of PX. Close points are those with low leverage hii, see section 8.1. The optimal size of the subset depends upon the alternative. Utts recommends about 1/2 of the data points in order to obtain some robustness to outliers. The F-test in (8.49) looks like a Chow test, but differs in the selection of the subsample. For example, using the post-sample predictive Chow test, the data are arranged according to time and the first T1 observations are selected. The Rainbow test arranges the data according to their distance from X̄ and selects the first T/2 of them.
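A minimal numpy sketch of the Rainbow test, selecting the low-leverage half of the sample as Utts suggests; the helper name and the simulated data are invented for illustration:

```python
import numpy as np

def rainbow_test(y, X):
    """Utts' Rainbow test: compare the full-sample fit with the fit over the
    low-leverage half of the data (the points closest to the center of X)."""
    T, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    leverage = np.diag(H)                     # h_ii, as in section 8.1
    keep = np.argsort(leverage)[: T // 2]     # the T/2 lowest-leverage points
    rss = lambda yy, XX: np.sum((yy - XX @ np.linalg.lstsq(XX, yy, rcond=None)[0]) ** 2)
    rrss = rss(y, X)                          # all T observations on one line
    urss = rss(y[keep], X[keep])              # fit over the chosen subsample
    j = T // 2
    return ((rrss - urss) / (T - j)) / (urss / (j - k))  # ~ F(T-j, j-k) under H0

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 100)
X = np.column_stack([np.ones(100), x])
F_lin = rainbow_test(1 + 2 * x + rng.normal(size=100), X)            # linear truth
F_quad = rainbow_test(1 + 2 * x + 2 * x ** 2 + rng.normal(size=100), X)  # nonlinear truth
print(F_lin, F_quad)
```

Only the nonlinear case should produce a large F-statistic, since the overall fit there is markedly inferior to the fit over the middle of the data.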
(3) Plosser, Schwert, and White's (1982) Differencing Test

β̂FD = (X̃'X̃)⁻¹X̃'ỹ    (8.51)

with var(β̂FD) = σ²(X̃'X̃)⁻¹X̃'DD'X̃(X̃'X̃)⁻¹, since var(ũ) = σ²DD', where the tildes denote the first-differenced data (ỹ = Dy, X̃ = DX, ũ = Du) and

DD' = [  2  -1   0  ...   0   0 ]
      [ -1   2  -1  ...   0   0 ]
      [  :   :   :        :   : ]
      [  0   0   0  ...   2  -1 ]
      [  0   0   0  ...  -1   2 ]
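The tridiagonal form of DD' is easy to verify numerically: build the (T−1) × T first-difference operator D and multiply it out (T = 6 here is an arbitrary illustration):

```python
import numpy as np

T = 6
# First-difference operator: (D y)_t = y_{t+1} - y_t, so D is (T-1) x T
D = np.eye(T - 1, T, k=1) - np.eye(T - 1, T)
DDt = D @ D.T
print(DDt)  # 2 on the diagonal, -1 on the first off-diagonals, 0 elsewhere
```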
The differencing test is based on

q̂ = β̂FD − β̂OLS   with   V(q̂) = σ²[V(β̂FD) − V(β̂OLS)]

A consistent estimate V̂(q̂) is obtained by replacing σ² with a consistent estimate ν². Therefore,

λ = T q̂'[V̂(q̂)]⁻¹q̂ ~ χ²k   under H0

where k is the number of slope parameters, provided V̂(q̂) is nonsingular. V̂(q̂) could be singular, in which case we use a generalized inverse V̂⁻(q̂) of V̂(q̂), and λ is then distributed as χ² with degrees of freedom equal to the rank of V̂(q̂). This is a special case of the general Hausman (1978) test, which will be studied extensively in Chapter 11.
Davidson, Godfrey, and MacKinnon (1985) show that, like the Hausman test, the PSW test is equivalent to a much simpler omitted-variables test, the omitted variables being the sum of the lagged and one-period-ahead values of the regressors.
Thus if the regression equation we are considering is
yt = β1x1t + β2x2t + ut

the PSW test involves estimating the expanded regression equation

yt = β1x1t + β2x2t + γ1z1t + γ2z2t + ut    (8.56)

where z1t = x1,t+1 + x1,t−1 and z2t = x2,t+1 + x2,t−1, and testing the hypothesis γ1 = γ2 = 0 by the usual F-test.
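A numpy sketch of this omitted-variables form of the test; the function name and data are invented for illustration, the constant is handled separately (its z-term would be perfectly collinear), and only interior observations are used since zt needs both a lead and a lag:

```python
import numpy as np

def psw_omitted_variable_test(y, X):
    """Davidson-Godfrey-MacKinnon form of the PSW test: add z_t = x_{t+1} + x_{t-1}
    for each slope regressor and F-test their joint significance."""
    Z = X[2:] + X[:-2]                # x_{t+1} + x_{t-1}, aligned with t = 2, ..., T-1
    y_, X_ = y[1:-1], X[1:-1]
    T, k = X_.shape
    rss = lambda yy, XX: np.sum((yy - XX @ np.linalg.lstsq(XX, yy, rcond=None)[0]) ** 2)
    ones = np.ones((T, 1))
    rrss = rss(y_, np.hstack([ones, X_]))        # without the z's
    urss = rss(y_, np.hstack([ones, X_, Z]))     # with the z's
    return ((rrss - urss) / k) / (urss / (T - 2 * k - 1))  # ~ F(k, T-2k-1) under H0

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=200)) / 10     # a smooth, persistent regressor
X = x.reshape(-1, 1)
y = 1 + 2 * x + rng.normal(size=200)         # correctly specified linear model
print(psw_omitted_variable_test(y, X))       # small F-statistic expected under H0
```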
If there are lagged dependent variables in the equation, the test needs a minor modification. Suppose that the model is

yt = β1yt−1 + β2xt + ut    (8.57)

Now the omitted variables would be defined as z1t = yt + yt−2 and z2t = xt+1 + xt−1. There is no problem with z2t, but z1t would be correlated with the error term ut because of the presence of yt in it. The solution is simply to transfer yt to the left-hand side and write the expanded regression equation in (8.56) as

(1 − γ1)yt = β1yt−1 + β2xt + γ1yt−2 + γ2z2t + ut    (8.58)

This equation can be written as

yt = β1*yt−1 + β2*xt + γ1*yt−2 + γ2*z2t + ut*    (8.59)

where all the starred parameters are the corresponding unstarred ones divided by (1 − γ1). The PSW test now tests the hypothesis γ1* = γ2* = 0. Thus, in the case where the model involves the lagged dependent variable yt−1 as an explanatory variable, the only modification needed is that we should use yt−2 as the omitted variable, not (yt + yt−2). Note that it is only yt−1 that creates a problem, not higher-order lags of yt such as yt−2, yt−3, and so on. For yt−2, the corresponding zt is obtained by adding yt−1 to yt−3. This zt is not correlated with ut as long as the disturbances are not serially correlated.
(4) Tests for Nonnested Hypotheses
Consider the following two competing nonnested models:
H1: y = X1β1 + ε1    (8.60)
H2: y = X2β2 + ε2    (8.61)

These are nonnested because the explanatory variables under one model are not a subset of those of the other, even though X1 and X2 may share some common variables. In order to test H1 versus H2, Cox (1961) modified the LR-test to allow for the nonnested case. The idea behind Cox's approach is to consider to what extent Model I, under H1, is capable of predicting the performance of Model II, under H2.
Alternatively, one can artificially nest the two models:

H3: y = X1β1 + X2*β2* + ε3    (8.62)

where X2* excludes from X2 the variables it shares with X1. A test for H1 is simply the F-test for H0: β2* = 0.
Criticism: This tests H1 versus H3, which is a hybrid of H1 and H2, and not H1 versus H2. Davidson and MacKinnon (1981) proposed testing α = 0 in the linear combination of H1 and H2:

y = (1 − α)X1β1 + αX2β2 + ε    (8.63)

where α is an unknown scalar. Since α is not identified, we replace β2 by β̂2,OLS = (X2'X2/T)⁻¹(X2'y/T), the regression coefficient estimate obtained from running y on X2 under H2, i.e., (1) run y on X2 and get ŷ2 = X2β̂2,OLS; (2) run y on X1 and ŷ2 and test that the coefficient of ŷ2 is zero. This is known as the J-test, and the corresponding t-statistic is asymptotically N(0,1) under H1.
Fisher and McAleer (1981) suggested a modification of the J-test known as the JA-test. Under H1:

plim β̂2 = plim(X2'X2/T)⁻¹ plim(X2'X1/T)β1 ≠ 0    (8.64)

Therefore, they propose replacing β̂2 by β̃2 = (X2'X2)⁻¹(X2'X1)β̂1,OLS where β̂1,OLS = (X1'X1)⁻¹X1'y. The steps for the JA-test are as follows:

1. Run y on X1 and get ŷ1 = X1β̂1,OLS.
2. Run ŷ1 on X2 and get ỹ2 = X2(X2'X2)⁻¹X2'ŷ1.
3. Run y on X1 and ỹ2 and test that the coefficient of ỹ2 is zero. This is the simple t-statistic on the coefficient of ỹ2. The J and JA tests are asymptotically equivalent.
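The J-test steps translate directly into a short numpy function; everything below (names, data-generating process) is an invented illustration, returning the t-statistic on the H2 fitted values:

```python
import numpy as np

def j_test(y, X1, X2):
    """Davidson-MacKinnon J-test of H1 against H2: add the fitted values from H2
    to the H1 regression and return the t-statistic on their coefficient."""
    fit = lambda yy, XX: XX @ np.linalg.lstsq(XX, yy, rcond=None)[0]
    y2hat = fit(y, X2)                    # step 1: fitted values under H2
    Xa = np.column_stack([X1, y2hat])     # step 2: augment the H1 regressors
    T, m = Xa.shape
    b = np.linalg.lstsq(Xa, y, rcond=None)[0]
    resid = y - Xa @ b
    s2 = resid @ resid / (T - m)
    se = np.sqrt(s2 * np.linalg.inv(Xa.T @ Xa)[-1, -1])
    return b[-1] / se                     # asymptotically N(0,1) under H1

rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=(2, 200))
X1 = np.column_stack([np.ones(200), x1])
X2 = np.column_stack([np.ones(200), x2])
y_h2 = 1 + 2 * x2 + 0.5 * rng.normal(size=200)   # data generated under H2
y_h1 = 1 + 2 * x1 + 0.5 * rng.normal(size=200)   # data generated under H1
print(j_test(y_h2, X1, X2), j_test(y_h1, X1, X2))
```

The first call should reject H1 (large t-statistic), the second should not.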
Criticism: Note the asymmetry of H1 and H2. Therefore, one should reverse the roles of these hypotheses and test again.
In this case one can get the four scenarios depicted in Table 8.6. In case both hypotheses are not rejected, the data are not rich enough to discriminate between the two hypotheses. In case both hypotheses are rejected neither model is useful in explaining the variation in y. In case one hypothesis is rejected while the other is not, one should remember that the nonrejected hypothesis may still be brought down by another challenger hypothesis.
Small Sample Properties: (i) The J-test tends to reject the null more frequently than it should. Also, the JA-test has relatively low power when K1, the number of parameters in H1, is larger than K2, the number of parameters in H2. Therefore, one should use the JA-test when K1 is about the same size as K2, i.e., when the two models have about the same number of nonoverlapping variables. (ii) If both H1 and H2 are false, these tests are inferior to the standard diagnostic tests. In practice, use higher significance levels for the J-test, and supplement it with the artificially nested F-test and the standard diagnostic tests.
Note: J and JA tests are one degree of freedom tests, whereas the artificially nested Ftest is not.
In choosing between a logit and a probit specification, the set of regressors is most likely to be the same. It is only the form of the distribution functions that separates the two models. Pesaran and Weeks (2001, p. 287) emphasize the differences between hypothesis testing and model selection:
The model selection process treats all models under consideration symmetrically, while hypothesis testing attributes a different status to the null and to the alternative hypotheses and by design treats the models asymmetrically. Model selection always ends in a definite outcome, namely one of the models under consideration is selected for use in decision making. Hypothesis testing on the other hand asks whether there is any statistically significant evidence (in the Neyman-Pearson sense) of departure from the null hypothesis in the direction of one or more alternative hypotheses. Rejection of the null hypothesis does not necessarily imply acceptance of any one of the alternative hypotheses; it only warns the investigator of possible shortcomings of the null that is being advocated. Hypothesis testing does not seek a definite outcome and if carried out with due care need not lead to a favorite model. For example, in the case of nonnested hypothesis testing it is possible for all models under consideration to be rejected, or all models to be deemed as observationally equivalent.
They conclude that the choice between hypothesis testing and model selection depends on the primary objective of one’s study. Model selection may be more appropriate when the objective is decision making, while hypothesis testing is better suited to inferential problems.
A model may be empirically adequate for a particular purpose, but of little relevance for another use… In the real world where the truth is elusive and unknowable both approaches to model evaluation are worth pursuing.
(5) White’s (1982) InformationMatrix (IM) Test
This is a general specification test, much like the Hausman (1978) specification test, which will be considered in detail in Chapter 11. The latter is based on two different estimates of the regression coefficients, while the former is based on two different estimates of the information matrix I(θ), where θ' = (β', σ²) in the case of the linear regression studied in Chapter 7. The first estimate of I(θ) evaluates the expectation of the second derivatives of the log-likelihood at the MLE, i.e., −E(∂²logL/∂θ∂θ') at θ̂MLE, while the second, I2(θ), sums the outer products of the score vectors, Σi=1,…,n (∂logLi(θ)/∂θ)(∂logLi(θ)/∂θ)', evaluated at θ̂MLE. This is based on the fundamental identity that

I(θ) = −E(∂²logL/∂θ∂θ') = E[(∂logL/∂θ)(∂logL/∂θ)']
If the model estimated by MLE is not correctly specified, this equality will not hold. From Chapter 7, equation (7.19), we know the form of the first estimate of I(θ), denoted by I1(θ̂MLE), for the linear regression model with normal disturbances; its derivation uses the fact that Σi=1,…,n êixi = 0. If the model is correctly specified and the disturbances are normal, then

plim I1(θ̂MLE)/n = plim I2(θ̂MLE)/n = I(θ)
Therefore, the information matrix (IM) test rejects the model when

[I2(θ̂MLE) − I1(θ̂MLE)]/n

is too large. These are two matrices, each with (k + 1) by (k + 1) elements, since β is k × 1 and σ² is a scalar. However, due to symmetry, this reduces to (k + 2)(k + 1)/2 unique elements. Hall (1987) noted that the first k(k + 1)/2 unique elements, obtained from the first k × k block of (8.68), have a typical element Σi=1,…,n (êi² − σ̂²)xirxis/nσ̂⁴, where r and s denote the r-th and s-th explanatory variables with r, s = 1, 2, …, k. This term measures the discrepancy between the OLS estimate of the variance-covariance matrix of β̂OLS and its robust counterpart suggested by White (1980), see Chapter 5. The next k unique elements correspond to the off-diagonal block Σi=1,…,n êi³xi/2nσ̂⁶, and these measure the discrepancy between the two estimates of cov(β̂, σ̂²). The last element corresponds to the difference in the bottom-right elements, i.e., the two estimates corresponding to σ².
These (k + 1)(k + 2)/2 unique elements can be arranged in vector form D(θ̂), which has a limiting normal distribution with zero mean and some covariance matrix V(θ) under the null. One can show, see Hall (1987) or Krämer and Sonnberger (1986), that if V(θ) is estimated from the sample moments of these terms, the IM test statistic is given by

m = nD'(θ̂)[V̂(θ)]⁻¹D(θ̂) ~ χ²(k+1)(k+2)/2    (8.69)
In fact, Hall (1987) shows that this statistic is the sum of three asymptotically independent terms,

m = m1 + m2 + m3    (8.70)

where m1 is a particular version of White's heteroskedasticity test; m2 is n times the explained sum of squares from the regression of êi³ on xi, divided by 6σ̂⁶; and m3 is similar to the Jarque-Bera test for normality of the disturbances given in Chapter 5.
It is clear that the IM test will have power whenever the disturbances are nonnormal or heteroskedastic. However, Davidson and MacKinnon (1992) demonstrated that the IM test considered above will tend to reject the model when true, much too often, in finite samples. This problem gets worse as the number of degrees of freedom gets large. In Monte Carlo experiments, Davidson and MacKinnon (1992) showed that for a linear regression model with ten regressors, the IM test rejected the null at the 5% level, 99.9% of the time for n = 200. This problem did not disappear when n increased. In fact, for n = 1000, the IM test still rejected the null 92.7% of the time at the 5% level.
These results suggest that it may be more useful to run individual tests for nonnormality, heteroskedasticity and other misspecification tests considered above rather than run the IM test. These tests may be more powerful and more informative than the IM test. Alternative methods of calculating the IM test with better finitesample properties are suggested in Orme (1990), Chesher and Spady (1991) and Davidson and MacKinnon (1992).
Example 3: For the consumption-income data given in Table 5.3, we first compute the RESET test from the consumption-income regression given in Chapter 5. Using EViews, one clicks on stability tests and then selects RESET. You will be prompted with the option of the number of fitted terms to include (i.e., powers of ŷ). Table 8.7 shows the RESET test including ŷ² and ŷ³. The F-statistic for their joint significance is equal to 94.94. This is significant and indicates misspecification.
Table 8.7 Ramsey RESET Test

Table 8.8 Consumption Regression 1971-1995

Next, we compute Utts' (1982) Rainbow test. Table 8.8 gives the middle 25 observations of our data, i.e., 1971-1995, and the EViews 6 regression using this data. The RSS of these middle observations is given by e'e = 1539756.14, while the RSS for the entire sample is given by ẽ'ẽ = 9001347.76, so that the observed F-statistic in (8.49) can be computed as follows:

F = [(9001347.76 − 1539756.14)/25] / [1539756.14/23] = 4.46

This is distributed as F(25, 23) under the null hypothesis, and rejects the hypothesis of linearity.
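The arithmetic can be checked directly from the two reported residual sums of squares:

```python
# Rainbow F-statistic from the reported residual sums of squares
rrss = 9001347.76   # restricted: all observations forced onto one line
urss = 1539756.14   # unrestricted: middle 25 observations, 1971-1995
F = ((rrss - urss) / 25) / (urss / 23)
print(round(F, 2))  # approximately 4.46, above the 5% critical value of F(25, 23)
```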
The PSW differencing test is computed using the artificial regression given in (8.56) with Zt = Yt+1 + Yt−1. The results are given in Table 8.9 using EViews 6. The t-statistic for Zt is 1.19 with a p-value of 0.24, which is insignificant.
Now consider the two competing nonnested models:
H1: Ct = β0 + β1Yt + β2Yt−1 + ut
H2: Ct = γ0 + γ1Yt + γ2Ct−1 + vt

The two nonnested models share Yt as a common variable. The artificial model that nests these two models is given by:

H3: Ct = δ0 + δ1Yt + δ2Yt−1 + δ3Ct−1 + εt
Table 8.10, regression (1), estimates H2 and obtains the predicted values Ĉ2 (C2HAT). Regression (2) runs consumption on a constant, income, lagged income and C2HAT. The coefficient of this last variable is 1.18 and is statistically significant with a t-value of 16.99. This is the Davidson and MacKinnon (1981) J-test. In this case, H1 is rejected but H2 is not rejected. The JA-test, given by Fisher and McAleer (1981), runs the regression in H1 and keeps the predicted values Ĉ1 (C1HAT). This is done in regression (3). Then C1HAT is run on a constant, income and lagged consumption, and the predicted values are stored as C̃2 (C2TILDE). This is done in regression (5). The last step runs consumption on a constant, income, lagged income and C2TILDE, see regression (6). The coefficient of this last variable is 97.43 and is statistically significant with a t-value of 16.99. Again H1 is rejected but H2 is not rejected.
Table 8.9 Artificial Regression to compute the PSW Differencing Test

Reversing the roles of H1 and H2, the J- and JA-tests are repeated. In fact, regression (4) runs consumption on a constant, income, lagged consumption and C1HAT (which was obtained from regression (3)). The coefficient on C1HAT is −15.20 and is statistically significant with a t-value of −6.5. This J-test rejects H2 but does not reject H1. Regression (7) runs C2HAT on a constant, income and lagged income, and the predicted values are stored as C̃1 (C1TILDE). The last step of the JA-test runs consumption on a constant, income, lagged consumption and C1TILDE, see regression (8). The coefficient of this last variable is −1.11 and is statistically significant with a t-value of −6.5. This JA-test rejects H2 but not H1. The artificial model, given in H3, is also estimated, see regression (9). One can easily check that the corresponding F-tests reject H1 against H3 and also H2 against H3. In sum, all evidence indicates that both Ct−1 and Yt−1 are important to include along with Yt. Of course, the true model is not known and could include higher lags of both Yt and Ct.
Stata 11 performs White’s (1982) Information matrix test by issuing the command estat imtest after running the regression of consumption on income. The results yield:
. estat imtest

Cameron & Trivedi's decomposition of IM-test

This does not reject the null even though kurtosis seems to be a problem. Note that the IM test is split into its components following Hall (1987), as described above.
Regression 1
Dependent Variable: CONSUM
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   254.5241       155.2906      1.639019       0.1082
Y                   0.211505       0.068310      3.096256       0.0034
CONSUM(-1)          0.800004       0.070537      11.34159       0.0000

R-squared           0.998367       Mean dependent var       16915.21
Adjusted R-squared  0.998294       S.D. dependent var       5377.825
S.E. of regression  222.1108       Akaike info criterion    13.70469
Sum squared resid   2219995.       Schwarz criterion        13.82164
Log likelihood      -325.9126      Hannan-Quinn criter.     13.74889
F-statistic         13754.09       Durbin-Watson stat       0.969327
Prob(F-statistic)   0.000000

Regression 2
Dependent Variable: CONSUM
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   -144.3306      125.5929      -1.149194      0.2567
Y                   0.425354       0.090692      4.690091       0.0000
Y(-1)               -0.613631      0.094424      -6.498678      0.0000
C2HAT               1.184853       0.069757      16.98553       0.0000

R-squared           0.999167       Mean dependent var       16915.21
Adjusted R-squared  0.999110       S.D. dependent var       5377.825
S.E. of regression  160.4500       Akaike info criterion    13.07350
Sum squared resid   1132745.       Schwarz criterion        13.22943
Log likelihood      -309.7639      Hannan-Quinn criter.     13.13242
F-statistic         17585.25       Durbin-Watson stat       1.971939
Prob(F-statistic)   0.000000
Table 8.10 Nonnested J and JA Tests for the Consumption Regression
8.4 Nonlinear Least Squares and the Gauss-Newton Regression
So far we have been dealing with linear regressions. But, in reality, one might face a nonlinear regression of the form:

yt = xt(β) + ut   for t = 1, 2, …, T    (8.71)

where ut ~ IID(0, σ²) and xt(β) is a scalar nonlinear regression function of k unknown parameters β. It can be interpreted as the expected value of yt conditional on the values of the inde-
Table 8.10 (continued)

Regression 3
Dependent Variable: CONSUM
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   1424.802       231.2843      6.160393       0.0000
Y                   0.943371       0.232170      4.063283       0.0002
Y(-1)               0.040368       0.234363      0.172244       0.8640

R-squared           0.993702       Mean dependent var       16915.21
Adjusted R-squared  0.993423       S.D. dependent var       5377.825
S.E. of regression  436.1488       Akaike info criterion    15.05431
Sum squared resid   8560159.       Schwarz criterion        15.17126
Log likelihood      -358.3033      Hannan-Quinn criter.     15.09850
F-statistic         3550.327       Durbin-Watson stat       0.174411
Prob(F-statistic)   0.000000

Regression 4
Dependent Variable: CONSUM
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   21815.80       3319.691      6.571637       0.0000
Y                   15.01623       2.278648      6.589974       0.0000
CONSUM(-1)          0.947887       0.055806      16.98553       0.0000
C1HAT               -15.20110      2.339106      -6.498678      0.0000

R-squared           0.999167       Mean dependent var       16915.21
Adjusted R-squared  0.999110       S.D. dependent var       5377.825
S.E. of regression  160.4500       Akaike info criterion    13.07350
Sum squared resid   1132745.       Schwarz criterion        13.22943
Log likelihood      -309.7639      Hannan-Quinn criter.     13.13242
F-statistic         17585.25       Durbin-Watson stat       1.971939
Prob(F-statistic)   0.000000
pendent variables. Nonlinear least squares minimizes Σt=1,…,T (yt − xt(β))² = (y − x(β))'(y − x(β)). The first-order conditions for minimization yield

X'(β)(y − x(β)) = 0    (8.72)

where X(β) is a T × k matrix with typical element Xtj(β) = ∂xt(β)/∂βj for j = 1, …, k. The solution to these k equations yields the nonlinear least squares (NLS) estimates of β, denoted by β̂NLS. These normal equations given in (8.72) are similar to those in the linear case in that they
Table 8.10 (continued)

Regression 5
Dependent Variable: C1HAT
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   1418.403       7.149223      198.3996       0.0000
Y                   0.973925       0.003145      309.6905       0.0000
CONSUM(-1)          0.009728       0.003247      2.995785       0.0044

R-squared           0.999997       Mean dependent var       16915.21
Adjusted R-squared  0.999996       S.D. dependent var       5360.865
S.E. of regression  10.22548       Akaike info criterion    7.548103
Sum squared resid   4705.215       Schwarz criterion        7.665053
Log likelihood      -178.1545      Hannan-Quinn criter.     7.592298
F-statistic         6459057.       Durbin-Watson stat       1.678118
Prob(F-statistic)   0.000000

Regression 6
Dependent Variable: CONSUM
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   -138044.4      8211.501      -16.81111      0.0000
Y                   -94.21814      5.603155      -16.81519      0.0000
Y(-1)               -0.613631      0.094424      -6.498678      0.0000
C2TILDE             97.43471       5.736336      16.98553       0.0000

R-squared           0.999167       Mean dependent var       16915.21
Adjusted R-squared  0.999110       S.D. dependent var       5377.825
S.E. of regression  160.4500       Akaike info criterion    13.07350
Sum squared resid   1132745.       Schwarz criterion        13.22943
Log likelihood      -309.7639      Hannan-Quinn criter.     13.13242
F-statistic         17585.25       Durbin-Watson stat       1.971939
Prob(F-statistic)   0.000000
require the vector of residuals y − x(β̂) to be orthogonal to the matrix of derivatives X(β̂). In the linear case, x(β̂) = Xβ̂OLS and X(β̂) = X, where the latter is independent of β̂. Because of this dependence of the fitted values x(β̂), as well as the matrix of derivatives X(β̂), on β̂, one in general cannot get an explicit analytical solution to these NLS first-order equations. Under fairly general conditions, see Davidson and MacKinnon (1993), one can show that β̂NLS has asymptotically
Table 8.10 (continued)

Regression 7
Dependent Variable: C2HAT
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   1324.328       181.8276      7.283424       0.0000
Y                   0.437200       0.182524      2.395306       0.0208
Y(-1)               0.551966       0.184248      2.995785       0.0044

R-squared           0.996101       Mean dependent var       16915.21
Adjusted R-squared  0.995928       S.D. dependent var       5373.432
S.E. of regression  342.8848       Akaike info criterion    14.57313
Sum squared resid   5290650.       Schwarz criterion        14.69008
Log likelihood      -346.7551      Hannan-Quinn criter.     14.61732
F-statistic         5748.817       Durbin-Watson stat       0.127201
Prob(F-statistic)   0.000000

Regression 8
Dependent Variable: CONSUM
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments

Variable            Coefficient    Std. Error    t-Statistic    Prob.
C                   1629.522       239.4806      6.804403       0.0000
Y                   1.161999       0.154360      7.527865       0.0000
CONSUM(-1)          0.947887       0.055806      16.98553       0.0000
C1TILDE             -1.111718      0.171068      -6.498678      0.0000

R-squared           0.999167       Mean dependent var       16915.21
Adjusted R-squared  0.999110       S.D. dependent var       5377.825
S.E. of regression  160.4500       Akaike info criterion    13.07350
Sum squared resid   1132745.       Schwarz criterion        13.22943
Log likelihood      -309.7639      Hannan-Quinn criter.     13.13242
F-statistic         17585.25       Durbin-Watson stat       1.971939
Prob(F-statistic)   0.000000
a normal distribution with mean β0 and asymptotic variance σ0²(X'(β0)X(β0))⁻¹, where β0 and σ0² are the true values of the parameters generating the data. Similarly, defining

s² = (y − x(β̂NLS))'(y − x(β̂NLS))/(T − k)

we get a feasible estimate of this covariance matrix as s²(X'(β̂)X(β̂))⁻¹. If the disturbances are normally distributed, then NLS is MLE and therefore asymptotically efficient as long as the model is correctly specified, see Chapter 7.
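As an illustration, NLS for a simple exponential regression xt(β) = β1 exp(β2 zt) can be computed by iterating the Gauss-Newton step, i.e., regressing the current residuals on the derivative matrix X(β) and updating; the model, data and starting values below are invented for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.uniform(0, 2, 100)
beta_true = np.array([2.0, 0.8])
y = beta_true[0] * np.exp(beta_true[1] * z) + 0.05 * rng.normal(size=100)

def x_of(beta):                  # regression function x_t(beta)
    return beta[0] * np.exp(beta[1] * z)

def X_of(beta):                  # T x k matrix of derivatives dx_t/dbeta_j
    e = np.exp(beta[1] * z)
    return np.column_stack([e, beta[0] * z * e])

beta = np.array([1.5, 0.6])      # starting values
for _ in range(50):              # Gauss-Newton: regress residuals on X(beta), update
    Xb = X_of(beta)
    step = np.linalg.lstsq(Xb, y - x_of(beta), rcond=None)[0]
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

resid = y - x_of(beta)
s2 = resid @ resid / (len(y) - 2)
cov = s2 * np.linalg.inv(X_of(beta).T @ X_of(beta))   # s^2 (X'(b)X(b))^{-1}
print(beta, np.sqrt(np.diag(cov)))
```

At convergence the residuals are orthogonal to X(β̂), which is exactly the first-order condition (8.72).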
Table 8.10 (continued)

Regression 9
Dependent Variable: CONSUM
Method: Least Squares
Sample (adjusted): 1960 2007
Included observations: 48 after adjustments
Taking the first-order Taylor series approximation around some arbitrary parameter vector β*, we get

y = x(β*) + X(β*)(β − β*) + higher-order terms + u    (8.73)

or

y − x(β*) = X(β*)b + residuals    (8.74)

This is the simplest version of the Gauss-Newton Regression (GNR), see Davidson and MacKinnon (1993). In this case, the higher-order terms and the error term are combined in the residuals, and (β − β*) is replaced by b, a parameter vector that can be estimated. If the model is linear, X(β*) is the matrix of regressors X, and the GNR regresses a residual on X. If β* = β̂NLS, the unrestricted NLS estimator of β, then the GNR becomes

y − x̂ = X̂b + residuals    (8.75)

where x̂ = x(β̂NLS) and X̂ = X(β̂NLS). From the first-order conditions of NLS we get (y − x̂)'X̂ = 0. In this case, OLS on this GNR yields bOLS = (X̂'X̂)⁻¹X̂'(y − x̂) = 0, and this GNR has no explanatory power. However, this regression can be used to (i) check that the first-order conditions given in (8.72) are satisfied; for example, one could check that the t-statistics are of order 10⁻³ and that R² is zero up to several decimal places; and (ii) compute estimated covariance matrices. In fact, this GNR prints out s²(X̂'X̂)⁻¹, where s² = (y − x̂)'(y − x̂)/(T − k) is the OLS estimate of the regression variance. This can be verified easily using the fact that this GNR has no explanatory power. This method of computing the estimated variance-covariance matrix is useful especially in cases where β̂ has been obtained by a method other than NLS.
For example, sometimes the model is nonlinear in only one or two parameters which are known to lie in a finite range, say between zero and one. One can then search over this range, running OLS regressions and minimizing the residual sum of squares. This search procedure can be repeated over finer grids to get more accuracy. Once the final parameter estimate is found, one can run the GNR to get estimates of the variance-covariance matrix.
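A sketch of this grid-search-plus-GNR idea for a hypothetical model yt = β(λ x1t + (1 − λ)x2t) + ut that is nonlinear only in λ ∈ (0, 1); all names and numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
x1, x2 = rng.uniform(1, 3, (2, 200))
beta0, lam0 = 2.0, 0.3
y = beta0 * (lam0 * x1 + (1 - lam0) * x2) + 0.1 * rng.normal(size=200)

# Grid search over lam in (0, 1): OLS for beta at each lam, keep the minimum RSS
best = None
for lam in np.linspace(0.01, 0.99, 99):
    w = lam * x1 + (1 - lam) * x2
    b = (w @ y) / (w @ w)                   # OLS slope, no intercept in this model
    rss = np.sum((y - b * w) ** 2)
    if best is None or rss < best[0]:
        best = (rss, b, lam)
rss, b, lam = best

# GNR at the grid-search estimates: the derivative matrix has columns
# dx/dbeta = lam*x1 + (1-lam)*x2 and dx/dlam = beta*(x1 - x2);
# s^2 (X'X)^{-1} from this regression estimates the covariance matrix
Xhat = np.column_stack([lam * x1 + (1 - lam) * x2, b * (x1 - x2)])
resid = y - b * (lam * x1 + (1 - lam) * x2)
s2 = resid @ resid / (len(y) - 2)
cov = s2 * np.linalg.inv(Xhat.T @ Xhat)
print(b, lam, np.sqrt(np.diag(cov)))
```

The covariance matrix comes out of the GNR even though the point estimates were obtained by search rather than by NLS iteration.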
Testing Restrictions (GNR Based on the Restricted NLS Estimates)
The best known use for the GNR is to test restrictions. These are based on the LM principle which requires only the restricted estimator. In particular, consider the following competing hypotheses:
H0: y = x(β1, 0) + u
H1: y = x(β1, β2) + u

where u ~ IID(0, σ²IT) and β1 and β2 are k × 1 and r × 1, respectively. Denote by β̃ the restricted NLS estimator of β; in this case β̃ = (β̃1', 0')'.
The GNR evaluated at this restricted NLS estimator of β is

(y − x̃) = X̃1b1 + X̃2b2 + residuals    (8.76)

where x̃ = x(β̃) and X̃i = Xi(β̃) with Xi(β) = ∂x/∂βi for i = 1, 2.
By the FWL Theorem, this yields the same estimate of b2 as

P̄X̃1(y − x̃) = P̄X̃1X̃2b2 + residuals    (8.77)

where P̄X̃1 = I − X̃1(X̃1'X̃1)⁻¹X̃1'. But P̄X̃1(y − x̃) = (y − x̃) − PX̃1(y − x̃) = (y − x̃), since X̃1'(y − x̃) = 0 from the first-order conditions of restricted NLS. Hence, (8.77) reduces to

(y − x̃) = P̄X̃1X̃2b2 + residuals

Therefore,

b2,OLS = (X̃2'P̄X̃1X̃2)⁻¹X̃2'P̄X̃1(y − x̃) = (X̃2'P̄X̃1X̃2)⁻¹X̃2'(y − x̃)

and the residual sum of squares is (y − x̃)'(y − x̃) − (y − x̃)'X̃2(X̃2'P̄X̃1X̃2)⁻¹X̃2'(y − x̃).

If X̃2 were excluded from the regression in (8.76), (y − x̃)'(y − x̃) would be the residual sum of squares. Therefore, the reduction in the residual sum of squares brought about by the inclusion of X̃2 is

(y − x̃)'X̃2(X̃2'P̄X̃1X̃2)⁻¹X̃2'(y − x̃)

This is also equal to the explained sum of squares from (8.76), since X̃1 has no explanatory power. This explained sum of squares divided by a consistent estimate of σ² is asymptotically distributed as χ²r under the null.
Different consistent estimates of σ² yield different test statistics. The two most common test statistics for H0 based on this regression are: (1) TR²u, where R²u is the uncentered R² of (8.76); and (2) the F-statistic for b2 = 0. The first statistic is given by

TR²u = T(y − x̃)'X̃2(X̃2'P̄X̃1X̃2)⁻¹X̃2'(y − x̃)/(y − x̃)'(y − x̃)

where the uncentered R² was defined in the Appendix to Chapter 3. This statistic implicitly divides the explained sum of squares by σ̂² = (restricted residual sum of squares)/T. This is equivalent to the LM statistic obtained by running the artificial regression (y − x̃)/σ̂ on X̃ and getting the explained sum of squares. Regression packages print the centered R². This is equal to the uncentered R²u as long as there is a constant in the restricted regression, so that the residuals (y − x̃) sum to zero.
The F-statistic for b2 = 0 from (8.76) is

F = [(RRSS − URSS)/r] / [URSS/(T − k − r)] = [(y − x̃)'X̃2(X̃2'P̄X̃1X̃2)⁻¹X̃2'(y − x̃)/r] / [URSS/(T − k − r)]
Evaluated at the restricted estimates, i.e., at c = 0 and β = β̂OLS, the GNR becomes (yt − Xt'β̂OLS) = Xt'b + (Xt'β̂OLS)²c + residual. The t-statistic on c = 0 is equivalent to that from the RESET regression given in section 8.3, see problem 25.
Testing for Serial Correlation
Suppose that the null hypothesis is the nonlinear regression model given in (8.71), and the alternative is the model yt = xt(β) + vt with vt = ρvt−1 + ut, where ut ~ IID(0, σ²). Conditional on the first observation, the alternative model can be written as

yt = xt(β) + ρ(yt−1 − xt−1(β)) + ut

The GNR test for H0: ρ = 0 computes the derivatives of this regression function with respect to β and ρ, evaluated at the restricted estimates under the null hypothesis, i.e., ρ = 0 and β = β̂NLS (the nonlinear least squares estimate of β assuming no serial correlation). These yield Xt(β̂NLS) and (yt−1 − xt−1(β̂NLS)), respectively. Therefore, the GNR runs the regression ût = Xt(β̂NLS)b + c ût−1 + residual, where ût = yt − xt(β̂NLS), and tests that c = 0. If the regression model is linear, this reduces to running ordinary least squares residuals on their lagged values in addition to the regressors in the model. This is exactly the Breusch and Godfrey test for first-order serial correlation considered in Chapter 5. For other applications as well as benefits and limitations of the GNR, see Davidson and MacKinnon (1993).
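In the linear case, the GNR test is just this regression of the OLS residuals on the regressors and the lagged residual; a numpy sketch with invented AR(1) data:

```python
import numpy as np

def bg_first_order(y, X):
    """Linear-model special case of the GNR serial-correlation test:
    regress OLS residuals on X and their own lag, t-test the lag coefficient."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    Xa = np.column_stack([X[1:], u[:-1]])     # regressors plus lagged residual
    T, m = Xa.shape
    c = np.linalg.lstsq(Xa, u[1:], rcond=None)[0]
    e = u[1:] - Xa @ c
    s2 = e @ e / (T - m)
    se = np.sqrt(s2 * np.linalg.inv(Xa.T @ Xa)[-1, -1])
    return c[-1] / se                          # asymptotically N(0,1) under rho = 0

rng = np.random.default_rng(3)
x = rng.normal(size=300)
X = np.column_stack([np.ones(300), x])
u = np.zeros(300)
for t in range(1, 300):                        # AR(1) disturbances with rho = 0.6
    u[t] = 0.6 * u[t - 1] + rng.normal()
print(bg_first_order(1 + 2 * x + u, X))        # large t-statistic: rho = 0 rejected
```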