Testing in a Pooled Model
(1) The Chow-Test
Before pooling the data one may be concerned whether the data is poolable. This hypothesis is also known as the stability of the regression equation across firms or across time. It can be formulated in terms of an unrestricted model which involves a separate regression equation for each firm
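$$y_i = Z_i\delta_i + u_i \quad \text{for } i = 1, 2, \ldots, N \tag{12.41}$$
where $y_i' = (y_{i1}, \ldots, y_{iT})$, $Z_i = [\iota_T, X_i]$ with $\iota_T$ a vector of ones of dimension $T$, and $X_i$ is $(T \times K)$. $\delta_i'$ is $1 \times (K+1)$ and $u_i$ is $T \times 1$. The important thing to notice is that $\delta_i$ is different for every firm's equation. We want to test the hypothesis $H_0$: $\delta_i = \delta$ for all $i$, versus $H_1$: $\delta_i \neq \delta$ for some $i$. Under $H_0$ we can write the restricted model given in (12.41) as:
$$y = Z\delta + u \tag{12.42}$$
where $Z' = (Z_1', Z_2', \ldots, Z_N')$ and $u' = (u_1', u_2', \ldots, u_N')$. The unrestricted model can also be written as
$$y = \begin{pmatrix} Z_1 & 0 & \cdots & 0 \\ 0 & Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_N \end{pmatrix} \begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_N \end{pmatrix} + u = Z^*\delta^* + u \tag{12.43}$$
where $\delta^{*\prime} = (\delta_1', \delta_2', \ldots, \delta_N')$ and $Z = Z^*I^*$ with $I^* = (\iota_N \otimes I_{K'})$, an $NK' \times K'$ matrix, with $K' = K + 1$. Hence the variables in $Z$ are all linear combinations of the variables in $Z^*$. Under the assumption that $u \sim N(0, \sigma^2 I_{NT})$, the MVU estimator for $\delta$ in equation (12.42) is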
$$\hat\delta_{OLS} = \hat\delta_{MLE} = (Z'Z)^{-1}Z'y \tag{12.44}$$
and therefore
$$y = Z\hat\delta_{OLS} + e \tag{12.45}$$
implying that $e = (I_{NT} - Z(Z'Z)^{-1}Z')y = My = M(Z\delta + u) = Mu$ since $MZ = 0$. Similarly, under the alternative, the MVU for $\delta_i$ is given by
$$\hat\delta_{i,OLS} = \hat\delta_{i,MLE} = (Z_i'Z_i)^{-1}Z_i'y_i \tag{12.46}$$
and therefore
$$y_i = Z_i\hat\delta_{i,OLS} + e_i \tag{12.47}$$

One can easily deduce that $y = Z^*\hat\delta^* + e^*$ with $e^* = M^*y = M^*u$, where $M^* = I_{NT} - Z^*(Z^{*\prime}Z^*)^{-1}Z^{*\prime}$ and $\hat\delta^* = (Z^{*\prime}Z^*)^{-1}Z^{*\prime}y$. Note that both $M$ and $M^*$ are symmetric and idempotent with $MM^* = M^*$. This easily follows since
$$Z(Z'Z)^{-1}Z'Z^*(Z^{*\prime}Z^*)^{-1}Z^{*\prime} = Z(Z'Z)^{-1}I^{*\prime}Z^{*\prime}Z^*(Z^{*\prime}Z^*)^{-1}Z^{*\prime} = Z(Z'Z)^{-1}Z'$$
This uses the fact that $Z = Z^*I^*$. Now, $e'e - e^{*\prime}e^* = u'(M - M^*)u$ and $e^{*\prime}e^* = u'M^*u$ are independent since $(M - M^*)M^* = 0$. Also, both quadratic forms when divided by $\sigma^2$ are distributed as $\chi^2$ since $(M - M^*)$ and $M^*$ are idempotent, see Judge et al. (1985). Because $Z^*$ is block-diagonal, $e^{*\prime}e^* = e_1'e_1 + e_2'e_2 + \cdots + e_N'e_N$. Dividing these quadratic forms by their respective degrees of freedom, and taking their ratio leads to the following test statistic:
$$F_{obs} = \frac{(e'e - e_1'e_1 - e_2'e_2 - \cdots - e_N'e_N)/(N-1)K'}{(e_1'e_1 + e_2'e_2 + \cdots + e_N'e_N)/N(T-K')}$$
Under $H_0$, $F_{obs}$ is distributed as an $F((N-1)K', N(T-K'))$, see lemma 2.2 of Fisher (1970). This is exactly Chow's (1960) test extended to the case of $N$ linear regressions.
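The projection results used in this derivation can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a small simulated balanced panel; the sizes, the random seed and the helper name `annihilator` are illustrative choices, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 3, 6, 2          # small illustrative panel (assumed sizes)
K1 = K + 1                 # K' = K + 1 regressors including the constant

# Z_i = [iota_T, X_i] for each firm
Z_blocks = [np.column_stack([np.ones(T), rng.normal(size=(T, K))]) for _ in range(N)]

Z = np.vstack(Z_blocks)                        # pooled design, NT x K'
Z_star = np.zeros((N * T, N * K1))             # block-diagonal design, NT x NK'
for i, Zi in enumerate(Z_blocks):
    Z_star[i * T:(i + 1) * T, i * K1:(i + 1) * K1] = Zi

I_star = np.kron(np.ones((N, 1)), np.eye(K1))  # I* = iota_N kron I_K'

def annihilator(A):
    """Return I - A (A'A)^{-1} A' (hypothetical helper name)."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)

M = annihilator(Z)
M_star = annihilator(Z_star)

print(np.allclose(Z, Z_star @ I_star))          # Z = Z* I*
print(np.allclose(M @ M_star, M_star))          # M M* = M*
print(np.allclose((M - M_star) @ M_star, 0.0))  # (M - M*) M* = 0
# Traces give the two degrees of freedom: (N-1)K' and N(T-K')
print(int(round(np.trace(M - M_star))), int(round(np.trace(M_star))))
```

The traces of the two idempotent matrices are the numerator and denominator degrees of freedom of the F-statistic, which is why they reduce to (N-1)K' and N(T-K').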
The URSS in this case is the sum of the $N$ residual sums of squares obtained by applying OLS to (12.41), i.e., on each firm equation separately. The RRSS is simply the RSS from OLS performed on the pooled regression given by (12.42). In this case, there are $(N-1)K'$ restrictions and the URSS has $N(T-K')$ degrees of freedom. Similarly, one can test the stability of the regression across time. In this case, the unrestricted model consists of $T$ cross-section regressions with $N$ observations each, so the degrees of freedom are $(T-1)K'$ and $T(N-K')$ respectively. Both tests target the whole set of regression coefficients including the constant. If the LSDV model is suspected to be the proper specification, then the intercepts are allowed to vary but the slopes remain the same. To test the stability of the slopes only, the same Chow-test can be utilized; however, the RRSS is now that of the LSDV regression with firm (or time) dummies only. The number of restrictions becomes $(N-1)K$ for testing the stability of the slopes across firms and $(T-1)K$ for testing their stability across time.
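The computation of the poolability test across firms can be sketched as follows. This is an illustrative Python/NumPy sketch on simulated data, assuming a balanced panel stored as one (y_i, Z_i) pair per firm with the constant already included in Z_i; the function names `rss` and `chow_poolability` are hypothetical.

```python
import numpy as np

def rss(y, Z):
    """Residual sum of squares from OLS of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def chow_poolability(y_list, Z_list):
    """Chow F-statistic for H0: delta_i = delta across N firms.

    y_list[i] is the T-vector for firm i, Z_list[i] its T x K' design
    (constant included). Returns (F, df_num, df_den).
    """
    N = len(y_list)
    T, K1 = Z_list[0].shape
    rrss = rss(np.concatenate(y_list), np.vstack(Z_list))   # pooled OLS
    urss = sum(rss(y, Z) for y, Z in zip(y_list, Z_list))   # N separate OLS fits
    df_num, df_den = (N - 1) * K1, N * (T - K1)
    F = ((rrss - urss) / df_num) / (urss / df_den)
    return F, df_num, df_den

# Simulated data in which firms have clearly different coefficients
rng = np.random.default_rng(1)
N, T, K = 4, 20, 2
Z_list = [np.column_stack([np.ones(T), rng.normal(size=(T, K))]) for _ in range(N)]
deltas = [np.array([1.0 + i, 0.5 * i, -1.0]) for i in range(N)]
y_list = [Z @ d + rng.normal(scale=0.5, size=T) for Z, d in zip(Z_list, deltas)]

F, df1, df2 = chow_poolability(y_list, Z_list)
print(F, df1, df2)   # compare F with the F(df1, df2) critical value
```

For the slopes-only version, the restricted sum of squares would instead come from the LSDV regression and the numerator degrees of freedom would be (N-1)K.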
The Chow-test, however, is proper only under spherical disturbances, and if that hypothesis is not correct it will lead to improper inference. Baltagi (1981) showed that if the true specification of the disturbances is an error components structure, then the Chow-test tends to reject poolability too often when in fact it is true. However, a generalization of the Chow-test which takes care of the general variance-covariance matrix is available in Zellner (1962). This is exactly the test of the null hypothesis $H_0$: $R\beta = r$ when $\Omega$ is that of the error components specification, see Chapter 9. Baltagi (1981) shows that this test performs well in Monte Carlo experiments. In this case, all we need to do is transform our model (under both the null and alternative hypotheses) such that the transformed disturbances have a variance of $\sigma^2 I_{NT}$, then apply the Chow-test on the transformed model. The latter step is legitimate because the transformed disturbances have homoskedastic variances. Given $\Omega = \sigma^2\Sigma$, we premultiply the restricted model given in (12.42) by $\Sigma^{-1/2}$ and we call $\Sigma^{-1/2}y = \dot y$, $\Sigma^{-1/2}Z = \dot Z$ and $\Sigma^{-1/2}u = \dot u$. Hence
$$\dot y = \dot Z\delta + \dot u \tag{12.49}$$
with $E(\dot u\dot u') = \Sigma^{-1/2}E(uu')\Sigma^{-1/2\prime} = \sigma^2 I_{NT}$. Similarly, we premultiply the unrestricted model given in (12.43) by $\Sigma^{-1/2}$ and we call $\Sigma^{-1/2}Z^* = \dot Z^*$. Therefore
$$\dot y = \dot Z^*\delta^* + \dot u \tag{12.50}$$
with $E(\dot u\dot u') = \sigma^2 I_{NT}$.
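At this stage, we can test $H_0$: $\delta_i = \delta$ for every $i = 1, 2, \ldots, N$, simply by using the Chow-statistic, only now on the transformed models (12.49) and (12.50) since they satisfy $\dot u \sim N(0, \sigma^2 I_{NT})$. Note that $\dot Z = \dot Z^*I^*$, which is simply obtained from $Z = Z^*I^*$ by premultiplying by $\Sigma^{-1/2}$. Defining $\dot M = I_{NT} - \dot Z(\dot Z'\dot Z)^{-1}\dot Z'$ and $\dot M^* = I_{NT} - \dot Z^*(\dot Z^{*\prime}\dot Z^*)^{-1}\dot Z^{*\prime}$, it is easy to show that $\dot M$ and $\dot M^*$ are both symmetric, idempotent and such that $\dot M\dot M^* = \dot M^*$. Once again the conditions for lemma 2.2 of Fisher (1970) are satisfied, and the test-statistic
$$\dot F_{obs} = \frac{(\dot e'\dot e - \dot e^{*\prime}\dot e^*)/(N-1)K'}{\dot e^{*\prime}\dot e^*/N(T-K')}$$
is distributed as $F((N-1)K', N(T-K'))$ under $H_0$, where $\dot e = \dot y - \dot Z\hat\delta_{OLS}$ and $\hat\delta_{OLS} = (\dot Z'\dot Z)^{-1}\dot Z'\dot y$, implying that $\dot e = \dot M\dot y = \dot M\dot u$. Similarly, $\dot e^* = \dot y - \dot Z^*\hat\delta^*_{OLS}$ and $\hat\delta^*_{OLS} = (\dot Z^{*\prime}\dot Z^*)^{-1}\dot Z^{*\prime}\dot y$, implying that $\dot e^* = \dot M^*\dot y = \dot M^*\dot u$. This is the Chow-test after premultiplying the model by $\Sigma^{-1/2}$ or simply applying the Fuller and Battese (1974) transformation. See Baltagi (2008) for details.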
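The transformed test can be sketched in Python with NumPy. This sketch assumes a one-way error components model with known variance components (sigma2_mu for the individual effects, sigma2_nu for the remainder disturbance), a balanced panel, and the usual form of the Fuller and Battese transformation, which subtracts a fraction theta = 1 - sigma_nu / sqrt(T sigma2_mu + sigma2_nu) of each firm's time mean. In practice the variance components would be replaced by estimates; the function names are hypothetical.

```python
import numpy as np

def rss(y, Z):
    """Residual sum of squares from OLS of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def fuller_battese(a, theta):
    """Quasi-demean one firm's data: a_it - theta * (time mean of a_it)."""
    return a - theta * a.mean(axis=0)

def chow_error_components(y_list, Z_list, sigma2_mu, sigma2_nu):
    """Chow F-statistic on the transformed model (variance components assumed known)."""
    N = len(y_list)
    T, K1 = Z_list[0].shape
    theta = 1.0 - np.sqrt(sigma2_nu / (T * sigma2_mu + sigma2_nu))
    y_dot = [fuller_battese(y, theta) for y in y_list]
    Z_dot = [fuller_battese(Z, theta) for Z in Z_list]
    rrss = rss(np.concatenate(y_dot), np.vstack(Z_dot))    # restricted, transformed
    urss = sum(rss(y, Z) for y, Z in zip(y_dot, Z_dot))    # unrestricted, transformed
    df_num, df_den = (N - 1) * K1, N * (T - K1)
    return ((rrss - urss) / df_num) / (urss / df_den), df_num, df_den

# Simulated poolable data with random individual effects
rng = np.random.default_rng(4)
N, T, K = 5, 12, 2
delta = np.array([1.0, 0.5, -1.0])
Z_list = [np.column_stack([np.ones(T), rng.normal(size=(T, K))]) for _ in range(N)]
y_list = [Z @ delta + rng.normal() + rng.normal(size=T) for Z in Z_list]

F_dot, df1, df2 = chow_error_components(y_list, Z_list, sigma2_mu=1.0, sigma2_nu=1.0)
print(F_dot, df1, df2)
```

Because the transformation acts firm by firm, the transformed unrestricted design stays block-diagonal, so the unrestricted sum of squares can still be obtained from N separate regressions. Setting sigma2_mu = 0 gives theta = 0 and reproduces the ordinary Chow-test.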
For the gasoline data in Baltagi and Griffin (1983), Chow's test for poolability across countries yields an observed F-statistic of 129.38, which is distributed as $F(68, 270)$ under $H_0$: $\delta_i = \delta$ for $i = 1, \ldots, N$. This tests the stability of four time-series regression coefficients across 18 countries. The unrestricted SSE is based upon 18 OLS time-series regressions, one for each country. For the stability of the slope coefficients only, $H_0$: $\beta_i = \beta$, an observed F-value of 27.33 is obtained, which is distributed as $F(51, 270)$ under the null. Chow's test for poolability across time yields an F-value of 0.276, which is distributed as $F(72, 266)$ under $H_0$: $\delta_t = \delta$ for $t = 1, \ldots, T$. This tests the stability of four cross-section regression coefficients across 19 time periods. The unrestricted SSE is based upon 19 OLS cross-section regressions, one for each year. This does not reject poolability across time periods. The test for poolability across countries, allowing for a one-way error components model, yields an F-value of 21.64, which is distributed as $F(68, 270)$ under $H_0$: $\delta_i = \delta$ for $i = 1, \ldots, N$. The test for poolability across time yields an F-value of 1.66, which is distributed as $F(72, 266)$ under $H_0$: $\delta_t = \delta$ for $t = 1, \ldots, T$. This rejects $H_0$ at the 5% level.
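The degrees of freedom quoted for the gasoline data follow directly from N = 18 countries, T = 19 years and four coefficients per regression (K = 3 slopes plus the constant). A short Python check of that arithmetic, with variable names chosen here for illustration:

```python
N, T, K = 18, 19, 3      # countries, years, slope coefficients
K1 = K + 1               # four coefficients including the constant

across_countries = ((N - 1) * K1, N * (T - K1))   # all coefficients, across countries
slopes_only = ((N - 1) * K, N * (T - K1))         # slopes only, across countries
across_time = ((T - 1) * K1, T * (N - K1))        # all coefficients, across time

print(across_countries)  # (68, 270)
print(slopes_only)       # (51, 270)
print(across_time)       # (72, 266)
```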
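(2) The Breusch-Pagan Test
The Breusch and Pagan (1980) Lagrange Multiplier test checks for the presence of random individual effects, i.e., $H_0$: $\sigma_\mu^2 = 0$. For a balanced panel the test statistic is
$$LM = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N} e_{i.}^2}{\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2} - 1\right]^2$$
where $e_{it}$ denotes the OLS residuals on the pooled model and $e_{i.}$ denotes their sum over $t$. Under the null hypothesis $H_0$ this LM statistic is asymptotically distributed as a $\chi^2_1$. For the gasoline data in Baltagi and Griffin (1983), the Breusch and Pagan LM test yields an LM statistic of 1465.6. This is obtained using the Stata command xttest0 after estimating the model with random effects. This is significant and rejects the null hypothesis. The corresponding likelihood ratio test assuming Normal disturbances is also reported in the Stata maximum likelihood output for the random effects model. This yields an LR statistic of 463.97, which is asymptotically distributed as $\chi^2_1$ under the null hypothesis $H_0$ and is also significant.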
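As an illustration, the balanced-panel form of the Breusch and Pagan (1980) LM statistic, $NT/[2(T-1)]$ times the square of (sum over firms of squared residual totals, divided by the total sum of squared residuals, minus one), can be computed from pooled OLS residuals. This is a minimal Python/NumPy sketch; the function name and the assumption that residuals are ordered firm by firm are choices made here.

```python
import numpy as np

def breusch_pagan_lm(resid, N, T):
    """LM = NT / (2(T-1)) * [sum_i (sum_t e_it)^2 / sum_it e_it^2 - 1]^2.

    resid holds the pooled OLS residuals, ordered firm by firm (balanced panel).
    """
    e = np.asarray(resid, dtype=float).reshape(N, T)   # row i: firm i's T residuals
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    return N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

# Hand-checkable case: each firm's residuals sum to zero, so the ratio is 0
print(breusch_pagan_lm([[1.0, -1.0], [1.0, -1.0]], N=2, T=2))  # 2.0

# Simulated random-effects data: pooled OLS residuals share a firm component
rng = np.random.default_rng(5)
N, T = 20, 6
x = rng.normal(size=N * T)
mu = np.repeat(rng.normal(size=N), T)
y = 1.0 + 2.0 * x + mu + rng.normal(size=N * T)
Z = np.column_stack([np.ones(N * T), x])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
lm = breusch_pagan_lm(y - Z @ coef, N, T)
print(lm)   # compare with the chi-squared(1) critical value
```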
One problem with the Breusch-Pagan test is that it assumes that the alternative hypothesis is two-sided when we know that $\sigma_\mu^2 > 0$. A one-sided version of this test is given by Honda (1985):
$$HO = \sqrt{\frac{NT}{2(T-1)}}\left[\frac{e'(I_N \otimes J_T)e}{e'e} - 1\right]$$
which is asymptotically $N(0,1)$ under $H_0$, where $e$ denotes the vector of OLS residuals and $J_T$ is a $T \times T$ matrix of ones. Note that the square of this $N(0,1)$ statistic is the Breusch and Pagan (1980) LM test-statistic. Honda (1985) finds that this test statistic is uniformly most powerful and robust to non-normality. However, Moulton and Randolph (1989) showed that the asymptotic $N(0,1)$ approximation for this one-sided LM statistic can be poor even in large samples. They suggest an alternative Standardized Lagrange Multiplier (SLM) test whose asymptotic critical values are generally closer to the exact critical values than those of the LM test. This SLM test statistic centers and scales the one-sided LM statistic so that its mean is zero and its variance is one:
$$SLM = \frac{HO - E(HO)}{\sqrt{var(HO)}} = \frac{d - E(d)}{\sqrt{var(d)}}$$
where $d = e'De/e'e$ and $D = (I_N \otimes J_T)$. Using the results on moments of quadratic forms in regression residuals, see, e.g., Evans and King (1985), we get
$$E(d) = tr(DP_Z)/p$$
and
$$var(d) = 2\{p\,tr[(DP_Z)^2] - [tr(DP_Z)]^2\}/p^2(p+2) \tag{12.55}$$
where $p = n - (K+1)$, $n = NT$, and $P_Z = I_n - Z(Z'Z)^{-1}Z'$. Under the null hypothesis, SLM has an asymptotic $N(0,1)$ distribution.
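The one-sided statistics can be sketched in Python with NumPy as follows. The sketch assumes a balanced panel ordered firm by firm, a design matrix Z that already contains the constant, and Honda's statistic written as sqrt(NT / (2(T-1))) (d - 1), whose square is the Breusch and Pagan LM statistic; the function name is hypothetical and the data are simulated.

```python
import numpy as np

def honda_and_slm(y, Z, N, T):
    """Return (Honda statistic, SLM statistic) from pooled OLS of y on Z."""
    n = N * T
    P_Z = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)   # I_n - Z(Z'Z)^{-1}Z'
    e = P_Z @ y                                           # pooled OLS residuals
    D = np.kron(np.eye(N), np.ones((T, T)))               # I_N kron J_T
    d = (e @ D @ e) / (e @ e)
    honda = np.sqrt(n / (2.0 * (T - 1))) * (d - 1.0)

    p = n - Z.shape[1]                                    # n - (K + 1)
    DP = D @ P_Z
    E_d = np.trace(DP) / p
    var_d = 2.0 * (p * np.trace(DP @ DP) - np.trace(DP) ** 2) / (p ** 2 * (p + 2))
    slm = (d - E_d) / np.sqrt(var_d)
    return honda, slm

# Simulated random-effects data
rng = np.random.default_rng(2)
N, T = 10, 8
x = rng.normal(size=N * T)
mu = np.repeat(rng.normal(size=N), T)
y = 1.0 + 2.0 * x + mu + rng.normal(size=N * T)
Z = np.column_stack([np.ones(N * T), x])

honda, slm = honda_and_slm(y, Z, N, T)
print(honda, slm)   # both are compared with one-sided N(0,1) critical values
```

Forming the full n x n matrices keeps the code close to the formulas but is only practical for small panels; for larger ones the traces would be computed without building D and P_Z explicitly.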
(3) The Hausman-Test
A critical assumption in the error components regression model is that $E(u_{it}|X_{it}) = 0$. This is important given that the disturbances contain individual effects (the $\mu_i$'s) which are unobserved and may be correlated with the $X_{it}$'s. For example, in an earnings equation these $\mu_i$'s may denote unobservable ability of the individual and this may be correlated with the schooling variable included on the right hand side of this equation. In this case, $E(u_{it}|X_{it}) \neq 0$ and the GLS estimator $\hat\beta_{GLS}$ becomes biased and inconsistent for $\beta$. However, the within transformation wipes out these $\mu_i$'s and leaves the Within estimator $\tilde\beta_{Within}$ unbiased and consistent for $\beta$. Hausman (1978) suggests comparing $\hat\beta_{GLS}$ and $\tilde\beta_{Within}$, both of which are consistent under the null hypothesis $H_0$: $E(u_{it}|X_{it}) = 0$, but which will have different probability limits if $H_0$ is not true. In fact, $\tilde\beta_{Within}$ is consistent whether $H_0$ is true or not, while $\hat\beta_{GLS}$ is BLUE, consistent and asymptotically efficient under $H_0$, but is inconsistent when $H_0$ is false. A natural test statistic would be based on $\hat q = \hat\beta_{GLS} - \tilde\beta_{Within}$. Under $H_0$, plim $\hat q = 0$ and $cov(\hat q, \hat\beta_{GLS}) = 0$.
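Using the fact that $\hat\beta_{GLS} - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}u$ and $\tilde\beta_{Within} - \beta = (X'QX)^{-1}X'Qu$, one gets $E(\hat q) = 0$ and
$$cov(\hat\beta_{GLS}, \hat q) = var(\hat\beta_{GLS}) - cov(\hat\beta_{GLS}, \tilde\beta_{Within}) = (X'\Omega^{-1}X)^{-1} - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(uu')QX(X'QX)^{-1} = 0$$
since $E(uu') = \Omega$. Using the fact that $\tilde\beta_{Within} = \hat\beta_{GLS} - \hat q$, one gets $var(\tilde\beta_{Within}) = var(\hat\beta_{GLS}) + var(\hat q)$, since $cov(\hat\beta_{GLS}, \hat q) = 0$. Therefore,
$$var(\hat q) = var(\tilde\beta_{Within}) - var(\hat\beta_{GLS}) = \sigma_\nu^2(X'QX)^{-1} - (X'\Omega^{-1}X)^{-1} \tag{12.56}$$
where $\sigma_\nu^2$ is the variance of the remainder disturbance. Hence, the Hausman test statistic is given by
$$m = \hat q'[var(\hat q)]^{-1}\hat q \tag{12.57}$$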
and under $H_0$ is asymptotically distributed as $\chi^2_K$, where $K$ denotes the dimension of the slope vector $\beta$. In order to make this test operational, $\Omega$ is replaced by a consistent estimator $\hat\Omega$, and GLS by its corresponding FGLS. An alternative asymptotically equivalent test can be obtained from the augmented regression
$$y^* = X^*\beta + \tilde X\gamma + w \tag{12.58}$$
where $y^* = \sigma_\nu\Omega^{-1/2}y$, $X^* = \sigma_\nu\Omega^{-1/2}X$ and $\tilde X = QX$. Hausman's test is now equivalent to testing whether $\gamma = 0$. This is a standard Wald test for the omission of the variables $\tilde X$ from (12.58).
This test was generalized by Arellano (1993) to make it robust to heteroskedasticity and autocorrelation of arbitrary forms. In fact, if either heteroskedasticity or serial correlation is present, the variances of the Within and GLS estimators are not valid and the corresponding Hausman test statistic is inappropriate. For the Baltagi and Griffin (1983) gasoline data, the Hausman test statistic based on the difference between the Within estimator and that of feasible GLS based on Swamy and Arora (1972) yields a $\chi^2_3$ value of $m = 306.1$, which rejects the null hypothesis. This is obtained using the Stata command hausman.
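The contrast between the Within and GLS estimators can be sketched in Python with NumPy. This sketch is not the Stata hausman command: it assumes a balanced one-way error components model, treats the variance components sigma2_mu and sigma2_nu as known (in practice they are estimated, giving FGLS), computes GLS by quasi-demeaning, and uses a hypothetical function name and simulated data.

```python
import numpy as np

def hausman(y, X, N, T, sigma2_mu, sigma2_nu):
    """Hausman statistic m and its degrees of freedom K.

    y is the NT-vector and X the NT x K slope regressors (no constant),
    both ordered individual by individual.
    """
    K = X.shape[1]

    def group_means(a):
        # Individual means over time, repeated T times to match the NT rows
        return np.repeat(a.reshape(N, T, -1).mean(axis=1), T, axis=0)

    y2 = y.reshape(-1, 1)

    # Within estimator: OLS on deviations from individual means
    Xw = X - group_means(X)
    yw = (y2 - group_means(y2)).ravel()
    XtX_w = Xw.T @ Xw
    b_within = np.linalg.solve(XtX_w, Xw.T @ yw)
    V_within = sigma2_nu * np.linalg.inv(XtX_w)

    # GLS: OLS on quasi-demeaned data (constant included, then dropped)
    theta = 1.0 - np.sqrt(sigma2_nu / (T * sigma2_mu + sigma2_nu))
    Zc = np.column_stack([np.ones(N * T), X])
    Zs = Zc - theta * group_means(Zc)
    ys = (y2 - theta * group_means(y2)).ravel()
    ZtZ = Zs.T @ Zs
    b_gls = np.linalg.solve(ZtZ, Zs.T @ ys)[1:]
    V_gls = (sigma2_nu * np.linalg.inv(ZtZ))[1:, 1:]

    q = b_gls - b_within
    m = float(q @ np.linalg.solve(V_within - V_gls, q))
    return m, K

# Simulated data in which the first regressor is correlated with mu_i
rng = np.random.default_rng(3)
N, T = 50, 5
mu = rng.normal(size=N)
X = rng.normal(size=(N * T, 2))
X[:, 0] += np.repeat(mu, T)
y = 1.0 + X @ np.array([1.0, -0.5]) + np.repeat(mu, T) + rng.normal(size=N * T)

m, dof = hausman(y, X, N, T, sigma2_mu=1.0, sigma2_nu=1.0)
print(m, dof)   # compare m with the chi-squared(dof) critical value
```

With estimated variance components the difference of the two covariance matrices need not be positive definite in finite samples, which is one reason the augmented-regression form of the test is sometimes preferred.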