# Autocorrelation

Violation of assumption 3 means that the disturbances are correlated, i.e., E(uiuj) = σij ≠ 0 for i ≠ j, with i, j = 1, 2,…,n. Since ui has zero mean, E(uiuj) = cov(ui, uj), and this is denoted by σij. This correlation is more likely to occur in time-series than in cross-section studies. Consider estimating the consumption function of a random sample of households. An unexpected event, like a visit of family members, will increase the consumption of this household. However, this positive disturbance need not be correlated with the disturbances affecting consumption of other randomly drawn households. In contrast, if we were estimating this consumption function using aggregate time-series data for the U.S., then it is very likely that a recession affecting consumption negatively this year may have a carry-over effect on the next few years. A shock to the economy, like the oil embargo in 1973, is likely to affect the economy for several years. A labor strike this year may affect production for the next few years. Therefore, we will switch the i and j subscripts to t and s, denoting time-series observations t, s = 1, 2,…,T, and the sample size will be denoted by T rather than n. This covariance term is symmetric, so that σ12 = E(u1u2) = E(u2u1) = σ21. Hence, only T(T − 1)/2 distinct σts's have to be estimated. For example, if T = 3, then σ12, σ13 and σ23 are the distinct covariance terms. However, it is hopeless to estimate T(T − 1)/2 covariances with only T observations. Therefore, more structure has to be imposed on these σts's. A popular assumption is that the ut's follow a first-order autoregressive process, denoted by AR(1):

ut = ρut−1 + εt    t = 1, 2,…,T    (5.26)

where εt is IID(0, σ²ε). It is autoregressive because ut is related to its lagged value ut−1. One can also write (5.26), for period t − 1, as

ut−1 = ρut−2 + εt−1    (5.27)

and substitute (5.27) in (5.26) to get

ut = ρ²ut−2 + ρεt−1 + εt    (5.28)

Note that the power of ρ and the subscript of u or ε always sum to t. By continuing this substitution, one ultimately gets

ut = ρᵗu0 + ρᵗ⁻¹ε1 + … + ρεt−1 + εt    (5.29)

This means that ut is a function of current and past values of εt, and of u0, the initial value of ut. If u0 has zero mean, then ut has zero mean. This follows from (5.29) by taking expectations. Also, from (5.26)

var(ut) = ρ²var(ut−1) + var(εt) + 2ρcov(ut−1, εt)    (5.30)

Using (5.29), ut−1 is a function of εt−1, earlier ε's, and u0. Since u0 is independent of the ε's, and the ε's are themselves not serially correlated, ut−1 is independent of εt. This means that cov(ut−1, εt) = 0. Furthermore, for ut to be homoskedastic, var(ut) = var(ut−1) = σ²u, and (5.30) reduces to σ²u = ρ²σ²u + σ²ε, which when solved for σ²u gives:

σ²u = σ²ε/(1 − ρ²)    (5.31)

Hence, u0 ~ (0, σ²ε/(1 − ρ²)) for the ut's to have zero mean and homoskedastic disturbances. Multiplying (5.26) by ut−1 and taking expected values, one gets

E(utut−1) = ρE(u²t−1) + E(ut−1εt) = ρσ²u    (5.32)

since E(u²t−1) = σ²u and E(ut−1εt) = 0. Therefore, cov(ut, ut−1) = ρσ²u, and the correlation coefficient between ut and ut−1 is correl(ut, ut−1) = cov(ut, ut−1)/√(var(ut)var(ut−1)) = ρσ²u/σ²u = ρ. Since ρ is a correlation coefficient, this means that −1 < ρ < 1. In general, one can show that

cov(ut, us) = ρ^|t−s| σ²u    t, s = 1, 2,…,T    (5.33)

see problem 6. This means that the correlation between ut and ut−r is ρʳ, a fraction raised to an integer power, i.e., the correlation between the disturbances decays the further apart they are. This is reasonable in economics and may be the reason why the autoregressive form (5.26) is so popular. One should note that this is not the only form that would correlate the disturbances across time. In Chapter 14, we will consider other forms, like the Moving Average (MA) process and higher order Autoregressive Moving Average (ARMA) processes, but these are beyond the scope of this chapter.
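The decay in (5.33) is easy to verify by simulation. The sketch below (ρ, sample size, and seed are illustrative choices, not from the text) draws a stationary AR(1) series and compares sample correlations at several lags with ρʳ:

```python
import math
import random

random.seed(12345)
rho, T = 0.8, 100_000

# start u_0 from its stationary distribution (0, sigma_e^2/(1 - rho^2)), see (5.31)
u = [random.gauss(0, 1) / math.sqrt(1 - rho**2)]
for _ in range(1, T):
    u.append(rho * u[-1] + random.gauss(0, 1))   # u_t = rho*u_{t-1} + e_t

def corr(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

for r in (1, 2, 5):
    print(f"lag {r}: sample corr = {corr(u[r:], u[:-r]):.3f}, rho**r = {rho**r:.3f}")
```

With this many observations the sample correlations track ρʳ closely, illustrating the geometric decay.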

Consequences for OLS

How is the OLS estimator affected by the violation of the no-autocorrelation assumption among the disturbances? The OLS estimator is still unbiased and consistent, since these properties rely on assumptions 1 and 4 and have nothing to do with assumption 3. For the simple linear regression, using (5.2), the variance of β̂OLS is now

var(β̂OLS) = var(Σ_{t=1}^T wtut) = Σ_{t=1}^T Σ_{s=1}^T wtws cov(ut, us)    (5.34)

= σ²u/Σ_{t=1}^T x²t + σ²u ΣΣ_{t≠s} wtws ρ^|t−s|

where cov(ut, us) = ρ^|t−s| σ²u, as explained in (5.33). Note that the first term in (5.34) is the usual variance of β̂OLS in the classical case. The second term in (5.34) arises because of the correlation between the ut's. Hence, the variance of OLS computed from a regression package, i.e., s²/Σ_{t=1}^T x²t, is a wrong estimate of the variance of β̂OLS for two reasons. First, it is using the wrong formula for the variance, i.e., σ²u/Σ_{t=1}^T x²t rather than (5.34). The latter depends on ρ through the extra term in (5.34). Second, one can show, see problem 7, that E(s²) ≠ σ²u; it will involve ρ as well as σ²u. Hence, s² is not unbiased for σ²u, and s²/Σ_{t=1}^T x²t is a biased estimate of var(β̂OLS). The direction and magnitude of this bias depend on ρ and the regressor. In fact, if ρ is positive and the xt's are themselves positively autocorrelated, then s²/Σ_{t=1}^T x²t understates the true variance of β̂OLS. This means that the confidence interval for β is tighter than it should be and the t-statistic for H0; β = 0 is overblown, see problem 8. As in the heteroskedastic case, but for completely different reasons, any inference based on var(β̂OLS) reported by standard regression packages will be misleading if the ut's are serially correlated.
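A small Monte Carlo illustrates this understatement (β, ρ, the AR(1) parameter of xt, and the number of replications are illustrative choices): the empirical variance of β̂OLS across replications is compared with the average of the conventional estimate s²/Σx²t.

```python
import math
import random

# Sketch: with rho > 0 and a positively autocorrelated regressor, the
# package formula s^2/sum(x_t^2) badly understates the true var(beta_hat).
random.seed(7)
beta, rho, lam, T, reps = 1.0, 0.8, 0.8, 50, 2000

# one fixed, positively autocorrelated regressor, taken in deviations from its mean
x = [random.gauss(0, 1)]
for _ in range(1, T):
    x.append(lam * x[-1] + random.gauss(0, 1))
xbar = sum(x) / T
x = [xi - xbar for xi in x]
sxx = sum(xi * xi for xi in x)

b_hats, conventional = [], []
for _ in range(reps):
    u = [random.gauss(0, 1) / math.sqrt(1 - rho**2)]   # stationary AR(1) errors
    for _ in range(1, T):
        u.append(rho * u[-1] + random.gauss(0, 1))
    y = [beta * xi + ui for xi, ui in zip(x, u)]
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    s2 = sum(ei * ei for ei in resid) / (T - 1)
    b_hats.append(b)
    conventional.append(s2 / sxx)

mean_b = sum(b_hats) / reps
true_var = sum((b - mean_b) ** 2 for b in b_hats) / reps
avg_conv = sum(conventional) / reps
print(f"empirical var(beta_hat) = {true_var:.4f}, average s^2/sum(x^2) = {avg_conv:.4f}")
```

The estimator remains unbiased, but the conventional variance estimate comes out a fraction of the empirical one, which is what makes the t-statistics overblown.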

Newey and West (1987) suggested a simple heteroskedasticity and autocorrelation-consistent covariance matrix for the OLS estimator without specifying the functional form of the serial correlation. The basic idea extends White's (1980) replacement of heteroskedastic variances with squared OLS residuals e²t by additionally including products of least squares residuals etet−s for s = 0, ±1,…, ±p, where p is the maximum order of serial correlation we are willing to assume. The consistency of this procedure relies on p being very small relative to the number of observations T. This is consistent with the popular serial correlation specifications considered in this chapter, where the autocorrelation dies out quickly as the lag increases. Newey and West (1987) allow the higher order covariance terms to receive diminishing weights. This Newey-West option for the least squares estimator is available in EViews. Andrews (1991) warns about the unreliability of such standard error corrections in some circumstances. Wooldridge (1991) shows that it is possible to construct serially correlated robust F-statistics for testing joint hypotheses as considered in Chapter 4. However, these are beyond the scope of this book.
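A hand-rolled sketch of this covariance estimator for the slope of a simple regression, with Bartlett weights ws = 1 − s/(L + 1) supplying the diminishing weights mentioned above (variable names and the simulated data are ours, not from the text; in practice one would use a package routine such as the EViews option):

```python
import math
import random

def newey_west_slope_var(x, e, L):
    """Newey-West variance of the OLS slope: x in deviations from its mean,
    e the OLS residuals, L the lag truncation.  Bartlett weights
    w_s = 1 - s/(L + 1) give diminishing weight to higher-order covariances."""
    T = len(x)
    sxx = sum(xi * xi for xi in x)
    g = [xi * ei for xi, ei in zip(x, e)]      # score contributions x_t * e_t
    S = sum(gi * gi for gi in g)               # s = 0 term (White's part)
    for s in range(1, L + 1):
        w = 1 - s / (L + 1)
        S += 2 * w * sum(g[t] * g[t - s] for t in range(s, T))
    return S / sxx**2

# illustrative data: y = x + AR(1) noise, x positively autocorrelated
random.seed(9)
T, rho, lam = 500, 0.8, 0.8
x = [random.gauss(0, 1)]
u = [random.gauss(0, 1) / math.sqrt(1 - rho**2)]
for _ in range(1, T):
    x.append(lam * x[-1] + random.gauss(0, 1))
    u.append(rho * u[-1] + random.gauss(0, 1))
xbar = sum(x) / T
x = [xi - xbar for xi in x]
y = [xi + ui for xi, ui in zip(x, u)]

sxx = sum(xi * xi for xi in x)
b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
e = [yi - b * xi for xi, yi in zip(x, y)]
s2 = sum(ei * ei for ei in e) / (T - 1)
print(f"conventional var: {s2 / sxx:.5f}, Newey-West (L=10): {newey_west_slope_var(x, e, 10):.5f}")
```

With L = 0 the formula collapses to White's heteroskedasticity-only correction; with positive serial correlation the HAC variance is typically several times the conventional one.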

Is OLS still BLUE? In order to determine the BLU estimator in this case, we lag the regression equation once, multiply it by ρ, and subtract it from the original regression equation to get

Yt − ρYt−1 = α(1 − ρ) + β(Xt − ρXt−1) + εt    t = 2, 3,…,T    (5.35)

This transformation, known as the Cochrane-Orcutt (1949) transformation, reduces the disturbances to classical errors. Therefore, OLS on the resulting regression renders the estimates BLU, i.e., run Ỹt = Yt − ρYt−1 on a constant and X̃t = Xt − ρXt−1, for t = 2, 3,…,T. Note that we have lost one observation by lagging, and the resulting estimators are BLU only for linear combinations of (T − 1) observations in Y.¹ Prais and Winsten (1954) derive the BLU estimators for linear combinations of all T observations in Y. This entails recapturing the initial observation as follows: (i) Multiply the first observation of the regression equation by √(1 − ρ²);

√(1 − ρ²) Y1 = α√(1 − ρ²) + β√(1 − ρ²) X1 + √(1 − ρ²) u1

(ii) add this transformed initial observation to the Cochrane-Orcutt transformed observations for t = 2,…,T and run the regression on the T observations rather than the (T − 1) observations. See Chapter 9 for a formal proof of this result. Note that

Ỹ1 = √(1 − ρ²) Y1

and

Ỹt = Yt − ρYt−1  for t = 2,…,T

Similarly, X̃1 = √(1 − ρ²) X1 and X̃t = Xt − ρXt−1 for t = 2,…,T. The constant variable Ct = 1 for t = 1,…,T becomes a new variable C̃t which takes the values C̃1 = √(1 − ρ²) and C̃t = (1 − ρ) for t = 2,…,T. Hence, the Prais-Winsten procedure is the regression of Ỹt on C̃t and X̃t without a constant. It is obvious that the resulting BLU estimators will involve ρ and are therefore different from the usual OLS estimators, except in the case where ρ = 0. Hence, OLS is no longer BLUE. Furthermore, we need to know ρ in order to obtain the BLU estimators. In applied work, ρ is not known and has to be estimated, in which case the Prais-Winsten regression is no longer BLUE, since it is based on an estimate of ρ rather than the true ρ itself. However, as long as ρ̂ is a consistent estimate of ρ, this is sufficient for the corresponding estimates of α and β in the next step to be asymptotically efficient, see Chapter 9. We now turn to various methods of estimating ρ.
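A minimal sketch of the Prais-Winsten transformation and the resulting regression without a constant (function and variable names are ours; as a sanity check, noiseless data Y = 2 + 3X are recovered exactly for any known ρ):

```python
import math

def prais_winsten_transform(Y, X, rho):
    """Scale the first observation by sqrt(1 - rho^2) and quasi-difference the
    rest; also build the transformed 'constant' C~ described in the text."""
    r = math.sqrt(1 - rho**2)
    Yt = [r * Y[0]] + [Y[t] - rho * Y[t - 1] for t in range(1, len(Y))]
    Xt = [r * X[0]] + [X[t] - rho * X[t - 1] for t in range(1, len(X))]
    Ct = [r] + [1 - rho] * (len(Y) - 1)
    return Yt, Xt, Ct

def ols_no_constant_2(y, c, x):
    """OLS of y on the two regressors c and x without a constant
    (solves the 2x2 normal equations by Cramer's rule)."""
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    sxx = sum(xi * xi for xi in x)
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx * scx
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det

# sanity check: exact linear data are recovered exactly after the transformation
X = [1.0, 2.0, 4.0, 7.0, 11.0]
Y = [2 + 3 * xi for xi in X]
Yt, Xt, Ct = prais_winsten_transform(Y, X, rho=0.6)
alpha_hat, beta_hat = ols_no_constant_2(Yt, Ct, Xt)
print(alpha_hat, beta_hat)   # → 2.0, 3.0 (up to rounding)
```

The coefficient on C̃t estimates α and the coefficient on X̃t estimates β, since Ỹt = αC̃t + βX̃t + transformed error by construction.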

(1) The Cochrane-Orcutt (1949) Method: This method starts with an initial estimate of ρ, the most convenient being 0, and minimizes the residual sum of squares in (5.35). This gives us the OLS estimates of α and β. Then we substitute α̂OLS and β̂OLS in (5.35) and get

et = ρet−1 + εt    t = 2,…,T    (5.36)

where et denotes the OLS residual. An estimate of ρ can be obtained by minimizing the residual sum of squares in (5.36), i.e., by running the regression of et on et−1 without a constant. The resulting estimate is ρ̂CO = Σ_{t=2}^T etet−1/Σ_{t=2}^T e²t−1, where both summations run over t = 2, 3,…,T. The second step of the Cochrane-Orcutt procedure (2SCO) is to perform the regression in (5.35) with ρ̂CO instead of ρ. One can iterate this procedure (ITCO) by computing new residuals based on the new estimates of α and β, and hence a new estimate of ρ from (5.36), and so on, until convergence. Since both 2SCO and ITCO are asymptotically efficient, the argument for iterating must be justified in terms of small sample gains.
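The steps above can be sketched as follows (function and variable names are ours; the starting value ρ = 0 makes the first pass plain OLS on (T − 1) observations, and the illustrative data use a linear trend regressor with AR(1) errors):

```python
import math
import random

def cochrane_orcutt(Y, X, tol=1e-8, max_iter=200):
    """Iterative Cochrane-Orcutt: alternate between estimating (alpha, beta)
    from the quasi-differenced regression (5.35) and re-estimating rho from
    the residual regression (5.36), until rho converges."""
    T = len(Y)

    def ols(y, x):  # simple OLS with a constant
        n = len(y)
        mx, my = sum(x) / n, sum(y) / n
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
        return my - b * mx, b

    rho = 0.0                                    # convenient initial estimate
    for _ in range(max_iter):
        ys = [Y[t] - rho * Y[t - 1] for t in range(1, T)]
        xs = [X[t] - rho * X[t - 1] for t in range(1, T)]
        a_star, beta = ols(ys, xs)               # intercept estimates alpha*(1 - rho)
        alpha = a_star / (1 - rho)
        e = [Y[t] - alpha - beta * X[t] for t in range(T)]
        new_rho = sum(e[t] * e[t - 1] for t in range(1, T)) / \
                  sum(e[t - 1] ** 2 for t in range(1, T))
        if abs(new_rho - rho) < tol:
            return alpha, beta, new_rho
        rho = new_rho
    return alpha, beta, rho

random.seed(1)
T, rho_true = 2000, 0.7
u = [random.gauss(0, 1) / math.sqrt(1 - rho_true**2)]
for _ in range(1, T):
    u.append(rho_true * u[-1] + random.gauss(0, 1))
X = [float(t) for t in range(T)]
Y = [1.0 + 2.0 * X[t] + u[t] for t in range(T)]
a, b, r = cochrane_orcutt(Y, X)
print(f"alpha ≈ {a:.2f}, beta ≈ {b:.3f}, rho ≈ {r:.3f}")
```

Stopping after the first pass through the loop corresponds to 2SCO; running to convergence gives ITCO.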

(2) The Hildreth-Lu (1960) Search Procedure: Since ρ lies between −1 and 1, one can search over this range in fixed intervals, say of 0.1, and choose the value of ρ whose regression (5.35) gives the minimum residual sum of squares. If, say, ρ = 0.6 gives the minimum residual sum of squares, one can search next between 0.51 and 0.69 in intervals of 0.01. This search procedure guards against a local minimum. Since the likelihood in this case contains ρ as well as σ², α and β, this search procedure can be modified to maximize the likelihood rather than minimize the residual sum of squares, since the two criteria are no longer equivalent. The maximum value of the likelihood will give our choice of ρ and the corresponding estimates of α, β and σ².

(3) Durbin's (1960) Method: One can rearrange (5.35) by moving Yt−1 to the right hand side, i.e.,

Yt = ρYt−1 + α(1 − ρ) + βXt − ρβXt−1 + εt    (5.37)

and running OLS on (5.37). The error in (5.37) is classical, and the presence of Yt−1 on the right hand side reminds us of the contemporaneously uncorrelated case discussed under the violation of assumption 4. For that violation, we showed that unbiasedness is lost, but not consistency. Hence, the estimate of ρ as the coefficient of Yt−1 is biased but consistent. This is the Durbin estimate of ρ, call it ρ̂D. Next, the second step of the Cochrane-Orcutt procedure is performed using this estimate of ρ.

(4) Beach-MacKinnon (1978) Maximum Likelihood Procedure: Beach and MacKinnon (1978) derived a cubic equation in ρ which maximizes the likelihood function concentrated with respect to α, β, and σ². With this estimate of ρ, denoted by ρ̂BM, one performs the Prais-Winsten procedure in the next step.

Correcting for serial correlation is not without its critics. Mizon (1995) argues this point forcefully in his article entitled "A simple message for autocorrelation correctors: Don't." The main point is that serial correlation is a symptom of dynamic misspecification, which is better represented using a general unrestricted dynamic specification.

Monte Carlo Results

Rao and Griliches (1969) performed a Monte Carlo study using an autoregressive Xt and various values of ρ. They found that OLS is still a viable estimator as long as ρ < 0.3, but if ρ > 0.3, then it pays to perform procedures that correct for serial correlation based on an estimate of ρ. Their recommendation was to compute Durbin's estimate of ρ in the first step and to perform the Prais-Winsten procedure in the second step. Maeshiro (1976, 1979) found that if the Xt series is trended, which is usual with economic data, then OLS outperforms 2SCO, but not the two-step Prais-Winsten (2SPW) procedure that recaptures the initial observation. These results were confirmed by Park and Mitchell (1980), who performed an extensive Monte Carlo study using trended and untrended Xt's. Their basic findings include the following: (i) For trended Xt's, OLS beats 2SCO, ITCO and even a Cochrane-Orcutt procedure that is based on the true ρ. However, OLS was beaten by 2SPW, iterative Prais-Winsten (ITPW), and Beach-MacKinnon (BM). Their conclusion is that one should not use regressions based on (T − 1) observations as in Cochrane-Orcutt. (ii) The ITPW procedure is the recommended estimator, beating 2SPW and BM for high values of true ρ, for both trended and nontrended Xt's. (iii) Tests of hypotheses regarding the regression coefficients performed miserably for all estimators based on an estimate of ρ. The results indicated less bias in standard error estimation for ITPW, BM and 2SPW than for OLS. However, the tests based on these standard errors still led to a high probability of type I error for all estimation procedures.

Testing for Autocorrelation

So far, we have studied the properties of OLS under the violation of assumption 3. We have derived asymptotically efficient estimators of the coefficients based on consistent estimators of ρ and studied their small sample properties using Monte Carlo experiments. Next, we focus on the problem of detecting this autocorrelation between the disturbances. A popular diagnostic for detecting such autocorrelation is the Durbin and Watson (1951) statistic²

d = Σ_{t=2}^T (et − et−1)²/Σ_{t=1}^T e²t    (5.38)

If this were based on the true ut's, then d can be shown to tend to 2(1 − ρ) in the limit as T gets large, see problem 9. This means that if ρ → 0, then d → 2; if ρ → 1, then d → 0; and if ρ → −1, then d → 4. Therefore, a test for H0; ρ = 0 can be based on whether d is close to 2 or not. Unfortunately, the critical values of d depend upon the Xt's, and these vary from one data set to another. To get around this, Durbin and Watson established upper (dU) and lower (dL) bounds for this critical value. If the observed d is less than dL, or larger than 4 − dL, we reject H0. If the observed d is between dU and 4 − dU, then we do not reject H0. If d lies in either of the two indeterminate regions, then one should compute the exact critical values, which depend on the data. Most regression packages report the Durbin-Watson statistic, but few give the exact p-value for this d-statistic. If one is interested in a one-sided test, say H0; ρ = 0 versus H1; ρ > 0, then one would reject H0 if d < dL, and not reject H0 if d > dU. If dL < d < dU, then the test is inconclusive. Similarly, for testing H0; ρ = 0 versus H1; ρ < 0, one computes (4 − d) and follows the steps for testing against positive autocorrelation. Durbin and Watson tables for dL and dU covered sample sizes from 15 to 100 and a maximum of 5 regressors. Savin and White (1977) extended these tables for 6 ≤ T ≤ 200 and up to 10 regressors.
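The statistic (5.38) and the large-sample approximation d ≈ 2(1 − ρ) can be checked by simulation (ρ, sample size, and seed are illustrative choices):

```python
import math
import random

def durbin_watson(e):
    """Durbin-Watson d of (5.38) from a residual series."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

random.seed(5)
rho, T = 0.9, 50_000
u = [random.gauss(0, 1) / math.sqrt(1 - rho**2)]   # stationary AR(1) series
for _ in range(1, T):
    u.append(rho * u[-1] + random.gauss(0, 1))

d = durbin_watson(u)
print(f"d = {d:.3f}, 2*(1 - rho) = {2 * (1 - rho):.3f}")
```

With strong positive autocorrelation the statistic sits far below 2, in the rejection region of the bounds test.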

The Durbin-Watson statistic has several limitations. We discussed the inconclusive region and the computation of exact critical values. The Durbin-Watson statistic is appropriate when there is a constant in the regression. In case there is no constant in the regression, see Farebrother (1980). Also, the Durbin-Watson statistic is inappropriate when there are lagged values of the dependent variable among the regressors. We now turn to an alternative test for serial correlation that does not have these limitations and is also easy to apply. This test was derived by Breusch (1978) and Godfrey (1978) and is known as the Breusch-Godfrey test for zero first-order serial correlation. It is a Lagrange Multiplier test that amounts to running the regression of the OLS residuals et on et−1 and the original regressors in the model. The test statistic is TR². Its distribution under the null is χ²₁. In this case, the regressors are a constant and Xt, and the test checks whether the coefficient of et−1 is significant. The beauty of this test is that (i) it is the same test for first-order serial correlation whether the disturbances are Moving Average of order one, MA(1), or AR(1); (ii) it is easily generalizable to higher-order autoregressive or Moving Average schemes: for second-order serial correlation, like MA(2) or AR(2), one includes two lags of the residuals on the right hand side, i.e., both et−1 and et−2; (iii) it is still valid even when lagged values of the dependent variable are present among the regressors, see Chapter 6. The Breusch and Godfrey test is standard using EViews, which prompts the user for the number of lags of the residuals to include among the regressors to test for serial correlation. You click on residuals, then tests, and choose Breusch-Godfrey. Next, you input the number of lagged residuals you want to include.
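A minimal sketch of the test for the simple regression case (all function names are ours; we drop the first observation from the auxiliary regression rather than set the presample residual to zero, which is what some packages, including EViews, do):

```python
import math
import random

def solve3(A, b):
    """Gauss-Jordan elimination for a 3x3 linear system (enough for this sketch)."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))   # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

def breusch_godfrey_lm(y, x):
    """LM statistic T*R^2 from regressing the OLS residuals e_t on a
    constant, x_t and e_{t-1}."""
    T = len(y)
    mx, my = sum(x) / T, sum(y) / T
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    e = [yi - a - b * xi for xi, yi in zip(x, y)]

    Z = [[1.0, x[t], e[t - 1]] for t in range(1, T)]   # auxiliary regressors
    r = e[1:]
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(3)] for i in range(3)]
    Ztr = [sum(z[i] * ri for z, ri in zip(Z, r)) for i in range(3)]
    g = solve3(ZtZ, Ztr)
    fitted = [sum(gi * zi for gi, zi in zip(g, z)) for z in Z]
    ssr = sum((ri - fi) ** 2 for ri, fi in zip(r, fitted))
    mr = sum(r) / len(r)
    sst = sum((ri - mr) ** 2 for ri in r)
    return len(r) * (1 - ssr / sst)    # compare with chi-square(1): 3.84 at 5%

# illustrative data with strong first-order serial correlation
random.seed(2)
T, rho = 300, 0.8
u = [random.gauss(0, 1) / math.sqrt(1 - rho**2)]
for _ in range(1, T):
    u.append(rho * u[-1] + random.gauss(0, 1))
x = [0.05 * t + random.gauss(0, 1) for t in range(T)]
y = [1.0 + 2.0 * x[t] + u[t] for t in range(T)]
print(f"LM = {breusch_godfrey_lm(y, x):.1f}  (chi-square(1) 5% critical value = 3.84)")
```

Adding further lagged residuals to the auxiliary regressor matrix, with the corresponding degrees of freedom for the chi-square, generalizes this to higher-order schemes.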

What about first differencing the data as a possible solution for getting rid of serial correlation? Some economic behavioral equations are specified with variables in first difference form, like GDP growth, but other equations are first differenced for estimation purposes. In the latter case, if the original disturbances were not autocorrelated (or, more generally, were autocorrelated with ρ ≠ 1), then the transformed disturbances are serially correlated. After all, first differencing the disturbances is equivalent to setting ρ = 1 in ut − ρut−1, and this new disturbance u*t = ut − ut−1 has ut−1 in common with u*t−1 = ut−1 − ut−2, making E(u*tu*t−1) = −E(u²t−1) = −σ²u when the original disturbances are not autocorrelated. However, one could argue that if ρ is large and positive, first differencing the data may not be a bad solution. Rao and Miller (1971) calculated the variance of the BLU estimator correcting for serial correlation, for various guesses of ρ. They assume a true ρ of 0.2, and an autoregressive Xt

Xt = λXt−1 + wt  with λ = 0, 0.4, 0.8.    (5.39)

They find that OLS (or a guess of ρ = 0) performs better than first differencing the data, and is pretty close in terms of efficiency to the true BLU estimator for trended Xt (λ = 0.8). However, the performance of OLS relative to the true BLU estimator deteriorates as λ declines to 0.4 and 0. This supports the Monte Carlo finding by Rao and Griliches that for ρ < 0.3, OLS performs reasonably well relative to estimators that correct for serial correlation. However, the first-difference estimator, i.e., a guess of ρ = 1, performs badly for trended Xt (λ = 0.8), giving the worst efficiency when compared to any other guess of ρ. Only when the Xt's are less trended (λ = 0.4) or random (λ = 0) does the efficiency of the first-difference estimator improve. However, even for those cases one can do better by guessing ρ. For example, for λ = 0, one can always do better than first differencing by guessing any positive ρ less than 1. Similarly, for true ρ = 0.6, a higher degree of serial correlation, Rao and Miller (1971) show that the performance of OLS deteriorates, while that of the first difference improves. However, one can still do better than first differencing by guessing ρ in the interval (0.4, 0.9). This gain in efficiency increases with trended Xt's.
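The correlation induced by differencing is easy to check numerically: first differences of serially uncorrelated disturbances have lag-one correlation −σ²u/(2σ²u) = −1/2 (sample size and seed below are illustrative):

```python
import random

random.seed(3)
T = 100_000
u = [random.gauss(0, 1) for _ in range(T)]        # no serial correlation
du = [u[t] - u[t - 1] for t in range(1, T)]       # u*_t = u_t - u_{t-1}

num = sum(du[t] * du[t - 1] for t in range(1, len(du)))
den = sum(d * d for d in du)
print(f"lag-1 autocorrelation of the first differences: {num / den:.3f}")
```

So differencing cures serial correlation only in the special case ρ = 1; otherwise it manufactures an MA(1)-type correlation of its own.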

Empirical Example: Table 5.3 gives U.S. Real Personal Consumption Expenditures (C) and Real Disposable Personal Income (Y) from the Economic Report of the President over the period 1959-2007. This data set is available as CONSUMP.DAT on the Springer web site.

The OLS regression yields:

Ct = −1343.31 + 0.979 Yt + residuals
     (219.56)    (0.011)

Figure 5.3 plots the actual, fitted and residuals using EViews 6.0. This shows positive serial correlation, with a string of positive residuals followed by a string of negative residuals, followed again by positive residuals. The Durbin-Watson statistic is d = 0.181, which is much smaller than the lower bound dL = 1.497 for T = 49 and one regressor. Therefore, we reject the null hypothesis H0; ρ = 0 at the 5% significance level.

The Breusch (1978) and Godfrey (1978) regression that tests for first-order serial correlation is given in Table 5.4. This is done using EViews 6.0.

This yields

et = −54.41 + 0.004 Yt + 0.909 et−1 + residuals
     (102.77)   (0.005)    (0.070)

The test statistic is TR², which yields 49 × (0.786) = 38.5. This is distributed as χ²₁ under H0; ρ = 0. This rejects the null hypothesis of no first-order serial correlation, with a p-value of 0.0000, as shown in Table 5.4.

C = Real Personal Consumption Expenditures (in 1987 dollars) Y = Real Disposable Personal Income (in 1987 dollars)

| Year | C | Y |
|------|-------|-------|
| 1959 | 8776 | 9685 |
| 1960 | 8837 | 9735 |
| 1961 | 8873 | 9901 |
| 1962 | 9170 | 10227 |
| 1963 | 9412 | 10455 |
| 1964 | 9839 | 11061 |
| 1965 | 10331 | 11594 |
| 1966 | 10793 | 12065 |
| 1967 | 10994 | 12457 |
| 1968 | 11510 | 12892 |
| 1969 | 11820 | 13163 |
| 1970 | 11955 | 13563 |
| 1971 | 12256 | 14001 |
| 1972 | 12868 | 14512 |
| 1973 | 13371 | 15345 |
| 1974 | 13148 | 15094 |
| 1975 | 13320 | 15291 |
| 1976 | 13919 | 15738 |
| 1977 | 14364 | 16128 |
| 1978 | 14837 | 16704 |
| 1979 | 15030 | 16931 |
| 1980 | 14816 | 16940 |
| 1981 | 14879 | 17217 |
| 1982 | 14944 | 17418 |
| 1983 | 15656 | 17828 |
Source: Economic Report of the President.

Regressing the OLS residuals on their lagged values yields

et = 0.906 et−1 + residuals
     (0.062)

The two-step Cochrane-Orcutt (1949) procedure based on ρ̂ = 0.906 using Stata 11 yields the results given in Table 5.5.

The Prais-Winsten (1954) procedure using Stata 11 yields the results given in Table 5.6. The estimate of the marginal propensity to consume is 0.979 for OLS, 0.989 for two-step Cochrane-Orcutt, and 0.912 for iterative Prais-Winsten. All of these estimates are significant.

The Newey-West heteroskedasticity and autocorrelation-consistent standard errors for least squares with a three-year lag truncation are given in Table 5.7 using EViews 6. Note that both standard errors are now larger than those reported by least squares. But once again, this is not necessarily the case for other data sets.

Table 5.4 Breusch-Godfrey LM Test

F-statistic 168.9023, Prob. F(1,46) 0.0000; Obs*R-squared 38.51151, Prob. Chi-Square(1) 0.0000

Test Equation. Dependent Variable: RESID; Method: Least Squares; Sample: 1959 2007; Included observations: 49. Presample missing value lagged residuals set to zero.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|-----------|-------------|------------|-------------|--------|
| C | -54.41017 | 102.7650 | -0.529462 | 0.5990 |
| Y | 0.003590 | 0.005335 | 0.673044 | 0.5043 |
| RESID(-1) | 0.909272 | 0.069964 | 12.99624 | 0.0000 |

R-squared 0.785949; Adjusted R-squared 0.776643; S.E. of regression 204.6601; Sum squared resid 1926746; Log likelihood -328.7263; F-statistic 84.45113; Prob(F-statistic) 0.000000; Mean dependent var -5.34E-13; S.D. dependent var 433.0451; Akaike info criterion 13.53985; Schwarz criterion 13.65567; Hannan-Quinn criter. 13.58379; Durbin-Watson stat 2.116362

Table 5.5 The Two-Step Cochrane-Orcutt AR(1) Regression

. prais c y, corc two

Iteration 0: rho = 0.0000; Iteration 1: rho = 0.9059

Cochrane-Orcutt AR(1) regression -- twostep estimates

Number of obs = 48; F(1, 46) = 519.58; Prob > F = 0.0000; R-squared = 0.9187; Adj R-squared = 0.9169; Root MSE = 183.38

| Source | SS | df | MS |
|----------|-------------|----|------------|
| Model | 17473195 | 1 | 17473195 |
| Residual | 1546950.74 | 46 | 33629.364 |
| Total | 19020145.7 | 47 | 404683.951 |

| c | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. Interval] |
|------|-----------|-----------|-------|--------|----------------------|
| y | .9892295 | .0433981 | 22.79 | 0.000 | .9018738  1.076585 |
| cons | -1579.722 | 1014.436 | -1.56 | 0.126 | -3621.676  462.2328 |
| rho | .9059431 | | | | |

Durbin-Watson statistic (original) 0.180503; Durbin-Watson statistic (transformed) 2.457550

Table 5.6 The Iterative Prais-Winsten AR(1) Regression

. prais c y

Prais-Winsten AR(1) regression -- iterated estimates

Number of obs = 49; F(1, 47) = 119.89; Prob > F = 0.0000; R-squared = 0.7184; Adj R-squared = 0.7124; Root MSE = 180.74

| Source | SS | df | MS |
|----------|-------------|----|------------|
| Model | 3916565.48 | 1 | 3916565.48 |
| Residual | 1535401.45 | 47 | 32668.1159 |
| Total | 5451966.93 | 48 | 113582.644 |

| c | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. Interval] |
|------|----------|-----------|-------|--------|----------------------|
| y | .912147 | .047007 | 19.40 | 0.000 | .8175811  1.006713 |
| cons | 358.9638 | 1174.865 | 0.31 | 0.761 | -2004.56  2722.488 |
| rho | .9808528 | | | | |

Durbin-Watson statistic (original) 0.180503; Durbin-Watson statistic (transformed) 2.314703

Table 5.7 Newey-West Standard Errors

Dependent Variable:
Method:
Sample:
Included observations:
Newey-West HAC Standard Errors & Covariance (lag truncation=3)

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|----------|-------------|------------|-------------|--------|
| C | -1343.314 | 422.2947 | -3.180987 | 0.0026 |
| Y | 0.979228 | 0.022434 | 43.64969 | 0.0000 |

R-squared 0.993680; Adjusted R-squared 0.993545; S.E. of regression 437.6277; Sum squared resid 9001348; Log likelihood -366.4941; F-statistic 7389.281; Prob(F-statistic) 0.000000; Mean dependent var 16749.10; S.D. dependent var 5447.060; Akaike info criterion 15.04057; Schwarz criterion 15.11779; Hannan-Quinn criter. 15.06987; Durbin-Watson stat 0.180503

Notes

1. A computational warning is in order when one is applying the Cochrane-Orcutt transformation to cross-section data. Time-series data has a natural ordering which is generally lacking in cross­section data. Therefore, one should be careful in applying the Cochrane-Orcutt transformation to cross-section data since it is not invariant to the ordering of the observations.

2. Another test for serial correlation can be obtained as a by-product of maximum likelihood estimation. The maximum likelihood estimator of ρ has a normal limiting distribution with mean ρ and variance (1 − ρ²)/T. Hence, one can compute ρ̂MLE/[(1 − ρ̂²MLE)/T]^{1/2} and compare it to critical values from the normal distribution.

Problems

1. s² Is Biased Under Heteroskedasticity. For the simple linear regression with heteroskedasticity, i.e., E(u²i) = σ²i, show that E(s²) is a function of the σ²i's.

3. Weighted Least Squares. This is based on Kmenta (1986).

(a) Solve the two equations in (5.11) and show that the solution is given by (5.12).

(b) Show that

var(β̂BLUE) = Σ_{i=1}^n (1/σ²i) / {[Σ_{i=1}^n (1/σ²i)][Σ_{i=1}^n (X²i/σ²i)] − [Σ_{i=1}^n (Xi/σ²i)]²}

= Σ_{i=1}^n w*i / [(Σ_{i=1}^n w*iX²i)(Σ_{i=1}^n w*i) − (Σ_{i=1}^n w*iXi)²] = 1 / Σ_{i=1}^n w*i(Xi − X̄*)²

where w*i = (1/σ²i) and X̄* = Σ_{i=1}^n w*iXi / Σ_{i=1}^n w*i.

4. Relative Efficiency of OLS Under Heteroskedasticity. Consider the simple linear regression with heteroskedasticity of the form σ²i = σ²Xiᵟ, where Xi = 1, 2,…,10.

(a) Compute var(β̂OLS) for δ = 0.5, 1, 1.5 and 2.

(b) Compute var(β̂BLUE) for δ = 0.5, 1, 1.5 and 2.

(c) Compute the efficiency of β̂OLS, eff(β̂OLS) = var(β̂BLUE)/var(β̂OLS), for δ = 0.5, 1, 1.5 and 2. What happens to this efficiency measure as δ increases?

5. Consider the simple regression with only a constant, yi = α + ui for i = 1, 2,…,n, where the ui's are independent with mean zero and var(ui) = σ²₁ for i = 1, 2,…,n₁, and var(ui) = σ²₂ for i = n₁ + 1,…,n₁ + n₂, with n = n₁ + n₂.

(a) Derive the OLS estimator of a along with its mean and variance.

(b) Derive the GLS estimator of a along with its mean and variance.

(c) Obtain the relative efficiency of OLS with respect to GLS. Compute their relative efficiency for various values of σ²₁/σ²₂ = 0.2, 0.4, 0.6, 0.8, 1, 1.25, 1.33, 2.5, 5, and n₁/n = 0.2, 0.3, 0.4,…,0.8. Plot this relative efficiency.

(d) Assume that ui is N(0, σ²₁) for i = 1, 2,…,n₁, and N(0, σ²₂) for i = n₁ + 1,…,n₁ + n₂, with the ui's being independent. What are the maximum likelihood estimators of α, σ²₁ and σ²₂?

(e) Derive the LR test for testing H0; σ²₁ = σ²₂ in part (d).

6. Show that for the AR(1) model given in (5.26), E(utus) = ρ^|t−s| σ²u for t, s = 1, 2,…,T.

7. Relative Efficiency of OLS Under the AR(1) Model. This problem is based on Johnston (1984, pp. 310-312). For the simple regression without a constant, yt = βxt + ut with ut = ρut−1 + εt and εt ~ IID(0, σ²ε)

These expressions are easier to prove using matrix algebra, see Chapter 9.

(b) Let xt itself follow an AR(1) scheme with parameter λ, i.e., xt = λxt−1 + vt, and let T → ∞. Show that

asy eff(β̂OLS) = lim_{T→∞} var(β̂PW)/var(β̂OLS) = (1 − ρ²)/[(1 + ρ² − 2ρλ)(1 + 2ρλ + 2ρ²λ² + …)]

= (1 − ρ²)(1 − ρλ)/[(1 + ρ² − 2ρλ)(1 + ρλ)]

(c) Tabulate this asy eff(β̂OLS) for various values of ρ and λ, where ρ varies between −0.9 and +0.9 in increments of 0.1, while λ varies between 0 and 0.9 in increments of 0.1. What do you conclude? How serious is the loss in efficiency in using OLS rather than the PW procedure?

(d) Ignoring this autocorrelation, one would compute σ²u/Σ_{t=1}^T x²t as the var(β̂OLS). The difference between this wrong formula and that derived in part (a) gives us the bias in estimating the variance of β̂OLS. Show that as T → ∞, this asymptotic proportionate bias is given by −2ρλ/(1 + ρλ). Tabulate this asymptotic bias for various values of ρ and λ as in part (c). What do you conclude? How serious is the asymptotic bias of using the wrong variance for β̂OLS when the disturbances are first-order autocorrelated?

(e)  Show that

Conclude that if ρ = 0, then E(s²) = σ²u. If xt follows an AR(1) scheme with parameter λ, then for large T, we get

E(s²) = σ²u[T − (1 + ρλ)/(1 − ρλ)]/(T − 1)

Compute this E(s²) for T = 101 and various values of ρ and λ as in part (c). What do you conclude? How serious is the bias in using s² as an unbiased estimator for σ²u?

8. OLS Variance Is Biased Under Serial Correlation. For the AR(1) model given in (5.26), show that if ρ > 0 and the xt's are positively autocorrelated, then E(s²/Σ_{t=1}^T x²t) understates the var(β̂OLS) given in (5.34).

9. Show that for the AR(1) model, the Durbin-Watson statistic satisfies plim d = 2(1 − ρ).

10. Regressions with Non-zero Mean Disturbances. Consider the simple regression with a constant

Yi = α + βXi + ui    i = 1, 2,…,n

where α and β are scalars and ui is independent of the Xi's. Show that:

(a) If the ui's are independent and identically gamma distributed with f(ui) = ui^(θ−1) e^(−ui)/Γ(θ), where ui > 0 and θ > 0, then α̂OLS − s² is unbiased for α.

(b) If the ui's are independent and identically χ² distributed with ν degrees of freedom, then α̂OLS − s²/2 is unbiased for α.

(c) If the ui's are independent and identically exponentially distributed with f(ui) = (1/θ)e^(−ui/θ), where ui > 0 and θ > 0, then α̂OLS − s is consistent for α.

11. The Heteroskedastic Consequences of an Arbitrary Variance for the Initial Disturbance of an AR(1) Model. This is based on Baltagi and Li (1990, 1992). Consider a simple AR(1) model

ut = ρut−1 + εt    t = 1, 2,…,T    |ρ| < 1

with εt ~ IID(0, σ²ε) independent of u0 ~ (0, σ²ε/τ), where τ is an arbitrary positive parameter.

(a) Show that this arbitrary variance on the initial disturbance u0 renders the disturbances, in general, heteroskedastic.

(b) Show that var(ut) = σ²t is increasing in t if τ > (1 − ρ²) and decreasing in t if τ < (1 − ρ²). When is the process homoskedastic?

(c) Show that cov(ut, ut−s) = ρˢσ²t−s for t ≥ s, where σ²t−s = var(ut−s). Hint: See the solution by Kim (1991).

(d) Consider the simple regression model

yt = βxt + ut    t = 1, 2,…,T

with ut following the AR(1) process described above. Consider the common case where ρ > 0 and the xt's are positively autocorrelated. For this case, it is a standard result that var(β̂OLS) is understated under the stationary case (i.e., τ = 1 − ρ²), see problem 8. This means that OLS rejects the hypothesis H0; β = 0 too often. Show that OLS will reject more often than in the stationary case if τ < 1 − ρ², and less often than in the stationary case if τ > (1 − ρ²). Hint: See the solution by Koning (1992).

12. ML Estimation of Linear Regression Model with AR(1) Errors and Two Observations. This is based on Magee (1993). Consider the regression model yi = xiβ + ui, with only two observations i = 1, 2, where the nonstochastic x1 and x2 are scalars. Assume that u1 ~ N(0, σ²) and u2 = ρu1 + ε with |ρ| < 1. Also, ε ~ N[0, (1 − ρ²)σ²], where ε and u1 are independent.

(a) Show that the OLS estimator of β is (x1y1 + x2y2)/(x1² + x2²).

(b) Show that the ML estimator of β is (x1y1 − x2y2)/(x1² − x2²).

(c) Show that the ML estimator of ρ is 2x1x2/(x1² + x2²) and thus is nonstochastic.

(d) How do the ML estimates of β and ρ behave as x1 → x2 and x1 → −x2? Assume x2 ≠ 0. Hint: See the solution by Baltagi and Li (1995).

13. For the empirical example in section 5.5, based on the Cigarette Consumption Data in Table 3.2:

(a) Replicate the OLS regression of logC on logP, logY and a constant. Plot the residuals versus logY and verify Figure 5.1.

(b) Run Glejser’s (1969) test by regressing |ei|, the absolute value of the residuals from part (a), on (logYi)^δ for δ = 1, −1, −0.5 and 0.5. Verify the t-statistics reported in the text.

(c) Run Goldfeld and Quandt’s (1965) test by ordering the observations according to logYi and omitting 12 central observations. Report the two regressions based on the first and last 17 observations and verify the F-test reported in the text.

(d) Verify the Spearman rank correlation test based on rank(logYi) and rank(ei).

(e) Verify Harvey’s (1976) multiplicative heteroskedasticity test based on regressing log ei² on log(logYi).

(f) Run the Breusch and Pagan (1979) test based on the regression of ei²/σ̂² on logYi, where σ̂² = Σ(i=1..46) ei²/46.

(g) Run White’s (1980) test for heteroskedasticity.

(h) Run the Jarque and Bera (1987) test for normality of the disturbances.

(i) Compute White’s (1980) heteroskedasticity-robust standard errors for the regression in part (a).
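The Breusch and Pagan (1979) statistic is one half of the explained sum of squares from the auxiliary regression of the squared OLS residuals, scaled by their mean, on the suspected variance drivers. A minimal NumPy sketch of the mechanics, with hypothetical variable names:

```python
import numpy as np

def breusch_pagan(e, z):
    """Breusch-Pagan LM statistic: one half the explained sum of squares
    from regressing e_i^2 / sigma2_hat on a constant and z."""
    n = len(e)
    s2 = (e @ e) / n                      # sigma2_hat = sum(e_i^2)/n
    g = e**2 / s2
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, g, rcond=None)
    fitted = Z @ coef
    return 0.5 * np.sum((fitted - g.mean())**2)
```

Under the null of homoskedasticity the statistic is asymptotically chi-squared with degrees of freedom equal to the number of variables in z; if the residual variance does not move with z at all, the explained sum of squares, and hence the statistic, is zero.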

14. A Simple Linear Trend Model with AR(1) Disturbances. This is based on Kramer (1982).

(a) Consider the following simple linear trend model

yt = α + βt + ut

where ut = ρut−1 + εt with |ρ| < 1, εt ~ IID(0, σε²) and var(ut) = σu² = σε²/(1 − ρ²). Our interest focuses on the estimates of the trend coefficient β, and the estimators to be considered are OLS, Cochrane-Orcutt (CO, assuming that the true value of ρ is known), the first-difference estimator (FD), and Generalized Least Squares (GLS), which is Best Linear Unbiased (BLUE) in this case.

In the context of the simple linear trend model, the formulas for the variances of these estimators reduce to

V(OLS) = 12σε²{−6ρ^(T+1)[(T − 1)ρ − (T + 1)]² − (T³ − T)ρ⁴ + 2(T² − 1)(T − 3)ρ³ + 12(T² + 1)ρ² − 2(T² − 1)(T + 3)ρ + (T³ − T)}/[(1 − ρ²)(1 − ρ)⁴(T³ − T)²],

V(CO) = 12σε²/[(1 − ρ)²(T³ − 3T² + 2T)],

V(FD) = 2σε²(1 − ρ^(T−1))/[(1 − ρ²)(T − 1)²],

V(GLS) = 12σε²/{(T − 1)[(T − 3)(T − 2)ρ² − 2(T − 3)(T + 1)ρ + T(T + 1)]}.

(b) Compute these variances and their relative efficiency with respect to the GLS estimator for T = 10, 20, 30, 40 and ρ between −0.9 and 0.9 in 0.1 increments.

(c) For a given T, show that the limit of var(OLS)/var(CO) is zero as ρ → 1. Prove that var(FD) and var(GLS) both tend in the limit to σε²/(T − 1) < ∞ as ρ → 1. Conclude that var(GLS)/var(FD) tends to 1 as ρ → 1. Also, show that lim(ρ→1)[var(GLS)/var(OLS)] = 5(T² + T)/[6(T² + 1)] < 1, provided T > 3.

(d) For a given ρ, show that var(FD) is O(T⁻²) whereas the variance of the remaining estimators is O(T⁻³). Conclude that lim(T→∞)[var(FD)/var(CO)] = ∞ for any given ρ.
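As a cross-check on part (b), the exact variances can also be obtained directly from the covariance matrix of the disturbances rather than from closed-form expressions. The sketch below (my own matrix-algebra illustration, not from the text) computes var(β̂) for OLS, FD and GLS in the linear trend model:

```python
import numpy as np

def trend_variances(rho, T, s2_eps=1.0):
    """Exact var(beta_hat) for OLS, FD and GLS in y_t = a + b*t + u_t
    with stationary AR(1) errors, |rho| < 1, by direct matrix algebra."""
    t = np.arange(1, T + 1, dtype=float)
    X = np.column_stack([np.ones(T), t])
    s2_u = s2_eps / (1 - rho**2)                        # var(u_t)
    Omega = s2_u * rho**np.abs(np.subtract.outer(t, t)) # cov(u_t, u_s)
    XtX_inv = np.linalg.inv(X.T @ X)
    v_ols = (XtX_inv @ X.T @ Omega @ X @ XtX_inv)[1, 1] # sandwich form
    v_gls = np.linalg.inv(X.T @ np.linalg.solve(Omega, X))[1, 1]
    # the FD estimator of the trend slope is (y_T - y_1)/(T - 1)
    v_fd = (Omega[0, 0] + Omega[-1, -1] - 2 * Omega[0, -1]) / (T - 1)**2
    return v_ols, v_fd, v_gls
```

Looping this over T = 10, 20, 30, 40 and a grid of ρ values reproduces the relative-efficiency comparison asked for in part (b); GLS always has the smallest variance, as BLUE requires.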

15. Consider the empirical example in section 5.6, based on the Consumption-Income data in Table 5.3. Obtain this data set from the CONSUMP.DAT file on the Springer web site.

(a) Replicate the OLS regression of Ct on Yt and a constant, and compute the Durbin-Watson statistic. Test H0: ρ = 0 versus H1: ρ > 0 at the 5% significance level.

(b) Test for first-order serial correlation using the Breusch and Godfrey test.

(c) Perform the two-step Cochrane-Orcutt procedure and verify Table 5.5. What happens if we iterate the Cochrane-Orcutt procedure?

(d) Perform the Prais-Winsten procedure and verify Table 5.6.

(e) Compute the Newey-West heteroskedasticity and autocorrelation-consistent standard errors for the least squares estimates in part (a).
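The two-step Cochrane-Orcutt procedure of part (c) is mechanical enough to sketch in a few lines of NumPy (hypothetical data; a statistical package would also supply the standard errors):

```python
import numpy as np

def rho_hat(e):
    """Estimate rho from residuals: sum(e_t e_{t-1}) / sum(e_{t-1}^2)."""
    return (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

def cochrane_orcutt(y, x):
    """Two-step Cochrane-Orcutt for y_t = a + b*x_t + u_t with AR(1) u_t."""
    X = np.column_stack([np.ones_like(x), x])
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 1: OLS
    r = rho_hat(y - X @ b_ols)                      # estimate rho from residuals
    # step 2: OLS on quasi-differenced data, dropping the first observation;
    # the intercept column becomes (1 - r), so its coefficient is still a.
    ys = y[1:] - r * y[:-1]
    Xs = np.column_stack([(1 - r) * np.ones(len(y) - 1), x[1:] - r * x[:-1]])
    b_co, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return r, b_co
```

Iterating the procedure, as asked at the end of part (c), means re-estimating ρ from the residuals of the quasi-differenced regression and repeating the two steps until ρ̂ converges.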

16. Benderly and Zwick (1985) considered the following equation

RSt = α + βQt+1 + γPt + ut

where RSt = the real return on stocks in year t, Qt+1 = the annual rate of growth of real GNP in year t + 1, and Pt = the rate of inflation in year t. The data are provided on the Springer web site and labeled BENDERLY.ASC. They cover 31 annual observations for the U.S. over the period 1952-1982 and were obtained from Lott and Ray (1992). This equation is used to test the significance of the inflation rate in explaining real stock returns. Use the sample period 1954-1976 to answer the following questions:

(a) Run OLS to estimate the above equation. Remember to use Qt+1. Is Pt significant in this equation? Plot the residuals against time. Compute the Newey-West heteroskedasticity and autocorrelation-consistent standard errors for these least squares estimates.

(b) Test for serial correlation using the D. W. test.

(c) Would your decision in (b) change if you used the Breusch-Godfrey test for first-order serial correlation?

(d) Run the Cochrane-Orcutt procedure to correct for first-order serial correlation. Report your estimate of p.

(e) Run a Prais-Winsten procedure accounting for the first observation and report your estimate of p. Plot the residuals against time.
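For part (b), recall that the Durbin-Watson statistic is DW = Σ(et − et−1)²/Σet², which is approximately 2(1 − ρ̂), so values well below 2 point to positive serial correlation. A one-line sketch (hypothetical residual vector):

```python
import numpy as np

def durbin_watson(e):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2, roughly 2*(1 - rho_hat)."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)
```

The statistic ranges from 0 (perfectly positively autocorrelated residuals) to about 4 (perfectly negatively autocorrelated residuals), with a value near 2 under no serial correlation.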

17. Using our cross-section Energy/GDP data set in Chapter 3, problem 3.16, consider the following two models:

Model 1: logEn = α + β logRGDP + u

Model 2: En = α + βRGDP + v

Make sure you have corrected the W. Germany observation on EN as described in problem 3.16 part (d).

(a) Run OLS on both Models 1 and 2. Test for heteroskedasticity using the Goldfeld-Quandt test. Omit c = 6 central observations. Why is heteroskedasticity a problem in Model 2, but not Model 1?

(b) For Model 2, test for heteroskedasticity using the Glejser Test.

(c) Now use the Breusch-Pagan Test to test for heteroskedasticity on Model 2.

(d) Apply White’s Test to Model 2.

(e) Do all these tests give the same decision?

(f) Propose and estimate a simple transformation of Model 2, assuming heteroskedasticity of the form σi² = σ²RGDPi².

(g) Propose and estimate a simple transformation of Model 2, assuming heteroskedasticity of the form σi² = σ²(a + bRGDPi)².

(h) Now suppose that heteroskedasticity is of the form σi² = σ²RGDPi^γ, where γ is an unknown parameter. Propose and estimate a simple transformation for Model 2. Hint: You can write σi² as exp{α + γ logRGDPi} where α = logσ².

(i) Compare the standard errors of the OLS estimates for Model 2, and also obtain White’s heteroskedasticity-consistent standard errors. Compare them with the simple Weighted Least Squares estimates of the standard errors in parts (f), (g) and (h). What do you conclude?
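For part (f), dividing the Model 2 equation through by RGDPi turns an error with var(vi) = σ²RGDPi² into a homoskedastic one, so OLS on the transformed equation is the weighted least squares estimator. A minimal sketch with hypothetical data:

```python
import numpy as np

def wls_scale(y, x):
    """WLS for y_i = a + b*x_i + v_i with var(v_i) = sigma^2 * x_i^2:
    dividing through by x_i gives a homoskedastic error, so run OLS on
    y_i/x_i = a*(1/x_i) + b + (v_i/x_i)."""
    ys = y / x
    Xs = np.column_stack([1.0 / x, np.ones_like(x)])  # coefficients: (a, b)
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return coef
```

Note how the roles of the columns swap after the transformation: the intercept a becomes the coefficient on 1/x, while the slope b becomes the coefficient on the constant.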

18. You are given quarterly data from the first quarter of 1965 (1965.1) to the fourth quarter of 1983 (1983.4) on employment in Orange County, California (EMP) and real gross national product (RGNP). The data set is in a file called ORANGE.DAT on the Springer web site.

(a) Generate the lagged variable of real GNP, call it RGNPt−1, and estimate the following model by OLS: EMPt = α + βRGNPt−1 + ut.

(b) What does inspection of the residuals and the Durbin-Watson statistic suggest?

(c) Assuming ut = ρut−1 + εt where |ρ| < 1 and εt ~ IIN(0, σε²), use the Cochrane-Orcutt procedure to estimate ρ, α and β. Compare the latter estimates and their standard errors with those of OLS.

(d) The Cochrane-Orcutt procedure omits the first observation. Perform the Prais-Winsten adjustment. Compare the resulting estimates and standard errors with those in part (c).

(e) Apply the Breusch-Godfrey test for first and second order autoregression. What do you conclude?

(f) Compute the Newey-West heteroskedasticity and autocorrelation-consistent standard errors for the least squares estimates in part (a).
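The Newey-West estimator requested in part (f) sandwiches a Bartlett-weighted sum of residual autocovariance terms between (X'X)⁻¹ matrices. A compact NumPy sketch (L, the truncation lag, is chosen by the user):

```python
import numpy as np

def newey_west_se(X, e, L):
    """HAC standard errors for OLS with Bartlett weights and L lags."""
    Xe = X * e[:, None]                       # rows: x_t * e_t
    S = Xe.T @ Xe                             # lag-0 (White) term
    for lag in range(1, L + 1):
        w = 1 - lag / (L + 1)                 # Bartlett kernel weight
        G = Xe[lag:].T @ Xe[:-lag]
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ S @ XtX_inv                 # sandwich covariance
    return np.sqrt(np.diag(V))
```

With L = 0 the lag sum is empty and the formula collapses to White’s heteroskedasticity-consistent standard errors, which makes the Bartlett terms easy to interpret as the autocorrelation correction.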

19. Consider the earnings data underlying the regression in Table 4.1, available on the Springer web site as EARN.ASC.

(a) Apply White’s test for heteroskedasticity to the regression residuals.

(b) Compute White’s heteroskedasticity-consistent standard errors.

(c) Test the least squares residuals for normality using the Jarque-Bera test.
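The Jarque-Bera statistic in part (c) combines the skewness S and kurtosis K of the residuals as JB = n[S²/6 + (K − 3)²/24], which is asymptotically chi-squared with 2 degrees of freedom under normality. A sketch:

```python
import numpy as np

def jarque_bera(e):
    """JB = n*[S^2/6 + (K - 3)^2/24] from residual skewness S and kurtosis K."""
    n = len(e)
    e = e - e.mean()
    s = e.std()                  # divides by n, matching the moment estimators
    skew = np.mean(e**3) / s**3
    kurt = np.mean(e**4) / s**4
    return n * (skew**2 / 6 + (kurt - 3)**2 / 24)
```

A large value of JB signals skewed or fat-tailed (or thin-tailed) residuals and leads to rejection of the normality assumption.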

20. Hedonic Housing. Harrison and Rubinfeld (1978) collected data on 506 census tracts in the Boston area in 1970 to study hedonic housing prices and the willingness to pay for clean air. This data is available on the Springer web site as HEDONIC.XLS. The dependent variable is the Median Value (MV) of owner-occupied homes. The regressors include two structural variables, RM, the average number of rooms, and AGE, representing the proportion of owner units built prior to 1940. In addition there are eight neighborhood variables: B, the proportion of blacks in the population; LSTAT, the proportion of population that is lower status; CRIM, the crime rate; ZN, the proportion of residential land zoned for lots over 25,000 square feet; INDUS, the proportion of nonretail business acres; TAX, the full value property tax rate ($/$10,000); PTRATIO, the pupil-teacher ratio; and CHAS, a dummy variable for the Charles River (= 1 if a tract bounds the Charles). There are also two accessibility variables, DIS, the weighted distances to five employment centers in the Boston region, and RAD, the index of accessibility to radial highways. One more regressor is an air pollution variable, NOX, the annual average nitrogen oxide concentration in parts per hundred million.

(a) Run OLS of MV on the 13 independent variables and a constant. Plot the residuals.

(b) Apply White’s test for heteroskedasticity.

(c) Obtain the White heteroskedasticity-consistent standard errors.

(d) Test the least squares residuals for normality using the Jarque-Bera test.

21. Agglomeration Economies, Diseconomies, and Growth. Wheeler (2003) uses data on 3106 counties of the contiguous USA to fit a fourth-order polynomial relating county population (employment) growth over the period 1980 to 1990 to log(size), where size is measured as total resident population or total civilian employment. Other control variables include the proportion of the adult resident population (i.e., of age 25 or older) with a bachelor’s degree or more, the proportion of total employment in manufacturing, and the unemployment rate, all for the year 1980; per capita income in 1979; the proportion of the resident population belonging to nonwhite racial categories in 1980; and the share of local government expenditures going to each of three public goods (education, roads and highways, and police protection) in 1982. This data can be downloaded from the JAE archive data web site.

(a) Replicate the OLS regressions reported in Tables VIII and IX of Wheeler (2003, pp. 88-89).

(b) Apply White’s and Breusch-Pagan tests for heteroskedasticity.

(c) Test the least squares residuals for normality using the Jarque-Bera test.

References

For additional readings consult the econometrics books cited in the Preface, as well as the chapter on heteroskedasticity by Griffiths (2001) and the chapter on serial correlation by King (2001):

Ali, M. M. and C. Giaccotto (1984), “A Study of Several New and Existing Tests for Heteroskedasticity in the General Linear Model,” Journal of Econometrics, 26: 355-373.

Amemiya, T. (1973), “Regression Analysis When the Variance of the Dependent Variable is Proportional to the Square of its Expectation,” Journal of the American Statistical Association, 68: 928-934.

Amemiya, T. (1977), “A Note on a Heteroskedastic Model,” Journal of Econometrics, 6: 365-370.

Andrews, D. W. K. (1991), “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59: 817-858.

Baltagi, B. and Q. Li (1990), “The Heteroskedastic Consequences of an Arbitrary Variance for the Initial Disturbance of an AR(1) Model,” Econometric Theory, Problem 90.3.1, 6: 405.

Baltagi, B. and Q. Li (1992), “The Bias of the Standard Errors of OLS for an AR(1) process with an Arbitrary Variance on the Initial Observations,” Econometric Theory, Problem 92.1.4, 8: 146.

Baltagi, B. and Q. Li (1995), “ML Estimation of Linear Regression Model with AR(1) Errors and Two Observations,” Econometric Theory, Solution 93.3.2, 11: 641-642.

Bartlett, M. S. (1937), “Properties of Sufficiency and Statistical Tests,” Proceedings of the Royal Statistical Society, A, 160: 268-282.

Beach, C. M. and J. G. MacKinnon (1978), “A Maximum Likelihood Procedure for Regression with Autocorrelated Errors,” Econometrica, 46: 51-58.

Benderly, J. and B. Zwick (1985), “Inflation, Real Balances, Output and Real Stock Returns,” American Economic Review, 75: 1115-1123.

Breusch, T. S. (1978), “Testing for Autocorrelation in Dynamic Linear Models,” Australian Economic Papers, 17: 334-355.

Breusch, T. S. and A. R. Pagan (1979), “A Simple Test for Heteroskedasticity and Random Coefficient Variation,” Econometrica, 47: 1287-1294.

Buse, A. (1984), “Tests for Additive Heteroskedasticity: Goldfeld and Quandt Revisited,” Empirical Economics, 9: 199-216.

Carroll, R. J. (1982), “Adapting for Heteroskedasticity in Linear Models,” Annals of Statistics, 10: 1224-1233.

Cochrane, D. and G. Orcutt (1949), “Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms,” Journal of the American Statistical Association, 44: 32-61.

Cragg, J. G. (1992), “Quasi-Aitken Estimation for Heteroskedasticity of Unknown Form,” Journal of Econometrics, 54: 197-202.

Durbin, J. (1960), “Estimation of Parameters in Time-Series Regression Models,” Journal of the Royal Statistical Society, Series B, 22: 139-153.

Durbin, J. and G. Watson (1950), “Testing for Serial Correlation in Least Squares Regression-I,” Biometrika, 37: 409-428.

Durbin, J. and G. Watson (1951), “Testing for Serial Correlation in Least Squares Regression-II,” Biometrika, 38: 159-178.

Evans, M. A. and M. L. King (1988), “A Further Class of Tests for Heteroskedasticity,” Journal of Econometrics, 37: 265-276.

Farebrother, R. W. (1980), “The Durbin-Watson Test for Serial Correlation When There is No Intercept in the Regression,” Econometrica, 48: 1553-1563.

Glejser, H. (1969), “A New Test for Heteroskedasticity,” Journal of the American Statistical Association, 64: 316-323.

Godfrey, L. G. (1978), “Testing Against General Autoregressive and Moving Average Error Models When the Regressors Include Lagged Dependent Variables,” Econometrica, 46: 1293-1302.

Goldfeld, S. M. and R. E. Quandt (1965), “Some Tests for Homoscedasticity,” Journal of the American Statistical Association, 60: 539-547.

Goldfeld, S. M. and R. E. Quandt (1972), Nonlinear Methods in Econometrics (North-Holland: Amsterdam).

Griffiths, W. E. (2001), “Heteroskedasticity,” Chapter 4 in B. H. Baltagi, (ed.), A Companion to Theoretical Econometrics (Blackwell: Massachusetts).

Harrison, M. and B. P. McCabe (1979), “A Test for Heteroskedasticity Based On Ordinary Least Squares Residuals,” Journal of the American Statistical Association, 74: 494-499.

Harrison, D. and D. L. Rubinfeld (1978), “Hedonic Housing Prices and the Demand for Clean Air,” Journal of Environmental Economics and Management, 5: 81-102.

Harvey, A. C. (1976), “Estimating Regression Models With Multiplicative Heteroskedasticity,” Econometrica, 44: 461-466.

Hildreth, C. and J. Lu (1960), “Demand Relations with Autocorrelated Disturbances,” Technical Bulletin 276 (Michigan State University, Agriculture Experiment Station).

Jarque, C. M. and A. K. Bera (1987), “A Test for Normality of Observations and Regression Residuals,” International Statistical Review, 55: 163-177.

Kim, J. H. (1991), “The Heteroskedastic Consequences of an Arbitrary Variance for the Initial Disturbance of an AR(1) Model,” Econometric Theory, Solution 90.3.1, 7: 544-545.

King, M. (2001), “Serial Correlation,” Chapter 2 in B. H. Baltagi, (ed.), A Companion to Theoretical Econometrics (Blackwell: Massachusetts).

Koenker, R. (1981), “A Note on Studentizing a Test for Heteroskedasticity,” Journal of Econometrics, 17: 107-112.

Koenker, R. and G. W. Bassett, Jr. (1982), “Robust Tests for Heteroskedasticity Based on Regression Quantiles,” Econometrica, 50:43-61.

Koning, R. H. (1992), “The Bias of the Standard Errors of OLS for an AR(1) process with an Arbitrary Variance on the Initial Observations,” Econometric Theory, Solution 92.1.4, 9: 149-150.

Kramer, W. (1982), “Note on Estimating Linear Trend When Residuals are Autocorrelated,” Econometrica, 50: 1065-1067.

Lott, W. F. and S. C. Ray (1992), Applied Econometrics: Problems With Data Sets (The Dryden Press: New York).

Maddala, G. S. (1977), Econometrics (McGraw-Hill: New York).

Maeshiro, A. (1976), “Autoregressive Transformation, Trended Independent Variables and Autocorrelated Disturbance Terms,” The Review of Economics and Statistics, 58: 497-500.

Maeshiro, A. (1979), “On the Retention of the First Observations in Serial Correlation Adjustment of Regression Models,” International Economic Review, 20: 259-265.

Magee L. (1993), “ML Estimation of Linear Regression Model with AR(1) Errors and Two Observations,” Econometric Theory, Problem 93.3.2, 9: 521-522.

Mizon, G. E. (1995), “A Simple Message for Autocorrelation Correctors: Don’t,” Journal of Econometrics, 69: 267-288.

Newey, W. K. and K. D. West (1987), “A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55: 703-708.

Oberhofer, W. and J. Kmenta (1974), “A General Procedure for Obtaining Maximum Likelihood Estimates in Generalized Regression Models,” Econometrica, 42: 579-590.

Park, R. E. and B. M. Mitchell (1980), “Estimating the Autocorrelated Error Model With Trended Data,” Journal of Econometrics, 13: 185-201.

Prais, S. and C. Winsten (1954), “Trend Estimation and Serial Correlation,” Discussion Paper 383 (Cowles Commission: Chicago).

Rao, P. and Z. Griliches (1969), “Some Small Sample Properties of Several Two-Stage Regression Meth­ods in the Context of Autocorrelated Errors,” Journal of the American Statistical Association, 64: 253-272.

Rao, P. and R. L. Miller (1971), Applied Econometrics (Wadsworth: Belmont).

Robinson, P. M. (1987), “Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form,” Econometrica, 55: 875-891.

Rutemiller, H. C. and D. A. Bowers (1968), “Estimation in a Heteroskedastic Regression Model,” Journal of the American Statistical Association, 63: 552-557.

Savin, N. E. and K. J. White (1977), “The Durbin-Watson Test for Serial Correlation with Extreme Sample Sizes or Many Regressors,” Econometrica, 45: 1989-1996.

Szroeter, J. (1978), “A Class of Parametric Tests for Heteroskedasticity in Linear Econometric Models,” Econometrica, 46: 1311-1327.

Theil, H. (1978), Introduction to Econometrics (Prentice-Hall: Englewood Cliffs, NJ).

Waldman, D. M. (1983), “A Note on Algebraic Equivalence of White’s Test and a Variation of the Godfrey/Breusch-Pagan Test for Heteroskedasticity,” Economics Letters, 13: 197-200.

Wheeler, C. (2003), “Evidence on Agglomeration Economies, Diseconomies, and Growth,” Journal of Applied Econometrics, 18: 79-104.

White, H. (1980), “A Heteroskedasticity Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48: 817-838.

Wooldridge, J. M. (1991), “On the Application of Robust, Regression-Based Diagnostics to Models of Conditional Means and Conditional Variances,” Journal of Econometrics, 47: 5-46.

CHAPTER 6