Autocorrelation
Violation of assumption 3 means that the disturbances are correlated, i.e., E(u_i u_j) = σ_ij ≠ 0 for i ≠ j, with i, j = 1, 2,…,n. Since u_i has zero mean, E(u_i u_j) = cov(u_i, u_j), and this is denoted by σ_ij. This correlation is more likely to occur in time-series than in cross-section studies. Consider estimating the consumption function of a random sample of households. An unexpected event, like a visit of family members, will increase the consumption of this household. However, this positive disturbance need not be correlated with the disturbances affecting consumption of other randomly drawn households. If instead we were estimating this consumption function using aggregate time-series data for the U.S., then it is very likely that a recession year affecting consumption negatively this year may have a carry-over effect to the next few years. A shock to the economy, like the oil embargo in 1973, is likely to affect the economy for several years. A labor strike this year may affect production for the next few years. Therefore, we will switch the i and j subscripts to t and s, denoting time-series observations t, s = 1, 2,…,T, and the sample size will be denoted by T rather than n. This covariance term is symmetric, so that σ_12 = E(u_1 u_2) = E(u_2 u_1) = σ_21. Hence, only T(T − 1)/2 distinct σ_ts's have to be estimated. For example, if T = 3, then σ_12, σ_13 and σ_23 are the distinct covariance terms. However, it is hopeless to estimate T(T − 1)/2 covariances with only T observations. Therefore, more structure on these σ_ts's needs to be imposed. A popular assumption is that the u_t's follow a first-order autoregressive process, denoted AR(1):
u_t = ρu_{t−1} + ε_t,   t = 1, 2,…,T    (5.26)
where ε_t is IID(0, σ²_ε). It is autoregressive because u_t is related to its lagged value u_{t−1}. One can also write (5.26), for period t − 1, as
u_{t−1} = ρu_{t−2} + ε_{t−1}    (5.27)
and substitute (5.27) in (5.26) to get
u_t = ρ²u_{t−2} + ρε_{t−1} + ε_t    (5.28)
Note that the power of ρ and the subscript of u or ε always sum to t. By continuous substitution of this form, one ultimately gets
u_t = ρ^t u_0 + ρ^{t−1}ε_1 + … + ρε_{t−1} + ε_t    (5.29)
This means that u_t is a function of current and past values of ε_t and u_0, where u_0 is the initial value of u_t. If u_0 has zero mean, then u_t has zero mean. This follows from (5.29) by taking expectations. Also, from (5.26)
var(u_t) = ρ²var(u_{t−1}) + var(ε_t) + 2ρ cov(u_{t−1}, ε_t)    (5.30)
Using (5.29), u_{t−1} is a function of ε_{t−1}, past values of the ε's and u_0. Since u_0 is independent of the ε's, and the ε's are themselves not serially correlated, u_{t−1} is independent of ε_t. This means that cov(u_{t−1}, ε_t) = 0. Furthermore, for u_t to be homoskedastic, var(u_t) = var(u_{t−1}) = σ²_u, and (5.30) reduces to σ²_u = ρ²σ²_u + σ²_ε, which when solved for σ²_u gives:
σ²_u = σ²_ε/(1 − ρ²)    (5.31)
Hence, u_0 ~ (0, σ²_ε/(1 − ρ²)) for the u_t's to have zero mean and homoskedastic disturbances. Multiplying (5.26) by u_{t−1} and taking expected values, one gets
E(u_t u_{t−1}) = ρE(u²_{t−1}) + E(u_{t−1}ε_t) = ρσ²_u    (5.32)
since E(u²_{t−1}) = σ²_u and E(u_{t−1}ε_t) = 0. Therefore, cov(u_t, u_{t−1}) = ρσ²_u, and the correlation coefficient between u_t and u_{t−1} is correl(u_t, u_{t−1}) = cov(u_t, u_{t−1})/√(var(u_t)var(u_{t−1})) = ρσ²_u/σ²_u = ρ. Since ρ is a correlation coefficient, this means that −1 < ρ < 1. In general, one can show that
cov(u_t, u_s) = ρ^{|t−s|}σ²_u,   t, s = 1, 2,…,T    (5.33)
see problem 6. This means that the correlation between u_t and u_{t−r} is ρ^r, which is a fraction raised to an integer power, i.e., the correlation decays between the disturbances the further apart they are. This is reasonable in economics and may be the reason why this autoregressive form (5.26) is so popular. One should note that this is not the only form that would correlate the disturbances across time. In Chapter 14, we will consider other forms like the Moving Average (MA) process and higher-order Autoregressive Moving Average (ARMA) processes, but these are beyond the scope of this chapter.
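The decay pattern in (5.33) is easy to confirm by simulation. The sketch below is our own illustration (the parameter values are arbitrary): it draws u_0 from the stationary distribution of (5.31) so the process is homoskedastic from the start, then compares sample autocorrelations with ρ^s:

```python
import numpy as np

rng = np.random.default_rng(0)
T, rho = 50_000, 0.6

# u_0 from the stationary distribution (0, sigma_eps^2/(1 - rho^2)), sigma_eps = 1
u = np.empty(T)
u[0] = rng.normal(0.0, 1.0 / np.sqrt(1 - rho**2))
eps = rng.normal(0.0, 1.0, T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]

# Sample autocorrelations should be close to rho**s at each lag s
for s in range(1, 5):
    r = np.corrcoef(u[:-s], u[s:])[0, 1]
    print(f"lag {s}: sample {r:.3f}   theory {rho**s:.3f}")
```

With 50,000 observations the sample correlations track 0.6, 0.36, 0.216, … closely, mirroring the geometric decay claimed in the text.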
Consequences for OLS
How is the OLS estimator affected by the violation of the no-autocorrelation assumption among the disturbances? The OLS estimator is still unbiased and consistent, since these properties rely on assumptions 1 and 4 and have nothing to do with assumption 3. For the simple linear regression, using (5.2), the variance of β̂_OLS is now
var(β̂_OLS) = var(Σ_{t=1}^T w_t u_t) = Σ_{t=1}^T Σ_{s=1}^T w_t w_s cov(u_t, u_s)
           = σ²_u/Σ_{t=1}^T x_t² + σ²_u Σ_{t≠s} w_t w_s ρ^{|t−s|}    (5.34)
where cov(u_t, u_s) = ρ^{|t−s|}σ²_u as explained in (5.33). Note that the first term in (5.34) is the usual variance of β̂_OLS under the classical case. The second term in (5.34) arises because of the correlation between the u_t's. Hence, the variance of OLS computed from a regression package, i.e., s²/Σ_{t=1}^T x_t², is a wrong estimate of the variance of β̂_OLS for two reasons. First, it is using the wrong formula for the variance, i.e., σ²_u/Σ_{t=1}^T x_t² rather than (5.34). The latter depends on ρ through the extra term in (5.34). Second, one can show, see problem 7, that E(s²) ≠ σ²_u and will involve ρ as well as σ²_u. Hence, s² is not unbiased for σ²_u, and s²/Σ_{t=1}^T x_t² is a biased estimate of var(β̂_OLS). The direction and magnitude of this bias depend on ρ and the regressor. In fact, if ρ is positive and the x_t's are themselves positively autocorrelated, then s²/Σ_{t=1}^T x_t² understates the true variance of β̂_OLS. This means that the confidence interval for β is tighter than it should be and the t-statistic for H_0: β = 0 is overblown, see problem 8. As in the heteroskedastic case, but for completely different reasons, any inference based on var(β̂_OLS) reported from the standard regression packages will be misleading if the u_t's are serially correlated.
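The size of this understatement is easy to see numerically. The sketch below (our own illustration, with arbitrary parameter values) builds the covariance matrix ρ^{|t−s|}σ²_u of (5.33) for a trended regressor and compares the exact variance in (5.34) with the naive classical formula:

```python
import numpy as np

T, rho, sigma_u2 = 50, 0.8, 1.0

x = np.arange(1.0, T + 1)   # trended regressor, common with economic data
x = x - x.mean()            # measured in deviations from the mean

w = x / (x**2).sum()        # OLS weights: beta_hat = beta + sum_t w_t u_t
t = np.arange(T)
Sigma = sigma_u2 * rho ** np.abs(t[:, None] - t[None, :])  # cov(u_t, u_s), eq. (5.33)

true_var = w @ Sigma @ w               # exact variance, equation (5.34)
naive_var = sigma_u2 / (x**2).sum()    # classical (wrong) formula
print(f"naive {naive_var:.2e}   true {true_var:.2e}")
```

With positive ρ and a positively autocorrelated regressor the extra term in (5.34) is positive, so the naive variance comes out several times too small, which is exactly the overblown-t-statistic problem described above.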
Newey and West (1987) suggested a simple heteroskedasticity- and autocorrelation-consistent covariance matrix for the OLS estimator without specifying the functional form of the serial correlation. The basic idea extends White's (1980) replacement of heteroskedastic variances with squared OLS residuals e_t² by additionally including products of least squares residuals e_t e_{t−s} for s = 0, ±1,…, ±p, where p is the maximum order of serial correlation we are willing to assume. The consistency of this procedure relies on p being very small relative to the number of observations T. This is consistent with popular serial correlation specifications considered in this chapter, where the autocorrelation dies out quickly as the lag increases. Newey and West (1987) allow the higher-order covariance terms to receive diminishing weights. This Newey-West option for the least squares estimator is available in EViews. Andrews (1991) warns about the unreliability of such standard error corrections in some circumstances. Wooldridge (1991) shows that it is possible to construct serially-correlated-robust F-statistics for testing joint hypotheses as considered in Chapter 4. However, these are beyond the scope of this book.
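The correction itself is simple enough to sketch by hand for the slope of a regression through the origin (x in deviations). The function below is our own minimal illustration, not the EViews routine; the Bartlett weights w_j = 1 − j/(L+1) are the diminishing weights mentioned above:

```python
import numpy as np

def newey_west_se(x, e, L):
    """HAC standard error for the slope of y on x (x demeaned), with
    Bartlett weights as in Newey-West (1987). L is the lag truncation."""
    xe = x * e                                 # scores x_t * e_t
    S = (xe**2).sum()                          # White's (1980) term (lag 0)
    for j in range(1, L + 1):
        w = 1 - j / (L + 1)                    # diminishing weight on lag j
        S += 2 * w * (xe[j:] * xe[:-j]).sum()  # j-th cross-product term
    return np.sqrt(S) / (x**2).sum()           # sandwich form for the slope

# Tiny hand-checkable example: with L = 0 this collapses to White's
# heteroskedasticity-consistent standard error.
x = np.array([1.0, -1.0])
e = np.array([1.0, 1.0])
print(newey_west_se(x, e, 0))   # sqrt(2)/2
print(newey_west_se(x, e, 1))   # 0.5
```

Note how the weight on lag j shrinks toward zero as j approaches the truncation point L + 1; this is what keeps the estimated covariance matrix positive semi-definite.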
Is OLS still BLUE? In order to determine the BLU estimator in this case, we lag the regression equation once, multiply it by ρ, and subtract it from the original regression equation to get
Y_t − ρY_{t−1} = α(1 − ρ) + β(X_t − ρX_{t−1}) + ε_t,   t = 2, 3,…,T    (5.35)
This transformation, known as the Cochrane-Orcutt (1949) transformation, reduces the disturbances to classical errors. Therefore, OLS on the resulting regression renders the estimates BLU, i.e., run Ỹ_t = Y_t − ρY_{t−1} on a constant and X̃_t = X_t − ρX_{t−1}, for t = 2, 3,…,T. Note that we have lost one observation by lagging, and the resulting estimators are BLU only for linear combinations of (T − 1) observations in Y. Prais and Winsten (1954) derive the BLU estimators for linear combinations of all T observations in Y. This entails recapturing the initial observation as follows: (i) Multiply the first observation of the regression equation by √(1 − ρ²):
√(1 − ρ²) Y_1 = α√(1 − ρ²) + β√(1 − ρ²) X_1 + √(1 − ρ²) u_1
(ii) add this transformed initial observation to the Cochrane-Orcutt transformed observations for t = 2,…,T and run the regression on the T observations rather than the (T − 1) observations. See Chapter 9 for a formal proof of this result. Note that
Ỹ_1 = √(1 − ρ²) Y_1

and

Ỹ_t = Y_t − ρY_{t−1} for t = 2,…,T

Similarly, X̃_1 = √(1 − ρ²) X_1 and X̃_t = X_t − ρX_{t−1} for t = 2,…,T. The constant variable C_t = 1 for t = 1,…,T is now a new variable C̃_t which takes the values C̃_1 = √(1 − ρ²) and C̃_t = (1 − ρ) for t = 2,…,T. Hence, the Prais-Winsten procedure is the regression of Ỹ_t on C̃_t and X̃_t without a constant. It is obvious that the resulting BLU estimators will involve ρ and are therefore different from the usual OLS estimators, except in the case where ρ = 0. Hence, OLS is no longer BLUE. Furthermore, we need to know ρ in order to obtain the BLU estimators. In applied work, ρ is not known and has to be estimated, in which case the Prais-Winsten regression is no longer BLUE, since it is based on an estimate of ρ rather than the true ρ itself. However, as long as ρ̂ is a consistent estimate of ρ, this is a sufficient condition for the corresponding estimates of α and β in the next step to be asymptotically efficient, see Chapter 9. We now turn to various methods of estimating ρ.
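For a given ρ, the transformation just described fits in a few lines. The function below is our own sketch (not a library routine) following steps (i) and (ii): transform all T observations, then regress Ỹ_t on C̃_t and X̃_t without a constant:

```python
import numpy as np

def prais_winsten(y, x, rho):
    """Prais-Winsten (1954) estimation of y = a + b*x with AR(1) errors,
    for a *given* rho: recapture the first observation with sqrt(1-rho^2),
    quasi-difference the rest, then OLS without a constant."""
    s = np.sqrt(1 - rho**2)
    y_t = np.r_[s * y[0], y[1:] - rho * y[:-1]]    # Y-tilde
    x_t = np.r_[s * x[0], x[1:] - rho * x[:-1]]    # X-tilde
    c_t = np.r_[s, np.full(len(y) - 1, 1 - rho)]   # C-tilde
    a, b = np.linalg.lstsq(np.column_stack([c_t, x_t]), y_t, rcond=None)[0]
    return a, b

# Sanity check on noiseless data: the transformed model is exact, so the
# procedure recovers (a, b) = (2, 3) for any admissible rho.
x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y = 2.0 + 3.0 * x
print(prais_winsten(y, x, 0.5))
```

Note that the coefficient on C̃_t is α itself, since the transformed constant already carries the factors √(1 − ρ²) and (1 − ρ).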
(1) The Cochrane-Orcutt (1949) Method: This method starts with an initial estimate of ρ, the most convenient being 0, and minimizes the residual sum of squares in (5.35). This gives the OLS estimates of α and β. Then we substitute α̂_OLS and β̂_OLS in (5.35) and get
e_t = ρe_{t−1} + ε_t,   t = 2,…,T    (5.36)
where e_t denotes the OLS residual. An estimate of ρ can be obtained by minimizing the residual sum of squares in (5.36) or running the regression of e_t on e_{t−1} without a constant. The resulting estimate is ρ̂_CO = Σ_{t=2}^T e_t e_{t−1}/Σ_{t=2}^T e²_{t−1}, where both summations run over t = 2, 3,…,T. The second step of the Cochrane-Orcutt procedure (2SCO) is to perform the regression in (5.35) with ρ̂_CO instead of ρ. One can iterate this procedure (ITCO) by computing new residuals based on the new estimates of α and β and hence a new estimate of ρ from (5.36), and so on, until convergence. Since both 2SCO and ITCO are asymptotically efficient, the argument for iterating must be justified in terms of small sample gains.
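The iterated version is just a loop between (5.35) and (5.36). The following is our own minimal implementation for the simple regression with a constant; the simulated illustration underneath it uses arbitrary parameter values:

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt (1949): alternate between OLS on the
    quasi-differenced equation (5.35) and re-estimating rho from the
    residual regression (5.36), until rho converges."""
    rho = 0.0                                  # convenient starting value
    for _ in range(max_iter):
        ys = y[1:] - rho * y[:-1]              # quasi-differenced data,
        xs = x[1:] - rho * x[:-1]              # losing one observation
        Z = np.column_stack([np.full(len(ys), 1.0 - rho), xs])
        a, b = np.linalg.lstsq(Z, ys, rcond=None)[0]
        e = y - a - b * x                      # residuals of the original equation
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])  # as in (5.36)
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return a, b, rho_new

# Illustration on simulated data with a trended regressor
rng = np.random.default_rng(2)
T, rho_true = 200, 0.7
x = np.linspace(0.0, 10.0, T)
u = np.zeros(T)
eps = rng.normal(0.0, 0.05, T)
for t in range(1, T):
    u[t] = rho_true * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
a, b, r = cochrane_orcutt(y, x)
print(a, b, r)   # close to (1, 2, 0.7)
```

The regressor column (1 − ρ) means the reported intercept is α itself rather than α(1 − ρ).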
(2) The Hildreth-Lu (1960) Search Procedure: ρ is searched over a grid of values, say between −0.9 and 0.9 in increments of 0.1, and (5.35) is estimated for each value. If, say, ρ = 0.6 gives the minimum residual sum of squares, one can search next between 0.51 and 0.69 in intervals of 0.01. This search procedure guards against a local minimum. Since the likelihood in this case contains ρ as well as σ², α and β, this search procedure can be modified to maximize the likelihood rather than minimize the residual sum of squares, since the two criteria are no longer equivalent. The maximum value of the likelihood will give our choice of ρ and the corresponding estimates of α, β and σ².
(3) Durbin's (1960) Method: One can rearrange (5.35) by moving Y_{t−1} to the right hand side, i.e.,

Y_t = ρY_{t−1} + α(1 − ρ) + βX_t − ρβX_{t−1} + ε_t    (5.37)
and running OLS on (5.37). The error in (5.37) is classical, and the presence of Y_{t−1} on the right hand side reminds us of the contemporaneously uncorrelated case discussed under the violation of assumption 4. For that violation, we have shown that unbiasedness is lost, but not consistency. Hence, the estimate of ρ as the coefficient of Y_{t−1} is biased but consistent. This is the Durbin estimate of ρ, call it ρ̂_D. Next, the second step of the Cochrane-Orcutt procedure is performed using this estimate of ρ.
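Durbin's estimate is a single OLS regression. A minimal sketch (our own, for the simple regression with one X; the simulated values are arbitrary):

```python
import numpy as np

def durbin_rho(y, x):
    """Durbin (1960): OLS of Y_t on Y_{t-1}, a constant, X_t and X_{t-1};
    the coefficient on Y_{t-1} is a consistent estimate of rho."""
    Z = np.column_stack([y[:-1], np.ones(len(y) - 1), x[1:], x[:-1]])
    return np.linalg.lstsq(Z, y[1:], rcond=None)[0][0]

# Illustration on simulated AR(1) disturbances with rho = 0.6
rng = np.random.default_rng(3)
T, rho = 5000, 0.6
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
print(durbin_rho(y, x))   # close to 0.6
```

Note that (5.37) is estimated unrestrictedly: the coefficient on X_{t−1} is not forced to equal minus the product of the coefficients on Y_{t−1} and X_t.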
(4) Beach-MacKinnon (1978) Maximum Likelihood Procedure: Beach and MacKinnon (1978) derived a cubic equation in ρ which maximizes the likelihood function concentrated with respect to α, β and σ². With this estimate of ρ, denoted by ρ̂_BM, one performs the Prais-Winsten procedure in the next step.
Correcting for serial correlation is not without its critics. Mizon (1995) argues this point forcefully in his article entitled "A simple message for autocorrelation correctors: Don't." The main point is that serial correlation is a symptom of dynamic misspecification, which is better represented using a general unrestricted dynamic specification.
Monte Carlo Results
Rao and Griliches (1969) performed a Monte Carlo study using an autoregressive X_t and various values of ρ. They found that OLS is still a viable estimator as long as ρ < 0.3, but if ρ > 0.3, then it pays to perform procedures that correct for serial correlation based on an estimator of ρ. Their recommendation was to compute Durbin's estimate of ρ in the first step and to do the Prais-Winsten procedure in the second step. Maeshiro (1976, 1979) found that if the X_t series is trended, which is usual with economic data, then OLS outperforms 2SCO, but not the two-step Prais-Winsten (2SPW) procedure that recaptures the initial observation. These results were confirmed by Park and Mitchell (1980), who performed an extensive Monte Carlo study using trended and untrended X_t's. Their basic findings include the following: (i) For trended X_t's, OLS beats 2SCO, ITCO and even a Cochrane-Orcutt procedure that is based on the true ρ. However, OLS was beaten by 2SPW, iterative Prais-Winsten (ITPW), and Beach-MacKinnon (BM). Their conclusion is that one should not use regressions based on (T − 1) observations as in Cochrane and Orcutt. (ii) The ITPW procedure is the recommended estimator, beating 2SPW and BM for high values of true ρ, for both trended and nontrended X_t's. (iii) Tests of hypotheses regarding the regression coefficients performed miserably for all estimators based on an estimator of ρ. The results indicated less bias in standard error estimation for ITPW, BM and 2SPW than for OLS. However, the tests based on these standard errors still led to a high probability of type I error for all estimation procedures.
Testing for Autocorrelation
So far, we have studied the properties of OLS under the violation of assumption 3. We have derived asymptotically efficient estimators of the coefficients based on consistent estimators of p and studied their small sample properties using Monte Carlo experiments. Next, we focus on the problem of detecting this autocorrelation between the disturbances. A popular diagnostic for detecting such autocorrelation is the Durbin and Watson (1951) statistic2
d = Σ_{t=2}^T (e_t − e_{t−1})² / Σ_{t=1}^T e_t²    (5.38)
If this were based on the true u_t's, then d can be shown to tend in the limit, as T gets large, to 2(1 − ρ), see problem 9. This means that if ρ → 0, then d → 2; if ρ → 1, then d → 0; and if ρ → −1, then d → 4. Therefore, a test for H_0: ρ = 0 can be based on whether d is close to 2 or not. Unfortunately, the critical values of d depend upon the X_t's, and these vary from one data set to another. To get around this, Durbin and Watson established upper (d_U) and lower (d_L) bounds for this critical value. If the observed d is less than d_L, or larger than 4 − d_L, we reject H_0. If the observed d is between d_U and 4 − d_U, then we do not reject H_0. If d lies in either of the two indeterminate regions, then one should compute the exact critical values, which depend on the data. Most regression packages report the Durbin-Watson statistic, but few give the exact p-value for this d-statistic. If one is interested in a one-sided test, say H_0: ρ = 0 versus H_1: ρ > 0, then one would reject H_0 if d < d_L and not reject H_0 if d > d_U. If d_L < d < d_U, the test is inconclusive. Similarly, for testing H_0: ρ = 0 versus H_1: ρ < 0, one computes (4 − d) and follows the steps for testing against positive autocorrelation. The Durbin and Watson tables for d_L and d_U covered sample sizes from 15 to 100 and a maximum of 5 regressors. Savin and White (1977) extended these tables for 6 ≤ T ≤ 200 and up to 10 regressors.
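Computing d from a residual series is one line; the sketch below (our own) also illustrates the link d ≈ 2(1 − ρ̂) on simulated AR(1) disturbances with arbitrary parameter values:

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic of (5.38)."""
    return ((e[1:] - e[:-1])**2).sum() / (e**2).sum()

# Extreme cases: a constant residual series (perfect positive correlation)
# drives d toward 0; a perfectly alternating one drives d toward 4.
print(durbin_watson(np.ones(5)))                         # 0.0
print(durbin_watson(np.array([1.0, -1.0, 1.0, -1.0])))   # 3.0 (d -> 4 as T grows)

# For AR(1) disturbances, d is close to 2*(1 - rho) in large samples.
rng = np.random.default_rng(4)
u = np.zeros(10_000)
for t in range(1, len(u)):
    u[t] = 0.9 * u[t - 1] + rng.normal()
print(durbin_watson(u))   # near 2*(1 - 0.9) = 0.2
```

The last value reproduces, on simulated data, the pattern seen in the empirical example below, where d = 0.181 signals strong positive serial correlation.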
The Durbin-Watson statistic has several limitations. We discussed the inconclusive region and the computation of exact critical values. The Durbin-Watson statistic is appropriate only when there is a constant in the regression; for the case without a constant, see Farebrother (1980). Also, the Durbin-Watson statistic is inappropriate when there are lagged values of the dependent variable among the regressors. We now turn to an alternative test for serial correlation that does not have these limitations and is also easy to apply. This test was derived by Breusch (1978) and Godfrey (1978) and is known as the Breusch-Godfrey test for zero first-order serial correlation. This is a Lagrange Multiplier test that amounts to running the regression of the OLS residuals e_t on e_{t−1} and the original regressors in the model. The test statistic is TR². Its distribution under the null is χ²₁. In this case, the regressors are a constant and X_t, and the test checks whether the coefficient of e_{t−1} is significant. The beauty of this test is that (i) it is the same test for first-order serial correlation whether the disturbances are Moving Average of order one, MA(1), or AR(1); (ii) it is easily generalizable to higher-order autoregressive or Moving Average schemes: for second-order serial correlation, like MA(2) or AR(2), one includes two lags of the residuals on the right hand side, i.e., both e_{t−1} and e_{t−2}; and (iii) it is still valid even when lagged values of the dependent variable are present among the regressors, see Chapter 6. The Breusch and Godfrey test is standard in EViews, which prompts the user for the number of lags of the residuals to include among the regressors. You click on residuals, then tests, and choose Breusch-Godfrey. Next, you input the number of lagged residuals you want to include.
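The LM test is just an auxiliary regression plus TR². A minimal sketch (our own; dropping the first p observations, rather than zero-filling the missing lags, is one common convention and an assumption here):

```python
import numpy as np

def breusch_godfrey(e, x, p=1):
    """Breusch (1978)-Godfrey (1978) LM statistic: regress the OLS
    residuals e_t on a constant, x_t and p lags of e_t; T*R^2 is
    chi-square(p) under the null of no serial correlation."""
    lags = np.column_stack([e[p - j:-j] for j in range(1, p + 1)])
    Z = np.column_stack([np.ones(len(e) - p), x[p:], lags])
    coef = np.linalg.lstsq(Z, e[p:], rcond=None)[0]
    resid = e[p:] - Z @ coef
    dep = e[p:]
    r2 = 1.0 - (resid**2).sum() / ((dep - dep.mean())**2).sum()
    return len(dep) * r2

# Strongly autocorrelated "residuals" produce a large statistic.
rng = np.random.default_rng(5)
T = 500
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.9 * u[t - 1] + rng.normal()
print(breusch_godfrey(u, x, p=1))   # far above the chi-square(1) 5% value of 3.84
```

Setting p = 2 adds e_{t−2} to the auxiliary regression and changes the reference distribution to χ²₂, exactly as described in point (ii) above.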
What about first differencing the data as a possible solution for getting rid of serial correlation? Some economic behavioral equations are specified with variables in first-difference form, like GDP growth, but other equations are first differenced for estimation purposes. In the latter case, if the original disturbances were not autocorrelated (or autocorrelated, but with ρ ≠ 1), then the transformed disturbances are serially correlated. After all, first differencing the disturbances is equivalent to setting ρ = 1 in u_t − ρu_{t−1}, and this new disturbance u*_t = u_t − u_{t−1} has u_{t−1} in common with u*_{t−1} = u_{t−1} − u_{t−2}, making E(u*_t u*_{t−1}) = −E(u²_{t−1}) = −σ²_u. However, one could argue that if ρ is large and positive, first differencing the data may not be a bad solution. Rao and Miller (1971) calculated the variance of the BLU estimator correcting for serial correlation for various guesses of ρ. They assume a true ρ of 0.2 and an autoregressive X_t
X_t = λX_{t−1} + w_t,  with λ = 0, 0.4, 0.8.    (5.39)
They find that OLS (or a guess of ρ = 0) performs better than first differencing the data, and is pretty close in terms of efficiency to the true BLU estimator for trended X_t (λ = 0.8). However, the performance of OLS deteriorates as λ declines to 0.4 and 0, with respect to the true BLU estimator. This supports the Monte Carlo finding by Rao and Griliches that for ρ < 0.3, OLS performs reasonably well relative to estimators that correct for serial correlation. However, the first-difference estimator, i.e., a guess of ρ = 1, performs badly for trended X_t (λ = 0.8), giving the worst efficiency when compared to any other guess of ρ. Only when the X_t's are less trended (λ = 0.4) or random (λ = 0) does the efficiency of the first-difference estimator improve. However, even for those cases one can do better by guessing ρ. For example, for λ = 0, one can always do better than first differencing by guessing any positive ρ less than 1. Similarly, for true ρ = 0.6, a higher degree of serial correlation, Rao and Miller (1971) show that the performance of OLS deteriorates, while that of the first difference improves. However, one can still do better than first differencing by guessing ρ in the interval (0.4, 0.9). This gain in efficiency increases with trended X_t's.
Empirical Example: Table 5.3 gives U.S. Real Personal Consumption Expenditures (C) and Real Disposable Personal Income (Y) from the Economic Report of the President over the period 1959-2007. This data set is available as CONSUMP.DAT on the Springer web site.
The OLS regression yields:
Ĉ_t = −1343.31 + 0.979 Y_t + residuals
       (219.56)   (0.011)
Figure 5.3 plots the actual, fitted and residual values using EViews 6.0. It shows positive serial correlation: a string of positive residuals is followed by a string of negative residuals, followed again by positive residuals. The Durbin-Watson statistic is d = 0.181, which is much smaller than the lower bound d_L = 1.497 for T = 49 and one regressor. Therefore, we reject the null hypothesis H_0: ρ = 0 at the 5% significance level.
The Breusch (1978) and Godfrey (1978) regression that tests for first-order serial correlation is given in Table 5.4. This is done using EViews 6.0.
This yields
ê_t = −54.41 + 0.004 Y_t + 0.909 e_{t−1} + residuals
      (102.77)  (0.005)    (0.070)
The test statistic is TR², which yields 49 × 0.786 = 38.5. This is distributed as χ²₁ under H_0: ρ = 0. This rejects the null hypothesis of no first-order serial correlation, with a p-value of 0.0000 as shown in Table 5.4.
Table 5.3 U.S. Consumption Data
C = Real Personal Consumption Expenditures (in 1987 dollars)
Y = Real Disposable Personal Income (in 1987 dollars)

YEAR       C       Y
1959    8776    9685
1960    8837    9735
1961    8873    9901
1962    9170   10227
1963    9412   10455
1964    9839   11061
1965   10331   11594
1966   10793   12065
1967   10994   12457
1968   11510   12892
1969   11820   13163
1970   11955   13563
1971   12256   14001
1972   12868   14512
1973   13371   15345
1974   13148   15094
1975   13320   15291
1976   13919   15738
1977   14364   16128
1978   14837   16704
1979   15030   16931
1980   14816   16940
1981   14879   17217
1982   14944   17418
1983   15656   17828

Source: Economic Report of the President
Regressing the OLS residuals on their lagged values yields
ê_t = 0.906 e_{t−1} + residuals
      (0.062)
The two-step Cochrane-Orcutt (1949) procedure based on ρ̂ = 0.906 using Stata 11 yields the results given in Table 5.5.
The Prais-Winsten (1954) procedure using Stata 11 yields the results given in Table 5.6. The estimate of the marginal propensity to consume is 0.979 for OLS, 0.989 for two-step Cochrane-Orcutt, and 0.912 for iterative Prais-Winsten. All of these estimates are significant.
The Newey-West heteroskedasticity and autocorrelation-consistent standard errors for least squares with a three-year lag truncation are given in Table 5.7 using EViews 6. Note that both standard errors are now larger than those reported by least squares. But once again, this is not necessarily the case for other data sets.
Table 5.5 The Cochrane-Orcutt Two-Step AR(1) Regression

. prais c y, corc two

Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.9059

Cochrane-Orcutt AR(1) regression -- twostep estimates

      Source |       SS         df        MS         Number of obs =      48
-------------+--------------------------------       F(1, 46)      =  519.58
       Model |    17473195       1    17473195       Prob > F      =  0.0000
    Residual |  1546950.74      46   33629.364       R-squared     =  0.9187
-------------+--------------------------------       Adj R-squared =  0.9169
       Total |  19020145.7      47  404683.951       Root MSE      =  183.38

-----------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
           y |   .9892295   .0433981    22.79   0.000     .9018738   1.076585
       _cons |  -1579.722   1014.436    -1.56   0.126    -3621.676   462.2328
-------------+---------------------------------------------------------------
         rho |   .9059431
-----------------------------------------------------------------------------
Durbin-Watson statistic (original)  0.180503

Table 5.6 The Iterative Prais-Winsten AR(1) Regression

. prais c y

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS         df        MS         Number of obs =      49
-------------+--------------------------------       F(1, 47)      =  119.89
       Model |  3916565.48       1  3916565.48       Prob > F      =  0.0000
    Residual |  1535401.45      47  32668.1159       R-squared     =  0.7184
-------------+--------------------------------       Adj R-squared =  0.7124
       Total |  5451966.93      48  113582.644       Root MSE      =  180.74

-----------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
           y |    .912147    .047007    19.40   0.000     .8175811   1.006713
       _cons |   358.9638   1174.865     0.31   0.761     -2004.56   2722.488
-------------+---------------------------------------------------------------
         rho |   .9808528
-----------------------------------------------------------------------------
Durbin-Watson statistic (original)     0.180503
Durbin-Watson statistic (transformed)  2.314703
Table 5.7 Newey-West HAC Standard Errors & Covariance (lag truncation=3)

2. Another test for serial correlation can be obtained as a by-product of maximum likelihood estimation. The maximum likelihood estimator of ρ has a normal limiting distribution with mean ρ and variance (1 − ρ²)/T. Hence, one can compute ρ̂_MLE/[(1 − ρ̂²_MLE)/T]^{1/2} and compare it to critical values from the normal distribution.
Problems

1. s² Is Biased Under Heteroskedasticity. For the simple linear regression with heteroskedasticity, i.e., E(u_i²) = σ_i², show that E(s²) is a function of the σ_i²'s.
3. Weighted Least Squares. This is based on Kmenta (1986).
(a) Solve the two equations in (5.11) and show that the solution is given by (5.12).
(b) Show that

var(β̂_BLUE) = Σ_{i=1}^n w*_i / [(Σ_{i=1}^n w*_i X_i²)(Σ_{i=1}^n w*_i) − (Σ_{i=1}^n w*_i X_i)²] = 1 / Σ_{i=1}^n w*_i (X_i − X̄*)²

where w*_i = 1/σ_i² and X̄* = Σ_{i=1}^n w*_i X_i / Σ_{i=1}^n w*_i.
4. Relative Efficiency of OLS Under Heteroskedasticity. Consider the simple linear regression with heteroskedasticity of the form σ_i² = σ²X_i^δ, where X_i = 1, 2,…, 10.
(a) Compute var(β̂_OLS) for δ = 0.5, 1, 1.5 and 2.
(b) Compute var(β̂_BLUE) for δ = 0.5, 1, 1.5 and 2.
(c) Compute the efficiency of β̂_OLS, i.e., var(β̂_BLUE)/var(β̂_OLS), for δ = 0.5, 1, 1.5 and 2. What happens to this efficiency measure as δ increases?
5. Consider the simple regression with only a constant, y_i = α + u_i for i = 1, 2,…,n, where the u_i's are independent with mean zero and var(u_i) = σ₁² for i = 1, 2,…,n₁ and var(u_i) = σ₂² for i = n₁ + 1,…,n₁ + n₂, with n = n₁ + n₂.
(a) Derive the OLS estimator of α along with its mean and variance.
(b) Derive the GLS estimator of α along with its mean and variance.
(c) Obtain the relative efficiency of OLS with respect to GLS. Compute their relative efficiency for various values of σ₁²/σ₂² = 0.2, 0.4, 0.6, 0.8, 1, 1.25, 1.33, 2.5, 5 and n₁/n = 0.2, 0.3, 0.4,…, 0.8. Plot this relative efficiency.
(d) Assume that u_i is N(0, σ₁²) for i = 1, 2,…,n₁ and N(0, σ₂²) for i = n₁ + 1,…,n₁ + n₂, with the u_i's being independent. What are the maximum likelihood estimators of α, σ₁² and σ₂²?
(e) Derive the LR test for testing H_0: σ₁² = σ₂² in part (d).
6. Show that for an AR(1) model given in (5.26), E(u_t u_s) = ρ^{|t−s|}σ²_u for t, s = 1, 2,…,T.
7. Relative Efficiency of OLS Under the AR(1) Model. This problem is based on Johnston (1984, pp. 310-312). For the simple regression without a constant, y_t = βx_t + u_t with u_t = ρu_{t−1} + ε_t and ε_t ~ IID(0, σ²_ε):
These expressions are easier to prove using matrix algebra, see Chapter 9.
(b) Let x_t itself follow an AR(1) scheme with parameter λ, i.e., x_t = λx_{t−1} + v_t, and let T → ∞. Show that

lim_{T→∞} var(β̂_PW)/var(β̂_OLS) = (1 − ρ²)/[(1 + ρ² − 2ρλ)(1 + 2ρλ + 2ρ²λ² + …)]
                                = (1 − ρ²)(1 − ρλ)/[(1 + ρ² − 2ρλ)(1 + ρλ)]
(c) Tabulate this asy eff(β̂_OLS) for various values of ρ and λ, where ρ varies between −0.9 and +0.9 in increments of 0.1, while λ varies between 0 and 0.9 in increments of 0.1. What do you conclude? How serious is the loss in efficiency in using OLS rather than the PW procedure?
(d) Ignoring this autocorrelation, one would compute σ²_u/Σ_{t=1}^T x_t² as the var(β̂_OLS). The difference between this wrong formula and that derived in part (a) gives us the bias in estimating the variance of β̂_OLS. Show that as T → ∞, this asymptotic proportionate bias is given by −2ρλ/(1 + ρλ). Tabulate this asymptotic bias for various values of ρ and λ as in part (c). What do you conclude? How serious is the asymptotic bias of using the wrong variance for β̂_OLS when the disturbances are first-order autocorrelated?
(e) Derive E(s²) and conclude that if ρ = 0, then E(s²) = σ²_u. If x_t follows an AR(1) scheme with parameter λ, then for a large T, we get

E(s²) = σ²_u [T − (1 + ρλ)/(1 − ρλ)]/(T − 1)

Compute this E(s²) for T = 101 and various values of ρ and λ as in part (c). What do you conclude? How serious is the bias in using s² as an unbiased estimator for σ²_u?
8. OLS Variance Is Biased Under Serial Correlation. For the AR(1) model given in (5.26), show that if ρ > 0 and the x_t's are positively autocorrelated, then E(s²/Σ_{t=1}^T x_t²) understates the var(β̂_OLS) given in (5.34).
9. Show that for the AR(1) model, the Durbin-Watson statistic satisfies plim d = 2(1 − ρ).
10. Regressions with Nonzero Mean Disturbances. Consider the simple regression with a constant

Y_i = α + βX_i + u_i,   i = 1, 2,…,n

where α and β are scalars and u_i is independent of the X_i's. Show that:
(a) If the u_i's are independent and identically gamma distributed with f(u_i) = u_i^{θ−1}e^{−u_i}/Γ(θ), where u_i > 0 and θ > 0, then α̂_OLS − s² is unbiased for α.
(b) If the u_i's are independent and identically χ² distributed with ν degrees of freedom, then α̂_OLS − s²/2 is unbiased for α.
(c) If the u_i's are independent and identically exponentially distributed with f(u_i) = (1/θ)e^{−u_i/θ}, where u_i > 0 and θ > 0, then α̂_OLS − s is consistent for α.
11. The Heteroskedastic Consequences of an Arbitrary Variance for the Initial Disturbance of an AR(1) Model. This is based on Baltagi and Li (1990, 1992). Consider a simple AR(1) model

u_t = ρu_{t−1} + ε_t,   t = 1, 2,…,T,   |ρ| < 1

with ε_t ~ IID(0, σ²_ε) independent of u_0 ~ (0, σ²_ε/τ), where τ is an arbitrary positive parameter.
(a) Show that this arbitrary variance on the initial disturbance u_0 renders the disturbances, in general, heteroskedastic.
(b) Show that var(u_t) = σ²_t is increasing if τ > (1 − ρ²) and decreasing if τ < (1 − ρ²). When is the process homoskedastic?
(c) Show that cov(u_t, u_{t−s}) = ρ^s σ²_{t−s} for t ≥ s. Hint: see the solution by Kim (1991).
(d) Consider the simple regression model

y_t = βx_t + u_t,   t = 1, 2,…,T

with u_t following the AR(1) process described above. Consider the common case where ρ > 0 and the x_t's are positively autocorrelated. For this case, it is a standard result that var(β̂_OLS) is understated under the stationary case (i.e., τ = (1 − ρ²)), see problem 8. This means that OLS rejects the hypothesis H_0: β = 0 too often. Show that OLS will reject more often than in the stationary case if τ < 1 − ρ², and less often than in the stationary case if τ > (1 − ρ²). Hint: see the solution by Koning (1992).
12. ML Estimation of a Linear Regression Model with AR(1) Errors and Two Observations. This is based on Magee (1993). Consider the regression model y_i = x_iβ + u_i, with only two observations i = 1, 2, and nonstochastic scalars x_1 ≠ x_2. Assume that u_1 ~ N(0, σ²) and u_2 = ρu_1 + ε with |ρ| < 1. Also, ε ~ N[0, (1 − ρ²)σ²], where ε and u_1 are independent.
(a) Show that the OLS estimator of β is (x_1y_1 + x_2y_2)/(x_1² + x_2²).
(b) Show that the ML estimator of β is (x_1y_1 − x_2y_2)/(x_1² − x_2²).
(c) Show that the ML estimator of ρ is 2x_1x_2/(x_1² + x_2²) and thus is nonstochastic.
(d) How do the ML estimates of β and ρ behave as x_1 → x_2 and x_1 → −x_2? Assume x_2 ≠ 0. Hint: see the solution by Baltagi and Li (1995).
13. For the empirical example in section 5.5 based on the Cigarette Consumption Data in Table 3.2:
(a) Replicate the OLS regression of logC on logP, logY and a constant. Plot the residuals versus logY and verify Figure 5.1.
(b) Run Glejser's (1969) test by regressing |e_i|, the absolute value of the residuals from part (a), on (logY_i)^δ for δ = 1, −1, −0.5 and 0.5. Verify the t-statistics reported in the text.
(c) Run Goldfeld and Quandt's (1965) test by ordering the observations according to logY_i and omitting 12 central observations. Report the two regressions based on the first and last 17 observations and verify the F-test reported in the text.
(d) Verify the Spearman rank correlation test based on rank(logY_i) and rank|e_i|.
(e) Verify Harvey's (1976) multiplicative heteroskedasticity test based on regressing log e_i² on log(logY_i).
(f) Run the Breusch and Pagan (1979) test based on the regression of e_i²/σ̂² on logY_i, where σ̂² = Σ_{i=1}^{46} e_i²/46.
(g) Run White's (1980) test for heteroskedasticity.
(h) Run the Jarque and Bera (1987) test for normality of the disturbances.
(i) Compute White's (1980) heteroskedasticity robust standard errors for the regression in part (a).
14. A Simple Linear Trend Model with AR(1) Disturbances. This is based on Kramer (1982).
(a) Consider the following simple linear trend model

Y_t = α + βt + u_t

where u_t = ρu_{t−1} + ε_t with |ρ| < 1, ε_t ~ IID(0, σ²_ε) and var(u_t) = σ²_u = σ²_ε/(1 − ρ²). Our interest is focused on the estimates of the trend coefficient, β, and the estimators to be considered are OLS, CO (assuming that the true value of ρ is known), the first-difference estimator (FD), and Generalized Least Squares (GLS), which is Best Linear Unbiased (BLUE) in this case.
In the context of the simple linear trend model, the formulas for the variances of these estimators reduce to
V(OLS) = 12σ²_ε{−6ρ^{T+1}[(T − 1)ρ − (T + 1)]² − (T³ − T)ρ⁴ + 2(T² − 1)(T − 3)ρ³ + 12(T² + 1)ρ² − 2(T² − 1)(T + 3)ρ + (T³ − T)}/[(1 − ρ²)(1 − ρ)⁴(T³ − T)²]

V(CO) = 12σ²_ε/[(1 − ρ)²(T³ − 3T² + 2T)]

V(FD) = 2σ²_ε(1 − ρ^{T−1})/[(1 − ρ²)(T − 1)²]

V(GLS) = 12σ²_ε/{(T − 1)[(T − 3)(T − 2)ρ² − 2(T − 3)(T + 1)ρ + T(T + 1)]}
(b) Compute these variances and their relative efficiency with respect to the GLS estimator for T = 10, 20, 30, 40 and ρ between −0.9 and 0.9 in 0.1 increments.
(c) For a given T, show that the limit of var(OLS)/var(CO) is zero as ρ → 1. Prove that var(FD) and var(GLS) both tend in the limit to σε²/(T − 1) < ∞ as ρ → 1. Conclude that var(GLS)/var(FD) tends to 1 as ρ → 1. Also, show that lim(ρ→1)[var(GLS)/var(OLS)] = 5(T² + T)/6(T² + 1) < 1 provided T > 3.
(d) For a given ρ, show that var(FD) is O(T⁻²) whereas the variances of the remaining estimators are O(T⁻³). Conclude that lim(T→∞)[var(FD)/var(CO)] = ∞ for any given ρ.
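The computations asked for in part (b) can be sketched directly: the functions below code the four variance formulas of this trend model (with σε² normalized to one) and tabulate the relative efficiency with respect to GLS. This is an illustrative sketch, not part of the original problem.

```python
import numpy as np

def var_ols(rho, T):
    """OLS variance of the trend coefficient (sigma_eps^2 = 1)."""
    num = (-6 * rho**(T + 1) * ((T - 1) * rho - (T + 1))**2
           - (T**3 - T) * rho**4
           + 2 * (T**2 - 1) * (T - 3) * rho**3
           + 12 * (T**2 + 1) * rho**2
           - 2 * (T**2 - 1) * (T + 3) * rho
           + (T**3 - T))
    return 12 * num / ((1 - rho**2) * (1 - rho)**4 * (T**3 - T)**2)

def var_co(rho, T):
    """Cochrane-Orcutt (rho known, first observation dropped)."""
    return 12 / ((1 - rho)**2 * (T**3 - 3 * T**2 + 2 * T))

def var_fd(rho, T):
    """First-difference estimator: beta_FD = (Y_T - Y_1)/(T - 1)."""
    return 2 * (1 - rho**(T - 1)) / ((1 - rho**2) * (T - 1)**2)

def var_gls(rho, T):
    """GLS (Prais-Winsten), which is BLUE here."""
    den = (T - 1) * ((T - 3) * (T - 2) * rho**2
                     - 2 * (T - 3) * (T + 1) * rho + T * (T + 1))
    return 12 / den

# relative efficiency of OLS versus GLS, e.g. for T = 10
rel_eff = {round(r, 1): var_gls(r, 10) / var_ols(r, 10)
           for r in np.arange(-0.9, 0.91, 0.1)}
```

At ρ = 0 all four reduce to familiar expressions (OLS and GLS coincide at 12/(T³ − T)), and on the grid of part (b) the GLS variance is never larger than the other three, as the BLUE property requires.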
15. Consider the empirical example in section 5.6, based on the Consumption-Income data in Table 5.3. Obtain this data set from the CONSUMP.DAT file on the Springer web site.
(a) Replicate the OLS regression of Ct on Yt and a constant, and compute the Durbin-Watson statistic. Test H0: ρ = 0 versus H1: ρ > 0 at the 5% significance level.
(b) Test for first-order serial correlation using the Breusch and Godfrey test.
(c) Perform the two-step Cochrane-Orcutt procedure and verify Table 5.5. What happens if we iterate the Cochrane-Orcutt procedure?
(d) Perform the Prais-Winsten procedure and verify Table 5.6.
(e) Compute the Newey-West heteroskedasticity and autocorrelation-consistent standard errors for the least squares estimates in part (a).
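The two-step Cochrane-Orcutt procedure of part (c) can be sketched as follows on simulated AR(1)-error data; the data-generating process and all parameter values are hypothetical, not the consumption-income series.

```python
import numpy as np

def cochrane_orcutt_two_step(y, x):
    """Two-step Cochrane-Orcutt: (1) run OLS and estimate rho from the
    residuals; (2) rerun OLS on the quasi-differenced data, dropping the
    first observation."""
    X = np.column_stack([np.ones_like(x), x])
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])   # AR(1) coefficient of residuals

    y_star = y[1:] - rho * y[:-1]                # quasi-differences
    x_star = x[1:] - rho * x[:-1]
    # constant becomes (1 - rho), so the coefficients stay on the original scale
    Xs = np.column_stack([np.full_like(x_star, 1 - rho), x_star])
    b_co, *_ = np.linalg.lstsq(Xs, y_star, rcond=None)
    return rho, b_co

# hypothetical data: y_t = 1 + 2 x_t + u_t, u_t = 0.7 u_{t-1} + eps_t
rng = np.random.default_rng(1)
T = 200
x = np.linspace(0, 10, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + u
rho_hat, (a_hat, b_hat) = cochrane_orcutt_two_step(y, x)
```

Iterating the procedure (re-estimating ρ from the new residuals and repeating) usually changes the estimates only slightly after the first round, which is the point of the question in part (c).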
16. Benderly and Zwick (1985) considered the following equation:
RSt = α + βQt+1 + γPt + ut
where RSt = the real return on stocks in year t, Qt+1 = the annual rate of growth of real GNP in year t + 1, and Pt = the rate of inflation in year t. The data are provided on the Springer web site in the file BENDERLY.ASC. They cover 31 annual observations for the U.S. over the period 1952-1982 and were obtained from Lott and Ray (1992). This equation is used to test the significance of the inflation rate in explaining real stock returns. Use the sample period 1954-1976 to answer the following questions:
(a) Run OLS to estimate the above equation. Remember to use Qt+1. Is Pt significant in this equation? Plot the residuals against time. Compute the Newey-West heteroskedasticity and autocorrelation-consistent standard errors for these least squares estimates.
(b) Test for serial correlation using the D.W. test.
(c) Would your decision in (b) change if you used the Breusch-Godfrey test for first-order serial correlation?
(d) Run the Cochrane-Orcutt procedure to correct for first-order serial correlation. Report your estimate of ρ.
(e) Run the Prais-Winsten procedure accounting for the first observation and report your estimate of ρ. Plot the residuals against time.
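The Newey-West standard errors requested in part (a) follow the sandwich formula with Bartlett-weighted residual autocovariances. A minimal sketch, using hypothetical simulated data in place of the Benderly-Zwick series and an arbitrarily chosen lag truncation L:

```python
import numpy as np

def newey_west_se(X, e, L):
    """Newey-West (1987) HAC standard errors: Bartlett-weighted sum of
    residual autocovariance terms inside the usual sandwich formula."""
    Xe = X * e[:, None]                       # score contributions x_t * e_t
    S = Xe.T @ Xe                             # lag-0 term
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1.0)               # Bartlett kernel weight
        G = Xe[j:].T @ Xe[:-j]                # lag-j cross-products
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(XtX_inv @ S @ XtX_inv))

# hypothetical regression with AR(1) errors
rng = np.random.default_rng(4)
T = 200
x = np.linspace(0, 10, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
nw_se = newey_west_se(X, e, L=8)
ols_se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * (e @ e) / (T - 2))
```

With positively autocorrelated errors, the HAC standard error on the slope typically exceeds the classical OLS standard error, which is exactly the understatement these corrections are meant to repair.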
17. Using our cross-section Energy/GDP data set from problem 3.16 in Chapter 3, consider the following two models:
Model 1: log En = α + β log RGDP + u
Model 2: En = α + β RGDP + v
Make sure you have corrected the W. Germany observation on EN as described in problem 3.16 part (d).
(a) Run OLS on both Models 1 and 2. Test for heteroskedasticity using the Goldfeld/Quandt test. Omit c = 6 central observations. Why is heteroskedasticity a problem in Model 2, but not in Model 1?
(b) For Model 2, test for heteroskedasticity using the Glejser Test.
(c) Now use the Breusch-Pagan test to test for heteroskedasticity on Model 2.
(d) Apply White’s Test to Model 2.
(e) Do all these tests give the same decision?
(f) Propose and estimate a simple transformation of Model 2, assuming heteroskedasticity of the form σi² = σ²RGDPi².
(g) Propose and estimate a simple transformation of Model 2, assuming heteroskedasticity of the form σi² = σ²(a + b RGDPi)².
(h) Now suppose that heteroskedasticity is of the form σi² = σ²RGDPi^γ, where γ is an unknown parameter. Propose and estimate a simple transformation for Model 2. Hint: You can write σi² as exp{α + γ log RGDPi} where α = log σ².
(i) Compare the standard errors of the estimates for Model 2 from OLS with White’s heteroskedasticity-consistent standard errors, and with the simple Weighted Least Squares estimates of the standard errors in parts (f), (g) and (h). What do you conclude?
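For part (f), the transformation simply divides the whole equation by RGDPi, so that the transformed error vi/RGDPi has constant variance. A sketch on hypothetical data (the data-generating process below is invented for illustration, not the Energy/GDP set):

```python
import numpy as np

# hypothetical Model 2 data with error variance sigma^2 * RGDP_i^2
rng = np.random.default_rng(2)
n = 200
rgdp = rng.uniform(1, 20, n)
en = 3.0 + 1.5 * rgdp + rng.normal(scale=0.4 * rgdp)

# dividing through by RGDP_i:
#   En_i/RGDP_i = alpha * (1/RGDP_i) + beta + v_i/RGDP_i
Xw = np.column_stack([1.0 / rgdp, np.ones(n)])
yw = en / rgdp
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(Xw, yw, rcond=None)
```

Note the role reversal after the transformation: the original intercept α becomes the coefficient on 1/RGDPi, while the original slope β becomes the constant of the transformed regression.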
18. You are given quarterly data from the first quarter of 1965 (1965.1) to the fourth quarter of 1983 (1983.4) on employment in Orange County, California (EMP) and real gross national product (RGNP). The data set is in the file ORANGE.DAT on the Springer web site.
(a) Generate the lagged variable of real GNP, call it RGNPt−1, and estimate the following model by OLS: EMPt = α + βRGNPt−1 + ut.
(b) What does inspection of the residuals and the Durbin-Watson statistic suggest?
(c) Assuming ut = ρut−1 + εt, where |ρ| < 1 and εt ~ IIN(0, σε²), use the Cochrane-Orcutt procedure to estimate ρ, α and β. Compare the latter estimates and their standard errors with those of OLS.
(d) The Cochrane-Orcutt procedure omits the first observation. Perform the Prais-Winsten adjustment. Compare the resulting estimates and standard errors with those in part (c).
(e) Apply the Breusch-Godfrey test for first- and second-order autoregression. What do you conclude?
(f) Compute the Newey-West heteroskedasticity and autocorrelation-consistent standard errors for the least squares estimates in part (a).
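The Breusch-Godfrey test of part (e) is an auxiliary regression of the OLS residuals on the original regressors plus their own lags. A sketch with hypothetical AR(1)-error data (not the Orange County series):

```python
import numpy as np

def breusch_godfrey_lm(y, X, p):
    """LM test for AR(p) errors: regress OLS residuals e_t on X_t and
    e_{t-1}, ..., e_{t-p} (first p observations dropped); LM = (T - p) * R^2
    is asymptotically chi-squared with p degrees of freedom under the null
    of no serial correlation."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    T = len(e)
    lags = np.column_stack([e[p - j: T - j] for j in range(1, p + 1)])
    Z = np.column_stack([X[p:], lags])
    g, *_ = np.linalg.lstsq(Z, e[p:], rcond=None)
    resid = e[p:] - Z @ g
    tss = ((e[p:] - e[p:].mean())**2).sum()
    r2 = 1.0 - (resid @ resid) / tss
    return (T - p) * r2

# hypothetical regression with AR(1) errors (rho = 0.7)
rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.5)
y = 0.5 + 1.0 * x + u
X = np.column_stack([np.ones(T), x])
lm1 = breusch_godfrey_lm(y, X, 1)   # compare with chi2(1) critical value 3.841
lm2 = breusch_godfrey_lm(y, X, 2)   # compare with chi2(2) critical value 5.991
```

Here the first p observations are simply dropped; padding the missing lags with zeros is a common alternative and matters little in moderate samples.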
19. Consider the earnings data underlying the regression in Table 4.1, available on the Springer web site as EARN.ASC.
(a) Apply White’s test for heteroskedasticity to the regression residuals.
(b) Compute White’s heteroskedasticity-consistent standard errors.
(c) Test the least squares residuals for normality using the Jarque-Bera test.
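The Jarque-Bera statistic of part (c) needs only the skewness and kurtosis of the residuals. A minimal sketch, illustrated on invented samples rather than the earnings residuals:

```python
import numpy as np

def jarque_bera(e):
    """JB = n * (S^2/6 + (K - 3)^2/24), where S is the skewness and K the
    kurtosis of e; asymptotically chi-squared with 2 degrees of freedom
    under normality."""
    n = len(e)
    m = e - e.mean()
    s2 = (m**2).mean()
    S = (m**3).mean() / s2**1.5
    K = (m**4).mean() / s2**2
    return n * (S**2 / 6.0 + (K - 3.0)**2 / 24.0)

rng = np.random.default_rng(3)
jb_normal = jarque_bera(rng.normal(size=500))       # small for Gaussian data
jb_skewed = jarque_bera(rng.exponential(size=500))  # large for skewed data
```

Values above the chi-squared(2) critical value of 5.99 reject normality at the 5% level.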
20. Hedonic Housing. Harrison and Rubinfeld (1978) collected data on 506 census tracts in the Boston area in 1970 to study hedonic housing prices and the willingness to pay for clean air. This data is available on the Springer web site as HEDONIC.XLS. The dependent variable is the Median Value (MV) of owner-occupied homes. The regressors include two structural variables, RM, the average number of rooms, and AGE, representing the proportion of owner units built prior to 1940. In addition there are eight neighborhood variables: B, the proportion of blacks in the population; LSTAT, the proportion of population that is lower status; CRIM, the crime rate; ZN, the proportion of residential land zoned for lots greater than 25,000 square feet; INDUS, the proportion of nonretail business acres; TAX, the full value property tax rate ($/$10000); PTRATIO, the pupil-teacher ratio; and CHAS, a dummy variable for the Charles River (= 1 if a tract bounds the Charles). There are also two accessibility variables, DIS, the weighted distances to five employment centers in the Boston region, and RAD, the index of accessibility to radial highways. One more regressor is an air pollution variable, NOX, the annual average nitrogen oxide concentration in parts per hundred million.
(a) Run OLS of MV on the 13 independent variables and a constant. Plot the residuals.
(b) Apply White’s test for heteroskedasticity.
(c) Obtain White’s heteroskedasticity-consistent standard errors.
(d) Test the least squares residuals for normality using the Jarque-Bera test.
21. Agglomeration Economies, Diseconomies, and Growth. Wheeler (2003) uses data on 3106 counties of the contiguous USA to fit a fourth-order polynomial relating county population (employment) growth over the period 1980 to 1990 to log(size), where size is measured as total resident population or total civilian employment. Other control variables include the proportion of the adult resident population (i.e., of age 25 or older) with a bachelor’s degree or more, the proportion of total employment in manufacturing, and the unemployment rate, all for the year 1980; per capita income in 1979; the proportion of the resident population belonging to nonwhite racial categories in 1980; and the share of local government expenditures going to each of three public goods (education, roads and highways, and police protection) in 1982. This data can be downloaded from the JAE archive data web site.
(a) Replicate the OLS regressions reported in Tables VIII and IX of Wheeler (2003, pp. 8889).
(b) Apply White’s and Breusch-Pagan tests for heteroskedasticity.
(c) Test the least squares residuals for normality using the Jarque-Bera test.
For additional readings consult the econometrics books cited in the Preface, as well as the chapter on heteroskedasticity by Griffiths (2001) and the chapter on serial correlation by King (2001):
Ali, M. M. and C. Giaccotto (1984), “A Study of Several New and Existing Tests for Heteroskedasticity in the General Linear Model,” Journal of Econometrics, 26: 355-373.
Amemiya, T. (1973), “Regression Analysis When the Variance of the Dependent Variable is Proportional to the Square of its Expectation,” Journal of the American Statistical Association, 68: 928-934.
Amemiya, T. (1977), “A Note on a Heteroskedastic Model,” Journal of Econometrics, 6: 365-370.
Andrews, D.W.K. (1991), “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59: 817-858.
Baltagi, B. and Q. Li (1990), “The Heteroskedastic Consequences of an Arbitrary Variance for the Initial Disturbance of an AR(1) Model,” Econometric Theory, Problem 90.3.1, 6: 405.
Baltagi, B. and Q. Li (1992), “The Bias of the Standard Errors of OLS for an AR(1) Process with an Arbitrary Variance on the Initial Observations,” Econometric Theory, Problem 92.1.4, 8: 146.
Baltagi, B. and Q. Li (1995), “ML Estimation of Linear Regression Model with AR(1) Errors and Two Observations,” Econometric Theory, Solution 93.3.2, 11: 641-642.
Bartlett, M. S. (1937), “Properties of Sufficiency and Statistical Tests,” Proceedings of the Royal Statistical Society, A, 160: 268-282.
Beach, C. M. and J. G. MacKinnon (1978), “A Maximum Likelihood Procedure for Regression with Autocorrelated Errors,” Econometrica, 46: 51-58.
Benderly, J. and B. Zwick (1985), “Inflation, Real Balances, Output and Real Stock Returns,” American Economic Review, 75: 1115-1123.
Breusch, T. S. (1978), “Testing for Autocorrelation in Dynamic Linear Models,” Australian Economic Papers, 17: 334-355.
Breusch, T. S. and A. R. Pagan (1979), “A Simple Test for Heteroskedasticity and Random Coefficient Variation,” Econometrica, 47: 1287-1294.
Buse, A. (1984), “Tests for Additive Heteroskedasticity: Goldfeld and Quandt Revisited,” Empirical Economics, 9: 199-216.
Carroll, R. H. (1982), “Adapting for Heteroskedasticity in Linear Models,” Annals of Statistics, 10: 1224-1233.
Cochrane, D. and G. Orcutt (1949), “Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms,” Journal of the American Statistical Association, 44: 32-61.
Cragg, J. G. (1992), “Quasi-Aitken Estimation for Heteroskedasticity of Unknown Form,” Journal of Econometrics, 54: 197-202.
Durbin, J. (1960), “Estimation of Parameters in Time-Series Regression Model,” Journal of the Royal Statistical Society, Series B, 22: 139-153.
Durbin, J. and G. Watson (1950), “Testing for Serial Correlation in Least Squares Regression-I,” Biometrika, 37: 409-428.
Durbin, J. and G. Watson (1951), “Testing for Serial Correlation in Least Squares Regression-II,” Biometrika, 38: 159-178.
Evans, M. A. and M. L. King (1980), “A Further Class of Tests for Heteroskedasticity,” Journal of Econometrics, 37: 265-276.
Farebrother, R. W. (1980), “The Durbin-Watson Test for Serial Correlation When There is No Intercept in the Regression,” Econometrica, 48: 1553-1563.
Glejser, H. (1969), “A New Test for Heteroskedasticity,” Journal of the American Statistical Association, 64: 316-323.
Godfrey, L. G. (1978), “Testing Against General Autoregressive and Moving Average Error Models When the Regressors Include Lagged Dependent Variables,” Econometrica, 46: 1293-1302.
Goldfeld, S. M. and R. E. Quandt (1965), “Some Tests for Homoscedasticity,” Journal of the American Statistical Association, 60: 539-547.
Goldfeld, S. M. and R. E. Quandt (1972), Nonlinear Methods in Econometrics (North-Holland: Amsterdam).
Griffiths, W. E. (2001), “Heteroskedasticity,” Chapter 4 in B. H. Baltagi (ed.), A Companion to Theoretical Econometrics (Blackwell: Massachusetts).
Harrison, M. and B. P. McCabe (1979), “A Test for Heteroskedasticity Based on Ordinary Least Squares Residuals,” Journal of the American Statistical Association, 74: 494-499.
Harrison, D. and D. L. Rubinfeld (1978), “Hedonic Housing Prices and the Demand for Clean Air,” Journal of Environmental Economics and Management, 5: 81-102.
Harvey, A. C. (1976), “Estimating Regression Models with Multiplicative Heteroskedasticity,” Econometrica, 44: 461-466.
Hildreth, C. and J. Lu (1960), “Demand Relations with Autocorrelated Disturbances,” Technical Bulletin 276 (Michigan State University, Agriculture Experiment Station).
Jarque, C. M. and A. K. Bera (1987), “A Test for Normality of Observations and Regression Residuals,” International Statistical Review, 55: 163-177.
Kim, J. H. (1991), “The Heteroskedastic Consequences of an Arbitrary Variance for the Initial Disturbance of an AR(1) Model,” Econometric Theory, Solution 90.3.1, 7: 544-545.
King, M. (2001), “Serial Correlation,” Chapter 2 in B. H. Baltagi (ed.), A Companion to Theoretical Econometrics (Blackwell: Massachusetts).
Koenker, R. (1981), “A Note on Studentizing a Test for Heteroskedasticity,” Journal of Econometrics, 17: 107-112.
Koenker, R. and G. W. Bassett, Jr. (1982), “Robust Tests for Heteroskedasticity Based on Regression Quantiles,” Econometrica, 50: 43-61.
Koning, R. H. (1992), “The Bias of the Standard Errors of OLS for an AR(1) Process with an Arbitrary Variance on the Initial Observations,” Econometric Theory, Solution 92.1.4, 9: 149-150.
Kramer, W. (1982), “Note on Estimating Linear Trend When Residuals are Autocorrelated,” Econometrica, 50: 1065-1067.
Lott, W. F. and S. C. Ray (1992), Applied Econometrics: Problems with Data Sets (The Dryden Press: New York).
Maddala, G. S. (1977), Econometrics (McGraw-Hill: New York).
Maeshiro, A. (1976), “Autoregressive Transformation, Trended Independent Variables and Autocorrelated Disturbance Terms,” The Review of Economics and Statistics, 58: 497-500.
Maeshiro, A. (1979), “On the Retention of the First Observations in Serial Correlation Adjustment of Regression Models,” International Economic Review, 20: 259-265.
Magee, L. (1993), “ML Estimation of Linear Regression Model with AR(1) Errors and Two Observations,” Econometric Theory, Problem 93.3.2, 9: 521-522.
Mizon, G. E. (1995), “A Simple Message for Autocorrelation Correctors: Don’t,” Journal of Econometrics, 69: 267-288.
Newey, W. K. and K. D. West (1987), “A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55: 703-708.
Oberhofer, W. and J. Kmenta (1974), “A General Procedure for Obtaining Maximum Likelihood Estimates in Generalized Regression Models,” Econometrica, 42: 579-590.
Park, R. E. and B. M. Mitchell (1980), “Estimating the Autocorrelated Error Model with Trended Data,” Journal of Econometrics, 13: 185-201.
Prais, S. and C. Winsten (1954), “Trend Estimation and Serial Correlation,” Discussion Paper 383 (Cowles Commission: Chicago).
Rao, P. and Z. Griliches (1969), “Some Small Sample Properties of Several Two-Stage Regression Methods in the Context of Autocorrelated Errors,” Journal of the American Statistical Association, 64: 253-272.
Rao, P. and R. L. Miller (1971), Applied Econometrics (Wadsworth: Belmont).
Robinson, P. M. (1987), “Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form,” Econometrica, 55: 875-891.
Rutemiller, H. C. and D. A. Bowers (1968), “Estimation in a Heteroskedastic Regression Model,” Journal of the American Statistical Association, 63: 552-557.
Savin, N. E. and K. J. White (1977), “The Durbin-Watson Test for Serial Correlation with Extreme Sample Sizes or Many Regressors,” Econometrica, 45: 1989-1996.
Szroeter, J. (1978), “A Class of Parametric Tests for Heteroskedasticity in Linear Econometric Models,” Econometrica, 46: 1311-1327.
Theil, H. (1978), Introduction to Econometrics (Prentice-Hall: Englewood Cliffs, NJ).
Waldman, D. M. (1983), “A Note on Algebraic Equivalence of White’s Test and a Variation of the Godfrey/Breusch-Pagan Test for Heteroskedasticity,” Economics Letters, 13: 197-200.
Wheeler, C. (2003), “Evidence on Agglomeration Economies, Diseconomies, and Growth,” Journal of Applied Econometrics, 18: 79-104.
White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48: 817-838.
Wooldridge, J. M. (1991), “On the Application of Robust, Regression-Based Diagnostics to Models of Conditional Means and Conditional Variances,” Journal of Econometrics, 47: 5-46.
CHAPTER 6