# The W, LR and LM Statistics Revisited

In this section we present a simplified and more general proof of $W \geq LR \geq LM$ due to Breusch (1979). Consider the general linear model given in (9.1) with $u \sim N(0, \Sigma)$ and $H_0$: $R\beta = r$. The likelihood function given in (9.14), with $\Sigma = \sigma^2\Omega$, can be maximized with respect to $\beta$ and $\Sigma$ without imposing $H_0$, yielding the unrestricted estimators $\hat\beta_u$ and $\hat\Sigma$, where $\hat\beta_u = (X'\hat\Sigma^{-1}X)^{-1}X'\hat\Sigma^{-1}y$. Similarly, this likelihood can be maximized subject to the restriction $H_0$, yielding $\tilde\beta_r$ and $\tilde\Sigma$, where

$$\tilde\beta_r = (X'\tilde\Sigma^{-1}X)^{-1}X'\tilde\Sigma^{-1}y - (X'\tilde\Sigma^{-1}X)^{-1}R'\tilde\mu \tag{9.20}$$

as in (9.17), where $\tilde\mu = \tilde A^{-1}(R\hat\beta_r - r)$ is the Lagrange multiplier described in equation (7.35) of Chapter 7 and $\tilde A = [R(X'\tilde\Sigma^{-1}X)^{-1}R']$. The major distinction from Chapter 7 is that $\Sigma$ is unknown and has to be estimated. Let $\hat\beta_r$ denote the unrestricted maximum likelihood estimator of $\beta$ conditional on the restricted variance-covariance estimator $\tilde\Sigma$, and let $\tilde\beta_u$ denote the restricted maximum likelihood estimator of $\beta$ (satisfying $H_0$) conditional on the unrestricted variance-covariance estimator $\hat\Sigma$. More explicitly,

$$\hat\beta_r = (X'\tilde\Sigma^{-1}X)^{-1}X'\tilde\Sigma^{-1}y \tag{9.21}$$

and

$$\tilde\beta_u = \hat\beta_u - (X'\hat\Sigma^{-1}X)^{-1}R'\hat A^{-1}(R\hat\beta_u - r) \tag{9.22}$$

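Knowing $\Sigma$, the Likelihood Ratio statistic is given by

$$LR = -2\log\left[\max_{R\beta=r} L(\beta/\Sigma)\Big/\max_{\beta} L(\beta/\Sigma)\right] = -2\log[L(\tilde\beta, \Sigma)/L(\hat\beta, \Sigma)] = \tilde u'\Sigma^{-1}\tilde u - \hat u'\Sigma^{-1}\hat u \tag{9.23}$$

where $\hat u = y - X\hat\beta$ and $\tilde u = y - X\tilde\beta$; both estimators of $\beta$ are conditional on a known $\Sigma$.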

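$R\hat\beta_u \sim N(R\beta, R(X'\Sigma^{-1}X)^{-1}R')$ and the Wald statistic is given by

$$W = (R\hat\beta_u - r)'\hat A^{-1}(R\hat\beta_u - r) \quad \text{where } \hat A = [R(X'\hat\Sigma^{-1}X)^{-1}R'] \tag{9.24}$$

Using (9.22), it is easy to show that $\tilde u_u = y - X\tilde\beta_u$ and $\hat u_u = y - X\hat\beta_u$ are related as follows:

$$\tilde u_u = \hat u_u + X(X'\hat\Sigma^{-1}X)^{-1}R'\hat A^{-1}(R\hat\beta_u - r) \tag{9.25}$$

and

$$\tilde u_u'\hat\Sigma^{-1}\tilde u_u = \hat u_u'\hat\Sigma^{-1}\hat u_u + (R\hat\beta_u - r)'\hat A^{-1}(R\hat\beta_u - r) \tag{9.26}$$

The cross-product terms are zero because $X'\hat\Sigma^{-1}\hat u_u = 0$. Therefore,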

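$$W = \tilde u_u'\hat\Sigma^{-1}\tilde u_u - \hat u_u'\hat\Sigma^{-1}\hat u_u = -2\log[L(\tilde\beta_u, \hat\Sigma)/L(\hat\beta_u, \hat\Sigma)] = -2\log\left[\max_{R\beta=r} L(\beta/\hat\Sigma)\Big/\max_{\beta} L(\beta/\hat\Sigma)\right] \tag{9.27}$$

and the Wald statistic can be interpreted as a LR statistic conditional on $\hat\Sigma$, the unrestricted maximum likelihood estimator of $\Sigma$.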
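The identity behind (9.27) can be checked numerically. The following is a minimal sketch, assuming simulated data and an arbitrary positive definite matrix standing in for $\hat\Sigma$; the names `Sigma_hat`, `beta_hat_u`, `beta_tilde_u` and the single restriction used here are illustrative choices, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 40, 3

# Hypothetical data and a positive definite matrix standing in for Sigma-hat.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
B = rng.normal(size=(n, n))
Sigma_hat = B @ B.T / n + np.eye(n)
Si = np.linalg.inv(Sigma_hat)

# One linear restriction R beta = r (assumed here: second coefficient is zero).
R = np.array([[0.0, 1.0, 0.0]])
r = np.array([0.0])

XSX_inv = np.linalg.inv(X.T @ Si @ X)
beta_hat_u = XSX_inv @ X.T @ Si @ y                 # unrestricted GLS given Sigma-hat
A_hat = R @ XSX_inv @ R.T
gap = R @ beta_hat_u - r
beta_tilde_u = beta_hat_u - XSX_inv @ R.T @ np.linalg.solve(A_hat, gap)  # (9.22)

u_hat = y - X @ beta_hat_u
u_tilde = y - X @ beta_tilde_u

# Wald statistic from (9.24) and the difference of weighted residual sums of
# squares from (9.27); the two should agree up to rounding error.
wald = float(gap @ np.linalg.solve(A_hat, gap))
rss_diff = float(u_tilde @ Si @ u_tilde - u_hat @ Si @ u_hat)
print(wald, rss_diff)
```

The same check applies to the LM statistic by replacing the stand-in for $\hat\Sigma$ with one for $\tilde\Sigma$.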

Similarly, the Lagrange multiplier statistic, which tests that $\mu = 0$, is given by

$$LM = \tilde\mu'\tilde A\tilde\mu = (R\hat\beta_r - r)'\tilde A^{-1}(R\hat\beta_r - r) \tag{9.28}$$

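Using (9.20) one can easily show that

$$\tilde u_r = \hat u_r + X(X'\tilde\Sigma^{-1}X)^{-1}R'\tilde A^{-1}(R\hat\beta_r - r) \tag{9.29}$$

and

$$\tilde u_r'\tilde\Sigma^{-1}\tilde u_r = \hat u_r'\tilde\Sigma^{-1}\hat u_r + \tilde\mu'\tilde A\tilde\mu \tag{9.30}$$

where $\tilde u_r = y - X\tilde\beta_r$ and $\hat u_r = y - X\hat\beta_r$. The cross-product terms are zero because $X'\tilde\Sigma^{-1}\hat u_r = 0$. Therefore,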

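$$LM = \tilde u_r'\tilde\Sigma^{-1}\tilde u_r - \hat u_r'\tilde\Sigma^{-1}\hat u_r = -2\log[L(\tilde\beta_r, \tilde\Sigma)/L(\hat\beta_r, \tilde\Sigma)] = -2\log\left[\max_{R\beta=r} L(\beta/\tilde\Sigma)\Big/\max_{\beta} L(\beta/\tilde\Sigma)\right] \tag{9.31}$$

and the Lagrange multiplier statistic can be interpreted as a LR statistic conditional on $\tilde\Sigma$, the restricted maximum likelihood estimator of $\Sigma$. Given that

$$\max_{\beta} L(\beta/\tilde\Sigma) \leq \max_{\beta,\Sigma} L(\beta, \Sigma) = \max_{\beta} L(\beta/\hat\Sigma) \tag{9.32}$$

$$\max_{R\beta=r} L(\beta/\hat\Sigma) \leq \max_{R\beta=r,\,\Sigma} L(\beta, \Sigma) = \max_{R\beta=r} L(\beta/\tilde\Sigma) \tag{9.33}$$

it can be easily shown that the likelihood ratio statistic given by

$$LR = -2\log\left[\max_{R\beta=r,\,\Sigma} L(\beta, \Sigma)\Big/\max_{\beta,\Sigma} L(\beta, \Sigma)\right] \tag{9.34}$$

satisfies the following inequality

$$W \geq LR \geq LM \tag{9.35}$$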

The proof is left to the reader, see problem 6.

This general and simple proof holds as long as the maximum likelihood estimator of $\beta$ is uncorrelated with the maximum likelihood estimator of $\Sigma$, see Breusch (1979).
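As an illustration of the ordering in the simplest case, the sketch below assumes $\Sigma = \sigma^2 I_n$ and a simple regression with $H_0$: slope $= 0$. In that special case the conditional likelihood ratios reduce to functions of the restricted and unrestricted residual sums of squares. The simulated data and variable names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.4 * x + rng.normal(size=n)

X_u = np.column_stack([np.ones(n), x])   # unrestricted model: intercept and slope
X_r = np.ones((n, 1))                    # restricted model under H0: slope = 0

def rss(X, y):
    """Residual sum of squares from least squares of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rss_u, rss_r = rss(X_u, y), rss(X_r, y)
sig2_hat = rss_u / n      # unrestricted MLE of sigma^2
sig2_tilde = rss_r / n    # restricted MLE of sigma^2

W = (rss_r - rss_u) / sig2_hat      # LR conditional on the unrestricted variance
LM = (rss_r - rss_u) / sig2_tilde   # LR conditional on the restricted variance
LR = n * np.log(rss_r / rss_u)      # unconditional LR
print(W, LR, LM)
```

Writing $x = (RSS_r - RSS_u)/RSS_u$, the three statistics are $nx$, $n\log(1+x)$ and $nx/(1+x)$, so the ordering follows from $x \geq \log(1+x) \geq x/(1+x)$ for $x \geq 0$.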

Unlike time-series, there is typically no unique natural ordering for cross-sectional data. Spatial autocorrelation permits correlation of the disturbance terms across cross-sectional units. There is an extensive literature on spatial models in regional science, urban economics, geography and statistics, see Anselin (1988). Examples in economics usually involve spillover effects or externalities due to geographical proximity. For example, the productivity of public capital, like roads and highways, may affect the output of neighboring states. Similarly, the pricing of welfare in one state may push recipients to other states. Spatial correlation could relate directly to the model dependent variable $y$, the exogenous variables $X$, the disturbance term $u$, or to a combination of all three. Here we consider spatial correlation in the disturbances and leave the remaining literature on spatial dependence to the motivated reader to pursue in Anselin (1988, 2001) and Anselin and Bera (1998), to mention a few.

For the cross-sectional disturbances, the spatial autocorrelation is specified as

$$u = \lambda W u + \epsilon \tag{9.36}$$

where $\lambda$ is the spatial autoregressive coefficient satisfying $|\lambda| < 1$, and $\epsilon \sim \text{IIN}(0, \sigma^2)$. $W$ is a known spatial weight matrix with diagonal elements equal to zero. $W$ also satisfies some other regularity conditions, such as the requirement that $I_n - \lambda W$ be nonsingular.

The regression model given in (9.1) can be written as

$$y = X\beta + (I_n - \lambda W)^{-1}\epsilon \tag{9.37}$$

with the variance-covariance matrix of the disturbances given by

$$\Sigma = \sigma^2\Omega = \sigma^2(I_n - \lambda W)^{-1}(I_n - \lambda W')^{-1} \tag{9.38}$$
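To make (9.36) to (9.38) concrete, here is a minimal sketch that builds $\Omega$ for a small hypothetical weight matrix and draws one spatially autocorrelated disturbance vector. The ring-shaped neighbour structure, the value of `lam` and the sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, sigma2 = 8, 0.5, 1.0

# Hypothetical weight matrix: units on a ring, each with two neighbours,
# row-standardized so each row sums to one; the diagonal is zero.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

A = np.eye(n) - lam * W            # I_n - lambda W, must be nonsingular
A_inv = np.linalg.inv(A)
Omega = A_inv @ A_inv.T            # (I - lam W)^{-1} (I - lam W')^{-1}
Sigma = sigma2 * Omega             # variance-covariance matrix in (9.38)

eps = rng.normal(scale=np.sqrt(sigma2), size=n)
u = A_inv @ eps                    # solves u = lam W u + eps, as in (9.36)
print(np.round(Sigma[0, :4], 3))
```

Even though each unit has only two direct neighbours in this example, every element of $\Sigma$ is nonzero, so the disturbances of all units are correlated.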

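Under normality of the disturbances, Ord (1975) derived the maximum likelihood estimators from the log-likelihood

$$\ln L = -\frac{1}{2}\ln|\Omega| - \frac{n}{2}\ln 2\pi\sigma^2 - (y - X\beta)'\Omega^{-1}(y - X\beta)/2\sigma^2 \tag{9.39}$$

The Jacobian term simplifies by using

$$\ln|\Omega| = -2\ln|I - \lambda W| = -2\sum_{i=1}^{n}\ln(1 - \lambda w_i) \tag{9.40}$$

where $w_i$ are the eigenvalues of the spatial weight matrix $W$. The first-order conditions yield the familiar GLS estimator of $\beta$ and the associated estimator of $\sigma^2$:

$$\hat\beta_{MLE} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y \quad \text{and} \quad \hat\sigma^2_{MLE} = e_{MLE}'\Omega^{-1}e_{MLE}/n \tag{9.41}$$

where $e_{MLE} = y - X\hat\beta_{MLE}$. An estimate of $\lambda$ can be obtained using the iterative solution of the first-order conditions in Magnus (1978, p. 283), which involve the derivative $\partial\Omega^{-1}/\partial\lambda = -W - W' + 2\lambda W'W$.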
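The eigenvalue form of the Jacobian term and the estimators of $\beta$ and $\sigma^2$ for a given $\lambda$ can be sketched as follows. This is an illustration under stated assumptions: the weight matrix is a hypothetical symmetric ring (so its eigenvalues are real), the data are simulated, and a crude grid search over $\lambda$ is swapped in for the iterative solution described by Magnus (1978).

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam_true = 12, 0.4

# Hypothetical ring-shaped weight matrix (symmetric, row sums equal to one).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

# Check ln|Omega| = -2 * sum_i ln(1 - lam * w_i) against a direct computation.
A = np.eye(n) - lam_true * W
Omega = np.linalg.inv(A.T @ A)
w = np.linalg.eigvalsh(W)                  # real because this W is symmetric
logdet_direct = np.linalg.slogdet(Omega)[1]
logdet_eig = -2.0 * np.sum(np.log(1.0 - lam_true * w))

# Simulated data from y = X beta + (I - lam W)^{-1} eps.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + np.linalg.solve(A, rng.normal(size=n))

def ml_given_lambda(lam):
    """GLS beta, sigma^2 estimate and log-likelihood value for a fixed lam."""
    A_l = np.eye(n) - lam * W
    Oi = A_l.T @ A_l                       # Omega^{-1}
    beta = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)
    e = y - X @ beta
    s2 = float(e @ Oi @ e) / n
    logdet_omega = -2.0 * np.sum(np.log(1.0 - lam * w))
    loglik = -0.5 * logdet_omega - 0.5 * n * np.log(2 * np.pi * s2) - 0.5 * n
    return beta, s2, loglik

# Grid search over lam (an assumption of this sketch, not the Magnus iteration).
grid = np.linspace(-0.9, 0.9, 37)
logliks = np.array([ml_given_lambda(l)[2] for l in grid])
lam_grid = float(grid[np.argmax(logliks)])
print(logdet_direct, logdet_eig, lam_grid)
```

With only twelve observations the grid maximizer is a noisy estimate of $\lambda$; the point of the sketch is the mechanics, not the precision.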

Alternatively, one can substitute $\hat\beta_{MLE}$ and $\hat\sigma^2_{MLE}$ from (9.41) into the log-likelihood in (9.39) to get the concentrated log-likelihood, which will be a nonlinear function of $\lambda$, see Anselin (1988) for details.

Testing for zero spatial autocorrelation, i.e., $H_0$: $\lambda = 0$, is usually based on the Moran I-test, which is similar to the Durbin-Watson statistic in time-series. This is given by

$$MI = \frac{n}{S_0}\,\frac{e'We}{e'e} \tag{9.44}$$

where $e$ denotes the vector of OLS residuals and $S_0$ is a standardization factor equal to the sum of the spatial weights $\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}$. For a row-standardized weights matrix $W$ where each row sums to one, $S_0 = n$ and the Moran I-statistic simplifies to $e'We/e'e$. In practice the test is implemented by standardizing it and using the asymptotic $N(0,1)$ critical values, see Anselin and Bera (1998). In fact, for a row-standardized $W$ matrix, the mean and variance of the Moran I-statistic are obtained from

$$E(MI) = E\left(\frac{e'We}{e'e}\right) = tr(\bar P_X W)/(n - k) \tag{9.45}$$

and

$$E(MI^2) = \frac{tr(\bar P_X W \bar P_X W') + tr(\bar P_X W)^2 + \{tr(\bar P_X W)\}^2}{(n - k)(n - k + 2)}$$

where $\bar P_X = I_n - P_X$ is the matrix that generates the OLS residuals.

Alternatively, one can derive the Lagrange Multiplier test for $H_0$: $\lambda = 0$ using the result that $\partial\ln L/\partial\lambda$ evaluated under the null of $\lambda = 0$ is equal to $u'Wu/\sigma^2$ and the fact that the Information matrix is block-diagonal between $\beta$ and $(\sigma^2, \lambda)$, see problem 14. In fact, one can show that

$$LM = \frac{(e'We/\tilde\sigma^2)^2}{tr[(W' + W)W]} \tag{9.46}$$

with $\tilde\sigma^2 = e'e/n$. Under $H_0$, LM is asymptotically distributed as $\chi^2_1$. One can clearly see the connection between Moran's I-statistic and LM. Computationally, the W and LR tests are more demanding since they require ML estimation under spatial autocorrelation.
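Both tests need only OLS residuals, which the following sketch illustrates. It assumes a hypothetical row-standardized ring weight matrix and simulated data with no spatial autocorrelation; `M` plays the role of the residual-generating matrix, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30

# Hypothetical row-standardized weight matrix (ring with two neighbours each).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)   # generated under H0
k = X.shape[1]

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # I - P_X
e = M @ y                                           # OLS residuals

S0 = W.sum()                                        # equals n when rows sum to one
moran_i = (n / S0) * float(e @ W @ e) / float(e @ e)

# Mean and second moment of Moran's I, then the standardized statistic.
MW = M @ W
mean_mi = np.trace(MW) / (n - k)
second_moment = (np.trace(MW @ M @ W.T) + np.trace(MW @ MW)
                 + np.trace(MW) ** 2) / ((n - k) * (n - k + 2))
var_mi = second_moment - mean_mi ** 2
z_mi = (moran_i - mean_mi) / np.sqrt(var_mi)        # compare with N(0,1)

# LM statistic, compared with chi-squared with one degree of freedom.
sigma2_tilde = float(e @ e) / n
lm = (float(e @ W @ e) / sigma2_tilde) ** 2 / np.trace((W.T + W) @ W)
print(moran_i, z_mi, lm)
```

For a row-standardized $W$, $e'We/\tilde\sigma^2 = n \cdot MI$, so LM is proportional to the square of Moran's I, which is the connection noted in the text.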

This is only a brief introduction to the spatial dependence literature. Hopefully, it will motivate the reader to explore alternative formulations of spatial dependence, alternative estimation and testing methods discussed in this literature, and the numerous applications in economics on hedonic housing, crime rates, police expenditures and R&D spillovers, to mention a few.

Note

1. This section is based on Anselin (1988, 2001) and Anselin and Bera (1998).

Problems

1. GLS Is More Efficient than OLS.

(a) Using equation (7.5) of Chapter 7, verify that $var(\hat\beta_{OLS})$ is that given in (9.5).

(b) Show that $var(\hat\beta_{OLS}) - var(\hat\beta_{GLS}) = \sigma^2 A\Omega A'$ where

$$A = [(X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}].$$

Conclude that this difference in variances is positive semi-definite.

2. $s^2$ Is No Longer Unbiased for $\sigma^2$.

(a) Show that $E(s^2) = \sigma^2 tr(\Omega\bar P_X)/(n - K) \neq \sigma^2$. Hint: Follow the same proof given below equation (7.6) of Chapter 7, but substitute $\sigma^2\Omega$ instead of $\sigma^2 I_n$.

(b) Use the fact that $\bar P_X$ and $\Sigma$ are non-negative definite matrices with $tr(\Sigma\bar P_X) \geq 0$ to show that $0 \leq E(s^2) \leq tr(\Sigma)/(n - K)$ where $tr(\Sigma) = \sum_{i=1}^{n}\sigma_i^2$ with $\sigma_i^2 = var(u_i) \geq 0$. This bound was derived by Dufour (1986). Under homoskedasticity, show that this bound becomes $0 \leq E(s^2) \leq n\sigma^2/(n - K)$. In general, $0 \leq$ {mean of $n - K$ smallest characteristic roots of $\Sigma$} $\leq E(s^2) \leq$ {mean of $n - K$ largest characteristic roots of $\Sigma$} $\leq tr(\Sigma)/(n - K)$, see Sathe and Vinod (1974) and Neudecker (1977, 1978).

(c) Show that a sufficient condition for $s^2$ to be consistent for $\sigma^2$ irrespective of $X$ is that $\lambda_{max}$, the largest characteristic root of $\Omega$, is $o(n)$, i.e., $\lambda_{max}/n \to 0$ as $n \to \infty$, and $plim\,(u'u/n) = \sigma^2$. Hint: $s^2 = u'\bar P_X u/(n - K) = u'u/(n - K) - u'P_X u/(n - K)$. By assumption, the first term tends in probability limits to $\sigma^2$ as $n \to \infty$. The second term has expectation $\sigma^2 tr(P_X\Omega)/(n - K)$. Now $P_X\Omega$ has rank $K$ and therefore exactly $K$ non-zero characteristic roots, each of which cannot exceed $\lambda_{max}$. This means that $E[u'P_X u/(n - K)] \leq \sigma^2 K\lambda_{max}/(n - K)$. Using the condition that $\lambda_{max}/n \to 0$ proves the result. See Kramer and Berghoff (1991).

(d) Using the same reasoning in part (a), show that $s^{*2}$ given in (9.6) is unbiased for $\sigma^2$.

3. The AR(1) Model. See Kadiyala (1968).

(a) Verify that $\Omega\Omega^{-1} = I_T$ for $\Omega$ and $\Omega^{-1}$ given in (9.9) and (9.10), respectively.

(b) Show that $P^{-1\prime}P^{-1} = (1 - \rho^2)\Omega^{-1}$ for $P^{-1}$ defined in (9.11).

(c) Conclude that $var(P^{-1}u) = \sigma_\epsilon^2 I_T$. Hint: $\Omega = (1 - \rho^2)PP'$ as can be easily derived from part (b).

4. Restricted GLS. Using the derivation of the restricted least squares estimator for $u \sim (0, \sigma^2 I_n)$ in Chapter 7, verify equation (9.17) for the restricted GLS estimator based on $u \sim (0, \sigma^2\Omega)$. Hint: Apply restricted least squares results to the transformed model given in (9.3).

5. Best Linear Unbiased Prediction. This is based on Goldberger (1962). Consider all linear predictors of $y_{T+s} = x_{T+s}'\beta + u_{T+s}$ of the form $\hat y_{T+s} = c'y$, where $u \sim (0, \Sigma)$ and $\Sigma = \sigma^2\Omega$.

(a) Show that $c'X = x_{T+s}'$ for $\hat y_{T+s}$ to be unbiased.

(b) Show that $var(\hat y_{T+s}) = c'\Sigma c + \sigma^2_{T+s} - 2c'\omega$ where $var(u_{T+s}) = \sigma^2_{T+s}$ and $\omega = E(u_{T+s}u)$.

(c) Minimize $var(\hat y_{T+s})$ given in part (b) subject to $c'X = x_{T+s}'$ and show that

$$\hat c = \Sigma^{-1}[I_T - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}]\omega + \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}x_{T+s}$$

This means that $\hat y_{T+s} = \hat c'y = x_{T+s}'\hat\beta_{GLS} + \omega'\Sigma^{-1}e_{GLS}$, where $e_{GLS} = y - X\hat\beta_{GLS}$. For $s = 1$, i.e., predicting one period ahead, this verifies equation (9.18). Hint: Use partitioned inverse in solving the first-order minimization equations.

(d) Show that $\hat y_{T+s} = x_{T+s}'\hat\beta_{GLS} + \rho^s e_{T,GLS}$ for the stationary AR(1) disturbances with autoregressive parameter $\rho$, and $|\rho| < 1$.

6. The W, LR and LM Inequality. Using the inequalities given in equations (9.32) and (9.33), verify equation (9.35), which states that $W \geq LR \geq LM$. Hint: Use the conditional likelihood ratio interpretations of W and LM given in equations (9.27) and (9.31), respectively.

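7. Consider the simple linear regression

$$y_i = \alpha + \beta X_i + u_i \qquad i = 1, 2, \ldots, n$$

with $u_i \sim \text{IIN}(0, \sigma^2)$. For $H_0$: $\beta = 0$, derive the LR, W and LM statistics in terms of conditional likelihood ratios as described in Breusch (1979). In other words, compute

$$W = -2\log\left[\max_{H_0} L(\alpha, \beta/\hat\sigma^2)\Big/\max_{\alpha,\beta} L(\alpha, \beta/\hat\sigma^2)\right],$$

$$LM = -2\log\left[\max_{H_0} L(\alpha, \beta/\tilde\sigma^2)\Big/\max_{\alpha,\beta} L(\alpha, \beta/\tilde\sigma^2)\right]$$

and

$$LR = -2\log\left[\max_{H_0} L(\alpha, \beta, \sigma^2)\Big/\max_{\alpha,\beta,\sigma^2} L(\alpha, \beta, \sigma^2)\right]$$

where $\hat\sigma^2$ is the unrestricted MLE of $\sigma^2$ while $\tilde\sigma^2$ is the restricted MLE of $\sigma^2$ under $H_0$. Use these results to infer that $W \geq LR \geq LM$.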

8. Sampling Distributions and Efficiency Comparison of OLS and GLS. Consider the following regression model $y_t = \beta x_t + u_t$ for $t = 1, 2$, where $\beta = 2$ and $x_t$ takes on the fixed values $x_1 = 1$, $x_2 = 2$. The $u_t$'s have the following discrete joint probability distribution:

| $(u_1, u_2)$ | Probability |
|---|---|
| $(-1, -2)$ | 1/8 |
| $(1, -2)$ | 3/8 |
| $(-1, 2)$ | 3/8 |
| $(1, 2)$ | 1/8 |

(a) What is the variance-covariance matrix of the disturbances? Are the disturbances heteroskedastic? Are they correlated?

(b) Find the sampling distributions of $\hat\beta_{OLS}$ and $\hat\beta_{GLS}$ and verify that $var(\hat\beta_{OLS}) > var(\hat\beta_{GLS})$.

(c) Find the sampling distribution of the OLS residuals and verify that the estimated $var(\hat\beta_{OLS})$ is biased. Also, find the sampling distribution of the GLS residuals and verify that the MSE of the GLS regression is an unbiased estimator of the GLS regression variance. Hint: Read Oksanen (1991) and Phillips and Wickens (1978), pp. 3-4. This problem is based on Baltagi (1992). See also the solution by Im and Snow (1993).

9. Equi-correlation. This problem is based on Baltagi (1998). Consider the regression model given in (9.1) with equi-correlated disturbances, i.e., equal variances and equal covariances: $E(uu') = \sigma^2\Omega = \sigma^2[(1 - \rho)I_T + \rho\iota_T\iota_T']$ where $\iota_T$ is a vector of ones of dimension $T$ and $I_T$ is the identity matrix. In this case, $var(u_t) = \sigma^2$ and $cov(u_t, u_s) = \rho\sigma^2$ for $t \neq s$ with $t = 1, 2, \ldots, T$. Assume that the regression has a constant.

(a) Show that OLS on this model is equivalent to GLS. Hint: Verify Zyskind's condition given in (9.8) using the fact that $P_X\iota_T = \iota_T$ if $\iota_T$ is a column of $X$.

(b) Show that $E(s^2) = \sigma^2(1 - \rho)$. Also, that $\Omega$ is positive semi-definite when $-1/(T - 1) \leq \rho \leq 1$. Conclude that if $-1/(T - 1) \leq \rho \leq 1$, then $0 \leq E(s^2) \leq [T/(T - 1)]\sigma^2$. The lower and upper bounds are attained at $\rho = 1$ and $\rho = -1/(T - 1)$, respectively, see Dufour (1986). Hint: $\Omega$ is positive semi-definite if for every arbitrary non-zero vector $a$ we have $a'\Omega a \geq 0$. What is this expression for $a = \iota_T$?

(c) Show that for this equi-correlated regression model, the BLUP of $y_{T+1} = x_{T+1}'\beta + u_{T+1}$ is $\hat y_{T+1} = x_{T+1}'\hat\beta_{OLS}$ as long as there is a constant in the model.

10. Consider the simple regression with no regressors and equi-correlated disturbances:

$$y_i = \alpha + u_i \qquad i = 1, \ldots, n$$

where $E(u_i) = 0$ and

$$cov(u_i, u_j) = \begin{cases} \rho\sigma^2 & \text{for } i \neq j \\ \sigma^2 & \text{for } i = j \end{cases}$$

with $-1/(n - 1) < \rho < 1$ for the variance-covariance matrix of the disturbances to be positive definite.

(a) Show that the OLS and GLS estimates of $\alpha$ are identical. This is based on Kruskal (1968).

(b) Show that the bias in $s^2$, the OLS estimator of $\sigma^2$, is given by $-\rho\sigma^2$.

(c) Show that the GLS estimator of $\sigma^2$ is unbiased.

(d) Show that $E[\text{estimated } var(\hat\alpha) - \text{true } var(\hat\alpha_{OLS})]$ is also $-\rho\sigma^2$.

11. Prediction Error Variances Under Heteroskedasticity. This is based on Termayne (1985). Consider the $t$-th observation of the linear regression model given in (9.1),

$$y_t = x_t'\beta + u_t \qquad t = 1, 2, \ldots, T$$

where $y_t$ is a scalar, $x_t'$ is $1 \times K$ and $\beta$ is a $K \times 1$ vector of unknown coefficients. $u_t$ is assumed to have zero mean and heteroskedastic variances $E(u_t^2) = (z_t'\gamma)^2$, where $z_t'$ is a $1 \times r$ vector of observed variables and $\gamma$ is an $r \times 1$ vector of parameters. Furthermore, these $u_t$'s are not serially correlated, so that $E(u_tu_s) = 0$ for $t \neq s$.

(a) Find the $var(\hat\beta_{OLS})$ and $var(\tilde\beta_{GLS})$ for this model.

(b) Suppose we are forecasting $y$ for period $f$ in the future knowing $x_f$, i.e., $y_f = x_f'\beta + u_f$ with $f > T$. Let $\hat e_f$ and $\tilde e_f$ be the forecast errors derived using OLS and GLS, respectively. Show that the prediction error variances of the point predictions of $y_f$ are given by

$$var(\hat e_f) = x_f'\left(\sum_{t=1}^{T}x_tx_t'\right)^{-1}\left[\sum_{t=1}^{T}x_tx_t'(z_t'\gamma)^2\right]\left(\sum_{t=1}^{T}x_tx_t'\right)^{-1}x_f + (z_f'\gamma)^2$$

$$var(\tilde e_f) = x_f'\left[\sum_{t=1}^{T}x_tx_t'(z_t'\gamma)^{-2}\right]^{-1}x_f + (z_f'\gamma)^2$$

(c) Show that the variances of the two forecast errors of the conditional mean $E(y_f/x_f)$ based upon $\hat\beta_{OLS}$ and $\tilde\beta_{GLS}$, denoted by $\hat c_f$ and $\tilde c_f$, respectively, are the first terms of the corresponding expressions in part (b).

(d) Now assume that $K = 1$ and $r = 1$ so that there is only one single regressor $x_t$ and one $z_t$ variable determining the heteroskedasticity. Assume also for simplicity that the empirical moments of $x_t$ match the population moments of a Normal random variable with mean zero and variance $\theta$. Show that the relative efficiency of the OLS to the GLS predictor of $y_f$ is equal to $(T + 1)/(T + 3)$, whereas the relative efficiency of the corresponding ratio involving the two predictions of the conditional mean is $(1/3)$.

12. Estimation of Time Series Regressions with Autoregressive Disturbances and Missing Observations. This is based on Baltagi and Wu (1997). Consider the following time series regression model,

$$y_t = x_t'\beta + u_t \qquad t = 1, \ldots, T,$$

where $\beta$ is a $K \times 1$ vector of regression coefficients including the intercept. The disturbances follow a stationary AR(1) process, that is,

$$u_t = \rho u_{t-1} + \epsilon_t,$$

with $|\rho| < 1$, $\epsilon_t$ is $\text{IIN}(0, \sigma_\epsilon^2)$, and $u_0 \sim N(0, \sigma_\epsilon^2/(1 - \rho^2))$. This model is only observed at times $t_j$ for $j = 1, \ldots, n$ with $1 = t_1 < \ldots < t_n = T$ and $n > K$. The typical covariance element of $u_t$ for the observed periods $t_j$ and $t_s$ is given by

$$cov(u_{t_j}, u_{t_s}) = \frac{\sigma_\epsilon^2}{1 - \rho^2}\,\rho^{|t_j - t_s|} \qquad \text{for } s, j = 1, \ldots, n$$

Knowing $\rho$, derive a simple Prais-Winsten-type transformation that will obtain GLS as a simple least squares regression.

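13. Multiplicative Heteroskedasticity. This is based on Harvey (1976). Consider the linear model given in (9.1) and let $u \sim N(0, \Sigma)$ where $\Sigma = diag[\sigma_i^2]$. Assume that $\sigma_i^2 = \sigma^2 h_i(\theta)$ with $\theta' = (\theta_1, \ldots, \theta_s)$ and $h_i(\theta) = \exp(\theta_1 z_{1i} + \ldots + \theta_s z_{si}) = \exp(z_i'\theta)$ with $z_i' = (z_{1i}, \ldots, z_{si})$.

(a) Show that the log-likelihood function is given by

$$\log L(\beta, \theta, \sigma^2) = -\frac{N}{2}\log 2\pi\sigma^2 - \frac{1}{2}\sum_{i=1}^{N}\log h_i(\theta) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\frac{(y_i - x_i'\beta)^2}{h_i(\theta)}$$

and the score with respect to $\theta$ is

$$\frac{\partial\log L}{\partial\theta} = -\frac{1}{2}\sum_{i=1}^{N}\frac{1}{h_i(\theta)}\frac{\partial h_i}{\partial\theta} + \frac{1}{2\sigma^2}\sum_{i=1}^{N}\frac{(y_i - x_i'\beta)^2}{(h_i(\theta))^2}\frac{\partial h_i}{\partial\theta}$$

Conclude that for multiplicative heteroskedasticity, equating this score to zero yields

$$\sum_{i=1}^{N}\frac{(y_i - x_i'\beta)^2}{\exp(z_i'\theta)}z_i = \sigma^2\sum_{i=1}^{N}z_i$$

(b) Show that the Information matrix is given by

$$I(\beta, \theta, \sigma^2) = \begin{bmatrix} X'\Sigma^{-1}X & 0 & 0 \\ 0 & \dfrac{1}{2}\sum_{i=1}^{N}\dfrac{1}{(h_i(\theta))^2}\dfrac{\partial h_i}{\partial\theta}\dfrac{\partial h_i}{\partial\theta'} & \dfrac{1}{2\sigma^2}\sum_{i=1}^{N}\dfrac{1}{h_i(\theta)}\dfrac{\partial h_i}{\partial\theta} \\ 0 & \dfrac{1}{2\sigma^2}\sum_{i=1}^{N}\dfrac{1}{h_i(\theta)}\dfrac{\partial h_i}{\partial\theta'} & \dfrac{N}{2\sigma^4} \end{bmatrix}$$

(c) Assume that $h_i(\theta)$ satisfies $h_i(0) = 1$; then the test for heteroskedasticity is $H_0$: $\theta = 0$ versus $H_1$: $\theta \neq 0$. Show that the score with respect to $\theta$ and $\sigma^2$ evaluated under the null hypothesis, i.e., at $\theta = 0$ and $\tilde\sigma^2 = e'e/N$, is given by

$$\tilde S = \begin{pmatrix} \dfrac{1}{2}\sum_{i=1}^{N}z_i\left(\dfrac{e_i^2}{\tilde\sigma^2} - 1\right) \\ 0 \end{pmatrix}$$

where $e$ denotes the vector of OLS residuals. The Information matrix with respect to $\theta$ and $\sigma^2$ can be obtained from the bottom right block of $I(\beta, \theta, \sigma^2)$ given in part (b). Conclude that the score test for $H_0$ is given by

$$LM = \frac{\left[\sum_{i=1}^{N}z_i'(e_i^2 - \tilde\sigma^2)\right]\left[\sum_{i=1}^{N}(z_i - \bar z)(z_i - \bar z)'\right]^{-1}\left[\sum_{i=1}^{N}z_i(e_i^2 - \tilde\sigma^2)\right]}{2\tilde\sigma^4}$$

This statistic is asymptotically distributed as $\chi^2_s$ under $H_0$. From Chapter 5, we can see that this is a special case of the Breusch and Pagan (1979) test-statistic, which can be obtained as one-half the regression sum of squares of $e^2/\tilde\sigma^2$ on a constant and $Z$. Koenker and Bassett (1982) suggested replacing the denominator $2\tilde\sigma^4$ by $\sum_{i=1}^{N}(e_i^2 - \tilde\sigma^2)^2/N$ to make this test more robust to departures from normality.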

14. Spatial Autocorrelation. Consider the regression model given in (9.1) with spatial autocorrelation defined in (9.36).

(a) Verify that the first-order conditions of maximization of the log-likelihood function given in (9.39) yield (9.41).

(b) Show that for testing $H_0$: $\lambda = 0$, the score $\partial\ln L/\partial\lambda$ evaluated under the null, i.e., at $\lambda = 0$, is given by $u'Wu/\sigma^2$.

(c) Show that the Information matrix with respect to $\sigma^2$ and $\lambda$, evaluated under the null of $\lambda = 0$, is given by

$$\begin{bmatrix} \dfrac{n}{2\sigma^4} & \dfrac{tr(W)}{\sigma^2} \\ \dfrac{tr(W)}{\sigma^2} & tr(W^2) + tr(W'W) \end{bmatrix}$$

Conclude from parts (b) and (c) that the Lagrange Multiplier for $H_0$: $\lambda = 0$ is given by LM in (9.46). Hint: Use the fact that the diagonal elements of $W$ are zero, hence $tr(W) = 0$.

15. Neighborhood Effects and Housing Demand. Ioannides and Zabel (2003) use data from the American Housing Survey to estimate a model of housing demand with neighborhood effects. The number of observations on housing units used were 1947 in 1985, 2318 in 1989 and 2909 in 1993. The housing survey has detailed information for each of these housing units and their owners, including: the owner's schooling, whether the owner is white, whether the owner is married, the number of persons in the household, household income, and whether the house has changed owners ("changed hands") in the last 5 years. In addition, it reports the current owner's evaluation of the housing unit's market value, as well as various structural characteristics of the housing unit (such as number of bedrooms, bathrooms, and whether the house has a garage). The variable definitions are given in Table VI of Ioannides and Zabel (2003, p. 568) and the data are available from the Journal of Applied Econometrics archive.

(a) Replicate Table VII of Ioannides and Zabel (2003, p. 569) which displays the means and standard deviations for some of the variables by year and for the pooled sample. Note that the price and income variables are different from the numbers reported in the paper.

(b) Replicate Table VIII of Ioannides and Zabel (2003, p. 577) which reports regression results of housing demand. Note that these regressions are different from those reported in the paper.

References

Additional readings on GLS can be found in the econometric texts cited in the Preface.

Anselin, L. (2001), “Spatial Econometrics,” Chapter 14 in B. H. Baltagi (ed.) A Companion to Theoretical Econometrics (Blackwell: Massachusetts).

Anselin, L. (1988), Spatial Econometrics: Methods and Models (Kluwer: Dordrecht).

Anselin, L. and A. K. Bera (1998), “Spatial Dependence in Linear Regression Models with an Introduction to Spatial Econometrics,” in A. Ullah and D. E. A. Giles (eds.) Handbook of Applied Economic Statistics (Marcel Dekker: New York).

Balestra, P. (1970), “On the Efficiency of Ordinary Least Squares in Regression Models,” Journal of the American Statistical Association, 65: 1330-1337.

Balestra, P. (1980), “A Note on the Exact Transformation Associated with First-Order Moving Average Process,” Journal of Econometrics, 14: 381-394.

Baltagi, B. H. (1989), “Applications of a Necessary and Sufficient Condition for OLS to be BLUE,” Statistics and Probability Letters, 8: 457-461.

Baltagi, B. H. (1992), “Sampling Distributions and Efficiency Comparisons of OLS and GLS in the Presence of Both Serial Correlation and Heteroskedasticity,” Econometric Theory, Problem 92.2.3, 8: 304-305.

Baltagi, B. H. and P. X. Wu (1997), “Estimation of Time Series Regressions with Autoregressive Disturbances and Missing Observations,” Econometric Theory, Problem 97.5.1, 13: 889.

Baltagi, B. H. (1998), “Prediction in the Equicorrelated Regression Model,” Econometric Theory, Problem 98.3.3, 14: 382.

Breusch, T. S. (1979), “Conflict Among Criteria for Testing Hypotheses: Extensions and Comments,” Econometrica, 47: 203-207.

Breusch, T. S. and A. R. Pagan (1979), “A Simple Test for Heteroskedasticity and Random Coefficient Variation,” Econometrica, 47: 1287-1294.

Buse, A. (1982), “The Likelihood Ratio, Wald, and Lagrange Multiplier Tests: An Expository Note,” The American Statistician, 36: 153-157.

Dufour, J. M. (1986), “Bias of s2 in Linear Regressions with Dependent Errors,” The American Statistician, 40: 284-285.

Fuller, W. A. and G. E. Battese (1974), “Estimation of Linear Models with Crossed-Error Structure,” Journal of Econometrics, 2: 67-78.

Goldberger, A. S. (1962), “Best Linear Unbiased Prediction in the Generalized Linear Regression Model,” Journal of the American Statistical Association, 57: 369-375.

Harvey, A. C. (1976), “Estimating Regression Models With Multiplicative Heteroskedasticity,” Econometrica, 44: 461-466.

Im, E. I. and M. S. Snow (1993), “Sampling Distributions and Efficiency Comparisons of OLS and GLS in the Presence of Both Serial Correlation and Heteroskedasticity,” Econometric Theory, Solution 92.2.3, 9: 322-323.

Ioannides, Y. M. and J. E. Zabel (2003), “Neighbourhood Effects and Housing Demand,” Journal of Applied Econometrics 18: 563-584.

Kadiyala, K. R. (1968), “A Transformation Used to Circumvent the Problem of Autocorrelation,” Econometrica, 36: 93-96.

Koenker, R. and G. Bassett, Jr. (1982), “Robust Tests for Heteroskedasticity Based on Regression Quantiles,” Econometrica, 50: 43-61.

Kramer, W. and S. Berghoff (1991), “Consistency of s2 in the Linear Regression Model with Correlated Errors,” Empirical Economics, 16: 375-377.

Kruskal, W. (1968), “When are Gauss-Markov and Least Squares Estimators Identical? A Coordinate-Free Approach,” The Annals of Mathematical Statistics, 39: 70-75.

Lempers, F. B. and T. Kloek (1973), “On a Simple Transformation for Second-Order Autocorrelated Disturbances in Regression Analysis,” Statistica Neerlandica, 27: 69-75.

Magnus, J. (1978), “Maximum Likelihood Estimation of the GLS Model with Unknown Parameters in the Disturbance Covariance Matrix,” Journal of Econometrics, 7: 281-312.

Milliken, G. A. and M. Albohali (1984), “On Necessary and Sufficient Conditions for Ordinary Least Squares Estimators to be Best Linear Unbiased Estimators,” The American Statistician, 38: 298-299.

Neudecker, H. (1977), “Bounds for the Bias of the Least Squares Estimator of a2 in Case of a First-Order Autoregressive Process (positive autocorrelation),” Econometrica, 45: 1257-1262.

Neudecker, H. (1978), “Bounds for the Bias of the LS Estimator in the Case of a First-Order (positive) Autoregressive Process Where the Regression Contains a Constant Term,” Econometrica, 46: 1223-1226.

Newey, W. and K. West (1987), “A Simple Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55: 703-708.

Oksanen, E. H. (1991), “A Simple Approach to Teaching Generalized Least Squares Theory,” The American Statistician, 45: 229-233.

Ord, J. K. (1975), “Estimation Methods for Models of Spatial Interaction,” Journal of the American Statistical Association, 70: 120-126.

Phillips, P. C. B. and M. R. Wickens (1978), Exercises in Econometrics, Vol. 1 (Philip Allan/Ballinger: Oxford).

Puntanen, S. and G. P. H. Styan (1989), “The Equality of the Ordinary Least Squares Estimator and the Best Linear Unbiased Estimator,” (with discussion), The American Statistician, 43: 153-161.

Sathe, S. T. and H. D. Vinod (1974), “Bounds on the Variance of Regression Coefficients Due to Heteroskedastic or Autoregressive Errors,” Econometrica, 42: 333-340.

Schmidt, P. (1976), Econometrics (Marcel Dekker: New York).

Termayne, A. R. (1985), “Prediction Error Variances Under Heteroskedasticity,” Econometric Theory, Problem 85.2.3, 1: 293-294.

Theil, H. (1971), Principles of Econometrics (Wiley: New York).

Thomas, J. J. and K. F. Wallis (1971), “Seasonal Variation in Regression Analysis,” Journal of the Royal Statistical Society, Series A, 134: 67-72.

White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48: 817-838.

Zyskind, G. (1967), “On Canonical Forms, Non-Negative Covariance Matrices and Best and Simple Least Squares Linear Estimators in Linear Models,” The Annals of Mathematical Statistics, 38: 1092-1109.
