The Neyman-Pearson lemma and the Durbin-Watson test
The first formal specification test in econometrics, the Durbin-Watson (DW) (1950) test for autocorrelation in the regression model, has its foundation in the UMP test principle via a theorem of Anderson (1948). Most econometrics textbooks provide a detailed discussion of the DW test but do not mention its origin. Let us consider the standard linear regression model with autocorrelated errors:
$$y_t = x_t'\beta + \varepsilon_t \tag{2.54}$$
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \tag{2.55}$$
where $y_t$ is the $t$th observation on the dependent variable, $x_t$ is the $t$th observation on $k$ strictly exogenous variables, $|\rho| < 1$ and $u_t \sim \text{iid}\,N(0, \sigma^2)$, $t = 1, 2, \ldots, n$. The problem is testing $H_0 : \rho = 0$. Using the N-P lemma, Anderson (1948) showed that UMP tests for serial correlation can be obtained against one-sided alternatives.7 A special case of Anderson’s lemma is as follows:
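If the probability density of $\varepsilon$ can be written in the form
$$f(\varepsilon) = \text{const.}\,\exp\left[-\frac{1}{2\sigma^2}\left\{(1+\rho^2)\,\varepsilon'\varepsilon - 2\rho\,\varepsilon' D\,\varepsilon\right\}\right], \tag{2.56}$$
where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ and the columns of $X = (x_1, x_2, \ldots, x_n)'$ are generated by $k$ eigenvectors of $D$, the UMP test of $H_0 : \rho = 0$ against $H_1 : \rho > 0$ is given by $a > a_0$, where
$$a = \frac{\hat\varepsilon' D\,\hat\varepsilon}{\hat\varepsilon'\hat\varepsilon} \tag{2.57}$$
and $a_0$ is such that $\Pr[a > a_0 \mid \rho = 0] = \alpha$, the size of the test. Here $\hat\varepsilon = (\hat\varepsilon_1, \hat\varepsilon_2, \ldots, \hat\varepsilon_n)'$ is the ordinary least squares (OLS) residual vector, with $\hat\varepsilon_t = y_t - x_t'\hat\beta$ and $\hat\beta$ the OLS estimator of $\beta$.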
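For the model given in (2.54) and (2.55), the probability distribution of $\varepsilon$, that is, the likelihood function, is given by8
$$f(\varepsilon) = \text{const.}\,\exp\left[-\frac{1}{2\sigma^2}\left\{(1+\rho^2)\sum_{t=1}^{n}\varepsilon_t^2 - \rho^2(\varepsilon_1^2 + \varepsilon_n^2) - 2\rho\sum_{t=2}^{n}\varepsilon_t\varepsilon_{t-1}\right\}\right]. \tag{2.58}$$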
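Comparing (2.56) and (2.58), we see that the latter cannot be written in the form of the former. Durbin and Watson (1950) approached the problem from the opposite direction and selected a form of $D$ in such a way that (2.56) becomes "close" to (2.58). They chose $D = I_n - \tfrac{1}{2}A$, where
$$A = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}, \tag{2.59}$$
so that
$$\varepsilon' D\,\varepsilon = \sum_{t=1}^{n}\varepsilon_t^2 - \frac{1}{2}\sum_{t=2}^{n}(\varepsilon_t - \varepsilon_{t-1})^2 = \frac{1}{2}(\varepsilon_1^2 + \varepsilon_n^2) + \sum_{t=2}^{n}\varepsilon_t\varepsilon_{t-1}. \tag{2.60}$$
Then, the density (2.56) reduces to
$$f(\varepsilon) = \text{const.}\,\exp\left[-\frac{1}{2\sigma^2}\left\{(1+\rho^2)\,\varepsilon'\varepsilon - \rho(\varepsilon_1^2 + \varepsilon_n^2) - 2\rho\sum_{t=2}^{n}\varepsilon_t\varepsilon_{t-1}\right\}\right]. \tag{2.61}$$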
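Now the only difference between the likelihood function (2.58) and (2.61) is in the middle terms, which involve $\rho^2$ and $\rho$ respectively, and the difference can be neglected. Anderson’s theorem suggests that a UMP test should be based on
$$a = \frac{\hat\varepsilon' D\,\hat\varepsilon}{\hat\varepsilon'\hat\varepsilon}. \tag{2.62}$$
Durbin and Watson (1950) used a slight transformation of $a$ to form their test statistic, namely,
$$d = \frac{\hat\varepsilon' A\,\hat\varepsilon}{\hat\varepsilon'\hat\varepsilon} = \frac{\sum_{t=2}^{n}(\hat\varepsilon_t - \hat\varepsilon_{t-1})^2}{\sum_{t=1}^{n}\hat\varepsilon_t^2} = 2(1 - a). \tag{2.63}$$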
Note that $a$ in (2.62) is approximately equal to the estimate $\hat\rho = \sum_{t=2}^{n}\hat\varepsilon_t\hat\varepsilon_{t-1} / \sum_{t=1}^{n}\hat\varepsilon_t^2$, whereas $d \approx 2(1 - \hat\rho)$. Most econometrics textbooks discuss the tables of bounds for the DW test in detail, and we will not cover that here. Our main purpose is to trace the origin of the DW test to the N-P lemma. For a historical account of the DW test, see King (1987). In spatial econometrics, Moran’s (1950) test for spatial dependence has a similar form,
$$I = \frac{\hat\varepsilon' W \hat\varepsilon}{\hat\varepsilon'\hat\varepsilon}, \tag{2.64}$$
where $W$ is a spatial weight matrix that represents the "degree of potential interaction" among neighboring locations. Using the above analysis it is easy to link $I$ to the N-P lemma and demonstrate its optimality [for more on this, see Anselin and Bera (1998)].
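The algebra linking a, d and the autocorrelation estimate in the Durbin-Watson discussion above can be checked numerically. The following is a minimal sketch, not the authors' code: the simulated design, the parameter values, and the seed are illustrative assumptions, A is taken to be the usual tridiagonal first-difference matrix, and the weight matrix W (1/2 on the two off-diagonals) is a hypothetical choice made only to show that a Moran-type ratio of quadratic forms then coincides with the autocorrelation estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, rho = 200, 3, 0.6          # illustrative values (assumptions)

# Simulate y_t = x_t' beta + eps_t with eps_t = rho * eps_{t-1} + u_t.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
u = rng.normal(size=n)
eps = np.zeros(n)
eps[0] = u[0] / np.sqrt(1.0 - rho**2)   # draw from the stationary distribution
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]
y = X @ np.array([1.0, 0.5, -0.5]) + eps

# OLS residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# A: tridiagonal matrix with e'Ae = sum_{t>=2} (e_t - e_{t-1})^2; D = I - A/2.
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
D = np.eye(n) - 0.5 * A

# Quadratic-form identity: e'De = (e_1^2 + e_n^2)/2 + sum_{t>=2} e_t e_{t-1}.
quad_D = e @ D @ e
identity_rhs = 0.5 * (e[0]**2 + e[-1]**2) + np.sum(e[1:] * e[:-1])

a = quad_D / (e @ e)                          # Anderson-type statistic
d = np.sum(np.diff(e)**2) / (e @ e)           # Durbin-Watson statistic
rho_hat = np.sum(e[1:] * e[:-1]) / (e @ e)    # first-order autocorrelation

# Moran-type ratio with a hypothetical "time neighbours" weight matrix.
W = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
moran_I = (e @ W @ e) / (e @ e)

print(f"a = {a:.4f}, rho_hat = {rho_hat:.4f}, d = {d:.4f}, 2(1 - a) = {2 * (1 - a):.4f}")
```

In this sketch d equals 2(1 - a) up to rounding, while a and the autocorrelation estimate differ only by the end-point term (e_1^2 + e_n^2)/(2 e'e), which shrinks as n grows.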
In the econometrics literature, Rao’s score test is known as the Lagrange multiplier test. This terminology came from Silvey (1959). Note that the restricted MLE of $\theta$ under the restriction $H_0 : h(\theta) = c$ can be obtained from the first-order conditions of the Lagrangian function
$$\mathcal{L} = l(\theta) - \lambda'[h(\theta) - c], \tag{2.65}$$
where $\lambda$ is an $r \times 1$ vector of Lagrange multipliers. The first-order conditions are
$$s(\tilde\theta) - H(\tilde\theta)\tilde\lambda = 0 \tag{2.66}$$
$$h(\tilde\theta) = c, \tag{2.67}$$
where $H(\theta) = \partial h(\theta)'/\partial\theta$ and a tilde denotes evaluation at the restricted MLE. Therefore, we have $s(\tilde\theta) = H(\tilde\theta)\tilde\lambda$. Given that $H(\tilde\theta)$ has full rank, $s(\tilde\theta) = 0$ is equivalent to $\tilde\lambda = 0$, that is, the Lagrange multipliers vanish. These multipliers can be interpreted as the implicit cost (shadow prices) of imposing the restrictions. It can be shown that
$$\tilde\lambda = \frac{\partial l(\tilde\theta)}{\partial c}, \tag{2.68}$$
that is, the multipliers give the rate of change of the maximum attainable value with respect to the change in the constraints. If $H_0 : h(\theta) = c$ is true and $l(\tilde\theta)$ gives the optimal value, $\tilde\lambda$ should be close to zero. Given this "economic" interpretation in terms of multipliers, it is not surprising that econometricians prefer the term LM rather than RS. In terms of Lagrange multipliers, (2.39) can be expressed as
$$RS = LM = \tilde\lambda' H(\tilde\theta)'\, I(\tilde\theta)^{-1} H(\tilde\theta)\tilde\lambda. \tag{2.69}$$
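The equivalence between the score form and the multiplier form of the statistic, and the shadow-price reading of the multipliers, can be illustrated in a case where everything has a closed form. The sketch below assumes a normal linear regression with known unit error variance and linear restrictions h(beta) = R beta = c, so that the loglikelihood (up to a constant) is minus half the residual sum of squares, the score is X'(y - X beta), the information matrix is X'X, and H = R'. The data, the matrix R, and the function name `restricted_fit` are hypothetical choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)   # error variance 1 (assumed known)

# Two linear restrictions h(beta) = R beta = c, so H = R' is p x r.
R = np.array([[1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0]])
c = np.array([0.0, 0.0])

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)

def restricted_fit(c_vec):
    """Restricted MLE, Lagrange multipliers and maximized loglikelihood (constant dropped)."""
    beta_hat = XtX_inv @ X.T @ y
    M = np.linalg.inv(R @ XtX_inv @ R.T)
    lam = M @ (R @ beta_hat - c_vec)              # solves s(beta_tilde) = R' lam
    beta_tilde = beta_hat - XtX_inv @ R.T @ lam   # satisfies R beta_tilde = c_vec
    loglik = -0.5 * np.sum((y - X @ beta_tilde) ** 2)
    return beta_tilde, lam, loglik

beta_tilde, lam, loglik = restricted_fit(c)

# Score at the restricted MLE and the two forms of the statistic.
score = X.T @ (y - X @ beta_tilde)
RS = score @ XtX_inv @ score
LM = lam @ (R @ XtX_inv @ R.T) @ lam

# Shadow-price check: central finite differences of the maximized loglikelihood in c.
h = 1e-4
lam_fd = np.array([
    (restricted_fit(c + h * e_j)[2] - restricted_fit(c - h * e_j)[2]) / (2 * h)
    for e_j in np.eye(len(c))
])

print("RS =", RS, " LM =", LM)
print("lambda =", lam, " d loglik / d c =", lam_fd)
```

Under these assumptions the two statistics agree to rounding error, and the finite-difference derivative of the maximized loglikelihood with respect to c reproduces the multipliers.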
Byron (1968), probably the first to apply the RS test in econometrics, used the version (2.69) along with the LR statistic for testing homogeneity and symmetry restrictions in demand equations. It took another decade for econometricians to realize the potential of the RS test. In this regard, the work of Breusch and Pagan (1980) has been the most influential. They collected relevant research reported in the statistics literature, presented the RS test in a general framework in the context of evaluating econometric models, and discussed many applications. Since the late 1970s, econometricians have applied the score principle to a variety of econometric testing problems and studied the properties of the resulting tests. RS tests are now the most common items in the econometrician’s kit of testing tools. We will make no attempt to provide a list of all applications of the RS test in econometrics, for these are far too many.
For example, consider the linear regression model (2.54). The OLS analysis of this model is based on four basic assumptions: correct linear functional form, disturbance normality, homoskedasticity, and serial independence. Violation of these affects both estimation and inference results. With the aid of the RS principle, many procedures have been proposed to test the above assumptions, and these are now routinely reported in most of the standard econometric software packages. In most cases, the algebraic forms of the LR and W tests can hardly be simplified beyond their original formulae (2.12) and (2.40). On the other hand, in many cases the RS test statistic, apart from its computational ease, can be reduced to a neat and elegant formula, enabling its easy incorporation into computer software.
Breusch and Pagan (1980), Godfrey (1988), Bera and Ullah (1991), Davidson and MacKinnon (1993), and Bera and Billias (2000) discussed many applications of the score tests in econometrics and demonstrated that many of the old and new econometric tests could be given a score-test interpretation. For example, test procedures developed in Hausman (1978), Newey (1985), Tauchen (1985), and White (1982) could be put in the framework of the score test. To see this, let us consider the Newey (1985) and Tauchen (1985) moment test and write the moment restriction as
$$E_f[m(y; \theta)] = 0, \tag{2.70}$$
where $E_f$ means that (2.70) is true only when $f(y; \theta)$ is the correct p.d.f. A test for this hypothesis can be based on the estimate of the sample counterpart of (2.70), namely,
$$\sum_{i=1}^{n} m(y_i; \hat\theta). \tag{2.71}$$
Now consider an auxiliary p.d.f.
$$f^*(y; \theta, \gamma) = f(y; \theta)\exp[\gamma' m(y; \theta) - \phi(\theta, \gamma)], \tag{2.72}$$
where $\phi(\theta, \gamma) = \ln\int \exp[\gamma' m(y; \theta)]\, f(y; \theta)\,dy$. Note that if $f(y; \theta)$ is the correct p.d.f., then $\gamma = 0$ in (2.72). Therefore, a test for the correct specification of $f(y; \theta)$ can be achieved by testing $\gamma = 0$. Writing the loglikelihood function under the alternative hypothesis as
$$l^*(\theta, \gamma) = \sum_{i=1}^{n} \ln f^*(y_i; \theta, \gamma), \tag{2.73}$$
we see that the score function for testing $\gamma = 0$ in (2.72) is
$$\left.\frac{\partial l^*(\theta, \gamma)}{\partial\gamma}\right|_{\gamma = 0} = \sum_{i=1}^{n} m(y_i; \theta), \tag{2.74}$$
and it gives the identical moment test. This interpretation of the moment test as a score test was first noted by White (1984). Recently, Chesher and Smith (1997) gave more general and rigorous treatments of this issue. There are uncountably many choices of the auxiliary p.d.f. $f^*(y; \theta, \gamma)$, and the score test is invariant with respect to these choices. The LR test, however, will be sensitive to the form of $f^*(y; \theta, \gamma)$.
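The claim that the score for the auxiliary parameter reduces to the sum of the moment functions can be checked numerically in a deliberately simple setting. The sketch below makes several assumptions that are not in the text: the null density is a fully specified N(0, 1) (so there is no nuisance parameter to estimate), the moment function is the bounded odd function m(y) = tanh(y) (which has zero mean under the null and keeps the normalizing integral finite), and the normalizing term is computed by Gauss-Hermite quadrature. The score is obtained by finite differences of the auxiliary loglikelihood and compared with the moment sum.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature rule for expectations under N(0, 1): E[g(Z)] ~ sum(weights * g(nodes)).
nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)

def m(y):
    """Hypothetical moment function: bounded, odd, zero mean under N(0, 1)."""
    return np.tanh(y)

def phi(gamma):
    """Normalizing term: log of E_f[exp(gamma * m(Y))] under f = N(0, 1)."""
    return np.log(np.sum(weights * np.exp(gamma * m(nodes))))

def loglik_star(gamma, y):
    """Loglikelihood of the auxiliary density f*(y; gamma) = f(y) exp[gamma m(y) - phi(gamma)]."""
    log_f = -0.5 * y**2 - 0.5 * np.log(2.0 * np.pi)
    return np.sum(log_f + gamma * m(y) - phi(gamma))

rng = np.random.default_rng(2)
y = rng.normal(size=500)

# Score for gamma at gamma = 0 by central finite differences.
h = 1e-4
score_fd = (loglik_star(h, y) - loglik_star(-h, y)) / (2.0 * h)
moment_sum = np.sum(m(y))

# Standardized version: under the null, Var[m(Y)] = E[m(Y)^2] since E[m(Y)] = 0.
var_m = np.sum(weights * m(nodes) ** 2)
z_stat = moment_sum / np.sqrt(len(y) * var_m)

print(f"finite-difference score = {score_fd:.6f}, sum of moments = {moment_sum:.6f}")
print(f"standardized statistic  = {z_stat:.3f}")
```

In this setting the derivative of the normalizing term at zero is the null mean of m, which vanishes, so only the moment sum survives in the score.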
Neyman’s (1959) C(α) formulation, which formally established that "every" locally optimal test should be based on the score function (see equation (2.47)), also has been found to be useful in econometrics. For testing complicated nonlinear restrictions the Wald test has a computational advantage; however, in this setting the Wald test runs into a serious problem of non-invariance, as pointed out by Gregory and Veal (1985) and Vaeth (1985). In this situation the score tests are somewhat difficult to compute, but the C(α) tests are the most convenient to use, as demonstrated by Dagenais and Dufour (1991) and Bera and Billias (2000).
Rayner and Best (1989, sections 4.2 and 6.1) showed that Neyman’s smooth statistic $\Psi_r^2$ in (2.35) can be derived as a score test for testing $H_0 : \delta_1 = \delta_2 = \ldots = \delta_r = 0$ in (2.33). In fact, Neyman’s smooth test can be viewed as the first Rao’s score test formally derived from the Neyman-Pearson principle. We have seen no formal application of the smooth test in econometrics. However, Lawrence Klein gave a seminar on this topic at MIT during the academic year 1942-3 to draw attention to Neyman (1937), since the paper was published in a rather recondite journal [see Klein, 1991]. Unfortunately, Klein’s effort did not bring Neyman’s test to econometrics. However, some of the tests in econometrics can be given a smooth test interpretation. The test for normality suggested in Jarque and Bera (1980) and Bera and Jarque (1981) can be viewed as a smooth test [see Rayner and Best, 1989, p. 90]. Orthogonal polynomial tests suggested by Smith (1989) and Cameron and Trivedi (1990) are also in the spirit of Neyman (1937). In a recent paper, Diebold, Gunther, and Tay (1998) suggested the use of the density of the probability integral transformation (2.31) for density forecast evaluation. They adopted a graphical approach; however, their procedure can be formalized by using a test of the form $\Psi_r^2$.
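To make the suggestion about formalizing density forecast evaluation concrete, here is a minimal sketch of a smooth-type statistic applied to probability integral transforms. Equation (2.35) is not reproduced in this section, so the code assumes the standard textbook form: the sum over j = 1, ..., r of squared normalized sample means of orthonormal Legendre polynomials on [0, 1], which is approximately chi-squared with r degrees of freedom when the transforms are iid uniform. The function name `smooth_statistic`, the choice r = 4, and the simulated transforms are all illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def smooth_statistic(pit, r=4):
    """Smooth-type statistic: sum_j (n^{-1/2} sum_i pi_j(pit_i))^2, j = 1..r,
    with pi_j(y) = sqrt(2j + 1) * P_j(2y - 1) the orthonormal Legendre polynomials on [0, 1]."""
    pit = np.asarray(pit, dtype=float)
    n = pit.size
    x = 2.0 * pit - 1.0                 # map [0, 1] to [-1, 1]
    stat = 0.0
    for j in range(1, r + 1):
        coef = np.zeros(j + 1)
        coef[j] = 1.0                   # selects the degree-j Legendre polynomial P_j
        pi_j = np.sqrt(2 * j + 1) * legendre.legval(x, coef)
        u_j = pi_j.sum() / np.sqrt(n)
        stat += u_j**2
    return stat

rng = np.random.default_rng(3)
pit_ok = rng.uniform(size=1000)          # transforms under a correct forecast density
pit_bad = rng.beta(2.0, 2.0, size=1000)  # hump-shaped transforms: forecast density too dispersed

stat_ok = smooth_statistic(pit_ok)
stat_bad = smooth_statistic(pit_bad)
print(f"statistic, uniform transforms:     {stat_ok:.2f}")
print(f"statistic, non-uniform transforms: {stat_bad:.2f}")
```

The individual components indicate the direction of departure: in this sketch the second-order component picks up the hump-shaped transforms, which is the kind of information the graphical approach conveys informally.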
In Example 3 we saw that the score function vanished under the null hypothesis. In economics this kind of situation is encountered often [see, for instance, Bera, Ra, and Sarkar, 1998]. Lee and Chesher (1986) have offered a comprehensive treatment of this problem, and one of the examples they considered is the stochastic production frontier model of Aigner, Lovell, and Schmidt (1977). In Lee and Chesher (1986) the score vanished when they tried to test the null hypothesis that all the firms are efficient. They suggested using the second-order derivatives of the loglikelihood function. From (2.25) we get the same test principle by putting $s(\theta_0) = 0$, namely, reject the null if
$$\frac{\partial^2 l(\theta_0)}{\partial\theta^2} > k_2. \tag{2.75}$$
Therefore, again using the Neyman-Pearson principle, we see that when the score vanishes, we cannot get a locally best test but we can obtain a locally best unbiased test based on the second derivative of the loglikelihood function.
In this chapter we have first explored the general test principles developed by statisticians with some simple examples and, then, briefly discussed how those principles have been used by econometricians to construct tests for econometric models. It seems we now have a large enough arsenal to use for model testing. We should, however, be careful particularly about two aspects of testing: first, interpreting the test results, and second, taking the appropriate action when the null hypothesis is rejected. When asked about his contribution to linear models, Geoff Watson mentioned the DW test but added [see Beran and Fisher, 1998, p. 91], "What do I do if I have a regression and find the errors don’t survive the Durbin-Watson test? What do I actually do? There is no robust method. You’d like to use a procedure that would be robust against errors no matter what the covariance matrix is. Most robustness talk is really about outliers, long-tail robustness. Dependence robustness is largely untouched."9 This problem can arise after applying any test. Use of large sample tests when we have very limited data and the issue of pretesting are other important concerns. As this century and the millennium rush to a close, more research to solve these problems will make econometric model evaluation and testing more relevant in empirical work.
* We would like to thank Badi Baltagi, Roger Koenker and two anonymous referees for many pertinent comments. We are also grateful to Yulia Kotlyarova who provided very competent research assistance during the summer of 1998 and offered many helpful suggestions on an earlier draft of this chapter. We, however, retain the responsibility for any remaining errors. Financial support from the Research Board of the University of Illinois at Urbana-Champaign and the Office of Research, College of Commerce and Business Administration, University of Illinois at Urbana-Champaign is gratefully acknowledged.
1 Neyman (1980, p. 6) stated their intuition as, "The intuitive background of the likelihood ratio test was simply as follows: if among the contemplated admissible hypotheses there are some that ascribe to the facts observed probabilities much larger than that ascribed by the hypothesis tested, then it appears ‘reasonable’ to reject the hypothesis."
2 This result follows from the generalized Cauchy-Schwarz inequality (Rao, 1973, p. 54)
$$(u'v)^2 \le (u'Au)(v'A^{-1}v),$$
where $u$ and $v$ are column vectors and $A$ is a positive definite (hence non-singular) matrix. Equality holds when
$$u = A^{-1}v.$$
3 The interrelationships among these three tests can be brought home to students through an amusing story. Once around 1946 Ronald Fisher invited Jerzy Neyman, Abraham Wald, and C. R. Rao to his lodge for afternoon tea. During their conversation, Fisher mentioned the problem of deciding whether his dog, who had been going to an "obedience school" for some time, was disciplined enough. Neyman quickly came up with an idea: leave the dog free for some time and then put him on his leash. If there is not much difference in his behavior, the dog can be thought of as having completed the course successfully. Wald, who lost his family in the concentration camps, was averse to any restrictions and simply suggested leaving the dog free and seeing whether it behaved properly. Rao, who had observed the nuisances of stray dogs in Calcutta streets, did not like the idea of letting the dog roam freely and suggested keeping the dog on a leash at all times and observing how hard it pulls on the leash. If it pulled too much, it needed more training. That night when Rao was back in his Cambridge dormitory after tending Fisher’s mice at the genetics laboratory, he suddenly realized the connection of Neyman and Wald’s recommendations to the Neyman-Pearson LR and Wald tests. He got an idea and the rest is history.
4 In the C(α) test the letter "C" refers to Cramer and "α" to the level of significance. Neyman (1959) was published in a Festschrift for Harald Cramer. Neyman frequently referred to this work as his "last performance." He was, however, disappointed that the paper did not attract as much attention as he had hoped for, and in later years, he regretted publishing it in a Festschrift as not many people read Festschrifts.
5 In the introduction (p. 11), Tinbergen stated,
The purpose of this series of studies is to submit to statistical test some of the theories which have been put forward regarding the character and causes of cyclical fluctuation in business activity. Many of these theories, however, do not exist in a form immediately appropriate for statistical testing while most of them take account of the same body of economic phenomena – viz., the behavior of investment, consumption, incomes, prices, etc. Accordingly, the method of procedure here adopted is not to test the various theories one by one (a course which would involve much repetition), but to examine in succession, in the light of the various explanations which have been offered, the relation between certain groups of economic phenomena.
He, however, cautioned against relying too much on the test results, "for no statistical test can prove a theory to be correct" (p. 12). For more on this see (Duo, 1993, Chapter 5).
6 In his Nobel lecture he stated (Haavelmo, 1997),
“For my own part I was lucky enough to be able to visit the United States in 1939 on a scholarship…. I then had the privilege of studying with the world famous statistician Jerzy Neyman in California for a couple of months. At that time, young and naive, I thought I knew something about econometrics. I exposed some of my thinking on the subject to Professor Neyman. Instead of entering into a discussion with me, he gave me two or three exercises for me to work out. He said he would talk to me when I had done these exercises. When I met him for the second talk, I had lost most of my illusions regarding the understanding of how to do econometrics. But Professor Neyman also gave me hopes that there might be other more fruitful ways to approach the problem of econometric methods than those which had so far caused difficulties and disappointments.”
7 Technically speaking, this model does not fit in our earlier framework due to the dependence structure. However, once a proper likelihood function is defined we can derive our earlier test statistics.
8 Instead of dealing with the joint distribution conditional on the explanatory variables in all time periods, a better approach would be to consider sequential conditional distribution under much weaker assumptions. Wooldridge (1994) discusses the merits of modeling sequential distributions.
9 We should, however, note that econometricians have developed a number of procedures to estimate a consistent variance-covariance matrix to take account of the unknown form of dependence; for a discussion of this and other robust procedures see (Bera, 2000) and (Wooldridge, 2000).
Aigner, D. J., C. A. K. Lovell, and P. Schmidt (1977). Formulation and estimation of stochastic frontier production function model. Journal of Econometrics 6, 21-37.
Anderson, T. W. (1948). On the theory of testing serial correlation. Skandinavisk Aktuarietidskrift 31, 88-116.
Anselin, L., and A. K. Bera (1998). Spatial dependence in linear regression models with an introduction to spatial econometrics. In A. Ullah and D. E. A. Giles (eds.), Handbook of Applied Economic Statistics. New York: Marcel Dekker, 237-89.
Bayes, Rev. T. (1763). An essay toward solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society 53, 370-418.
Bera, A. K. (2000). Hypothesis testing in the 20th century with a special reference to testing with misspecified models. In C. R. Rao and G. Szekely (eds.), Statistics for the 21st Century. New York: Marcel Dekker, 33-92.
Bera, A. K., and Y. Billias (2000). Rao’s score, Neyman’s C(α) and Silvey’s LM test: An essay on historical developments and some new results. Journal of Statistical Planning and Inference, forthcoming.
Bera, A. K., and C. M. Jarque (1981). An efficient large-sample test for normality of observations and regression residuals. Working Paper in Economics and Econometrics, Number 40, The Australian National University, Canberra.
Bera, A. K., and A. Ullah (1991). Rao’s score test in econometrics. Journal of Quantitative Economics 7, 189-220.
Bera, A. K., and M. J. Yoon (1993). Specification testing with locally misspecified alternatives. Econometric Theory 9, 649-58.
Bera, A. K., S.-S. Ra, and N. Sarkar (1998). Hypothesis testing for some nonregular cases in econometrics. In S. Chakravarty, D. Coondoo, and R. Mukherjee (eds.), Econometrics: Theory and Practice, New Delhi: Allied Publishers, 319-51.
Beran, R. J., and N. I. Fisher (1998). A conversation with Geoff Watson. Statistical Science 13, 75-93.
Breusch, T. S., and A. R. Pagan (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies 47, 239-53.
Byron, R. P. (1968). Methods for estimating demand equations using prior information: A series of experiments with Australian data. Australian Economic Papers 7, 227-48.
Cameron, A. C., and P. K. Trivedi (1990). Conditional moment tests and orthogonal polynomials, Working Paper in Economics, Number 90-051, Indiana University.
Chesher, A., and R. Smith (1997). Likelihood ratio specification tests. Econometrica 65, 627-46.
Cramer, H. (1946). Mathematical Methods of Statistics. New Jersey: Princeton University Press.
Dagenais, M. G., and J.-M. Dufour (1991). Invariance, nonlinear models, and asymptotic tests. Econometrica 59, 1601-15.
Davidson, R., and J. G. MacKinnon (1993). Estimation and Inference in Econometrics. Oxford: Oxford University Press.
Diebold, F. X., T. A. Gunther, and A. S. Tay (1998). Evaluating density forecasts with application to financial risk management. International Economic Review 39, 863-905.
Duo, Q. (1993). The Foundation of Econometrics: A Historical Perspective. Oxford: Clarendon Press.
Durbin, J., and G. S. Watson (1950). Testing for serial correlation in least squares regression I. Biometrika 37, 409-28.
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A222, 309-68.
Godfrey, L. G. (1988). Misspecification Tests in Econometrics, The Lagrange Multiplier Principle and Other Approaches. Cambridge: Cambridge University Press.
Gourieroux, C., and A. Monfort (1995). Statistics and Econometric Models 2. Cambridge: Cambridge University Press.
Gregory, A. W., and M. R. Veal (1985). Formulating Wald tests of nonlinear restrictions. Econometrica 53, 1465-8.
Haavelmo, T. (1944). The probability approach in econometrics. Supplements to Econometrica 12.
Haavelmo, T. (1997). Econometrics and the welfare state: Nobel lecture, December 1989. American Economic Review 87, 13-5.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica 46, 1251-71.
Jarque, C. M., and A. K. Bera (1980). Efficient tests for normality, homoskedasticity and serial independence of regression residuals. Economics Letters 6, 255-9.
King, M. L. (1987). Testing for autocorrelation in linear regression models: A survey. In M. L. King and D. E. A. Giles (eds.), Specification Analysis in the Linear Model. London: Routledge and Kegan Paul, 19-73.
Klein, L. (1991). The statistics seminar, MIT, 1942-1943. Statistical Science 6, 320-30.
Lee, L. F., and A. Chesher (1986). Specification testing when score test statistics are individually zero. Journal of Econometrics 31, 121-49.
Lehmann, E. L. (1986). Testing Statistical Hypotheses. New York: John Wiley & Sons.
Lehmann, E. L. (1999). Elements of Large Sample Theory. New York: Springer-Verlag.
Moran, P. A. P. (1950). A test for the serial independence of residuals. Biometrika 37, 178-81.
Newey, W. (1985). Maximum likelihood specification testing and conditional moment tests. Econometrica 53, 1047-70.
Neyman, J. (1937). "Smooth test" for goodness of fit. Skandinavisk Aktuarietidskrift 20, 150-99.
Neyman, J. (1954). Sur une famille de tests asymptotiques des hypotheses statistiques composees. Trabajos de Estadistica 5, 161-8.
Neyman, J. (1959). Optimal asymptotic test of composite statistical hypothesis. In U. Grenander (ed.), Probability and Statistics, the Harald Cramer Volume. Uppsala: Almqvist and Wiksell, 213-34.
Neyman, J. (1980). Some memorable incidents in probabilistic/statistical studies. In I. M. Chakravarti (ed.), Asymptotic Theory of Statistical Tests and Estimation, New York: Academic Press, 1-32.
Neyman, J., and E. S. Pearson (1928). On the use and interpretation of certain test criteria for purpose of statistical inference. Biometrika 20, 175-240.
Neyman, J., and E. S. Pearson (1933). On the problem of the most efficient tests of statistical hypothesis. Philosophical Transactions of the Royal Society, Series A 231, 289-337.
Neyman, J., and E. S. Pearson (1936). Contribution to the theory of testing statistical hypothesis I: Unbiased critical regions of type A and type A1. Statistical Research Memoirs 1, 1-37.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine Series 5, 50, 157-75.
Rao, C. R. (1948). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proceedings of the Cambridge Philosophical Society 44, 50-7.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. New York: John Wiley and Sons.
Rao, C. R. (2000). Two score and ten years of score tests. Journal of Statistical Planning and Inference.
Rao, C. R., and S. J. Poti (1946). On locally most powerful tests when alternatives are one sided. Sankhya 7, 439-40.
Rayner, J. C. W., and D. J. Best (1989). Smooth Tests of Goodness of Fit. New York: Oxford University Press.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley and Sons.
Silvey, S. D. (1959). The Lagrange multiplier test. Annals of Mathematical Statistics 30, 389-407.
Smith, R. (1989). On the use of distributional mis-specification checks in limited dependent variable models. Economic Journal 99, 178-92.
Tauchen, G. (1985). Diagnostic testing and evaluation of maximum likelihood models. Journal of Econometrics 30, 415-43.
Vaeth, M. (1985). On the use of Wald’s test in exponential families. International Statistical Review 53, 199-214.
Wald, A. (1943). Tests of statistical hypothesis concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54, 426-82.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1-25.
White, H. (1984). Comment on "Tests of specification in econometrics." Econometric Reviews 3, 261-7.
Wooldridge, J. M. (1994). Estimation and inference for dependent processes. In R. F. Engle and D. L. McFadden (eds.), Handbook of Econometrics Vol. 4. Amsterdam: North-Holland, 2639-738.
Wooldridge, J. M. (2000). Diagnostic testing. Chapter 9 this volume.