Asymptotic Tests and Related Topics
4.5.1 Likelihood Ratio and Related Tests
Let $L(x, \theta)$ be the joint density of a $T$-vector of random variables $x = (x_1, x_2, \ldots, x_T)'$ characterized by a $K$-vector of parameters $\theta$. We assume all the conditions used to prove the asymptotic normality (4.2.23) of the maximum likelihood estimator $\hat\theta$. In this section we shall discuss the asymptotic tests of the hypothesis
$$h(\theta) = 0, \tag{4.5.1}$$
where $h$ is a $q$-vector valued differentiable function with $q < K$. We assume that (4.5.1) can be equivalently written as
$$\theta = r(\alpha), \tag{4.5.2}$$
where $\alpha$ is a $p$-vector of parameters such that $p = K - q$. We denote the constrained maximum likelihood estimator subject to (4.5.1) or (4.5.2) as $\tilde\theta = r(\tilde\alpha)$.
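Three asymptotic tests of (4.5.1) are well known; they are the likelihood ratio test (LRT), Wald's test (Wald, 1943), and Rao's score test (Rao, 1947). The definitions of their respective test statistics are
$$\mathrm{LRT} = 2[\log L(\hat\theta) - \log L(\tilde\theta)], \tag{4.5.3}$$
$$\mathrm{Wald} = -h(\hat\theta)'\left\{\left.\frac{\partial h}{\partial\theta'}\right|_{\hat\theta}\left[\left.\frac{\partial^2 \log L}{\partial\theta\,\partial\theta'}\right|_{\hat\theta}\right]^{-1}\left.\frac{\partial h'}{\partial\theta}\right|_{\hat\theta}\right\}^{-1} h(\hat\theta), \tag{4.5.4}$$
$$\mathrm{Rao} = -\left.\frac{\partial \log L}{\partial\theta'}\right|_{\tilde\theta}\left[\left.\frac{\partial^2 \log L}{\partial\theta\,\partial\theta'}\right|_{\tilde\theta}\right]^{-1}\left.\frac{\partial \log L}{\partial\theta}\right|_{\tilde\theta}. \tag{4.5.5}$$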
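Maximization of $\log L$ subject to the constraint (4.5.1) is accomplished by setting the derivatives of $\log L - \lambda' h(\theta)$ with respect to $\theta$ and $\lambda$ to 0, where $\lambda$ is the vector of Lagrange multipliers. Let the solutions be $\tilde\theta$ and $\tilde\lambda$. Then they satisfy
$$\left.\frac{\partial \log L}{\partial\theta}\right|_{\tilde\theta} = \left.\frac{\partial h'}{\partial\theta}\right|_{\tilde\theta}\tilde\lambda.$$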
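Inserting this equation into the right-hand side of (4.5.5) yields $\mathrm{Rao} = -\tilde\lambda' B \tilde\lambda$, where
$$B = \left.\frac{\partial h}{\partial\theta'}\right|_{\tilde\theta}\left[\left.\frac{\partial^2 \log L}{\partial\theta\,\partial\theta'}\right|_{\tilde\theta}\right]^{-1}\left.\frac{\partial h'}{\partial\theta}\right|_{\tilde\theta}.$$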
Silvey (1959) showed that $-B^{-1}$ is the asymptotic variance-covariance matrix of $\tilde\lambda$ and hence called Rao's test the Lagrange multiplier test. For a more thorough discussion of the three tests, see Engle (1984).
All three test statistics can be shown to have the same limit distribution, $\chi^2(q)$, under the null hypothesis. In Wald and Rao, $\partial^2 \log L/\partial\theta\,\partial\theta'$ can be replaced with $T \operatorname{plim} T^{-1}\partial^2 \log L/\partial\theta\,\partial\theta'$ without affecting the limit distribution. In each test the hypothesis (4.5.1) is to be rejected when the value of the test statistic is large.
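As a minimal sketch of how the three statistics are computed in practice, consider a scalar case that is not part of the text: an i.i.d. Bernoulli sample with success probability $p$ and the hypothesis $p = p_0$. LRT is twice the gap between the unconstrained and constrained log likelihoods, Wald uses the Hessian at the unconstrained estimate, and Rao uses the score and Hessian at the constrained estimate. The function name `bernoulli_tests` and its arguments are hypothetical, chosen only for this illustration.

```python
import math


def bernoulli_tests(successes, n, p0):
    """LRT, Wald and Rao statistics for H0: p = p0 in an i.i.d. Bernoulli sample."""
    s, f = successes, n - successes
    if not (0 < s < n) or not (0.0 < p0 < 1.0):
        raise ValueError("need 0 < successes < n and 0 < p0 < 1")

    def loglik(p):
        return s * math.log(p) + f * math.log(1.0 - p)

    def score(p):
        # First derivative of the log likelihood with respect to p.
        return s / p - f / (1.0 - p)

    def hessian(p):
        # Second derivative of the log likelihood (always negative here).
        return -s / p**2 - f / (1.0 - p) ** 2

    p_hat = s / n  # unconstrained MLE; the constrained MLE is p0 itself
    lrt = 2.0 * (loglik(p_hat) - loglik(p0))
    # With h(p) = p - p0 we have dh/dp = 1, so Wald is -h^2 times the Hessian.
    wald = -((p_hat - p0) ** 2) * hessian(p_hat)
    rao = -(score(p0) ** 2) / hessian(p0)
    return lrt, wald, rao


lrt, wald, rao = bernoulli_tests(60, 100, 0.5)
```

Each value would be compared with a critical value of the chi-square distribution with one degree of freedom, since $q = 1$ in this example.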
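We shall prove $\mathrm{LRT} \to \chi^2(q)$. By a Taylor expansion of $\log L$ around $\hat\theta$ and, treating $L[r(\alpha)]$ as a function of $\alpha$, around $\tilde\alpha$, we obtain
$$\mathrm{LRT} \cong \epsilon'(I - \mathcal{J}_\theta^{1/2} R\, \mathcal{J}_\alpha^{-1} R' \mathcal{J}_\theta^{1/2})\epsilon, \tag{4.5.18}$$
where $\epsilon$ is asymptotically $N(0, I)$, $\mathcal{J}_\theta$ and $\mathcal{J}_\alpha$ are the limits of $-T^{-1}E\,\partial^2 \log L/\partial\theta\,\partial\theta'$ and $-T^{-1}E\,\partial^2 \log L/\partial\alpha\,\partial\alpha'$, and $R = \partial r/\partial\alpha'$ evaluated at the true value of $\alpha$.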
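But, because
$$\mathcal{J}_\alpha = R' \mathcal{J}_\theta R, \tag{4.5.19}$$
$I - \mathcal{J}_\theta^{1/2} R\, \mathcal{J}_\alpha^{-1} R' \mathcal{J}_\theta^{1/2}$ can be easily shown to be an idempotent matrix of rank $q$. Therefore, by Theorem 2 of Appendix 2, $\mathrm{LRT} \to \chi^2(q)$.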
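The steps behind this result can be sketched as follows. The notation is an assumption of this sketch: $\theta_0 = r(\alpha_0)$ are the true values, $\mathcal{J}_\theta$ and $\mathcal{J}_\alpha$ are the limits of $-T^{-1}E\,\partial^2\log L/\partial\theta\,\partial\theta'$ and $-T^{-1}E\,\partial^2\log L/\partial\alpha\,\partial\alpha'$, and $R = \partial r/\partial\alpha'$ at $\alpha_0$. It is an outline of the standard argument under the regularity conditions assumed in the text, not a verbatim reproduction of the original displays.

```latex
\begin{align*}
% Second-order expansions; the first-order terms vanish at the maximizers.
\log L(\hat\theta) - \log L(\theta_0)
  &\cong \tfrac{T}{2}\,(\hat\theta - \theta_0)'\mathcal{J}_\theta(\hat\theta - \theta_0), \\
\log L(\tilde\alpha) - \log L(\alpha_0)
  &\cong \tfrac{T}{2}\,(\tilde\alpha - \alpha_0)'\mathcal{J}_\alpha(\tilde\alpha - \alpha_0), \\
% Use L(\theta_0) = L(\alpha_0) and L(\tilde\theta) = L(\tilde\alpha).
\mathrm{LRT}
  &\cong T(\hat\theta - \theta_0)'\mathcal{J}_\theta(\hat\theta - \theta_0)
       - T(\tilde\alpha - \alpha_0)'\mathcal{J}_\alpha(\tilde\alpha - \alpha_0), \\
% Linearized estimators, with the chain rule
% \partial\log L/\partial\alpha = R'\,\partial\log L/\partial\theta.
\sqrt{T}(\hat\theta - \theta_0)
  &\cong \mathcal{J}_\theta^{-1}\, T^{-1/2}
     \left.\frac{\partial \log L}{\partial\theta}\right|_{\theta_0}, \qquad
\sqrt{T}(\tilde\alpha - \alpha_0)
  \cong \mathcal{J}_\alpha^{-1} R'\, T^{-1/2}
     \left.\frac{\partial \log L}{\partial\theta}\right|_{\theta_0}, \\
% Standardized score.
\epsilon
  &\equiv \mathcal{J}_\theta^{-1/2}\, T^{-1/2}
     \left.\frac{\partial \log L}{\partial\theta}\right|_{\theta_0}
     \to N(0, I), \\
\mathrm{LRT}
  &\cong \epsilon'\epsilon
       - \epsilon'\mathcal{J}_\theta^{1/2} R\, \mathcal{J}_\alpha^{-1} R'
         \mathcal{J}_\theta^{1/2}\epsilon .
\end{align*}
% Idempotency: with P = \mathcal{J}_\theta^{1/2} R (R'\mathcal{J}_\theta R)^{-1} R' \mathcal{J}_\theta^{1/2},
% P^2 = P and tr P = p, so I - P is idempotent with rank K - p = q.
```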
The proofs of $\mathrm{Wald} \to \chi^2(q)$ and $\mathrm{Rao} \to \chi^2(q)$ are omitted; the former is very easy and the latter is as involved as the preceding proof.
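Next we shall find explicit formulae for the three tests (4.5.3), (4.5.4), and (4.5.5) for the nonlinear regression model (4.3.1) when the error $u$ is normal. Let $\hat\beta$ be the NLLS estimator of $\beta_0$, and let $\tilde\beta$ be the constrained NLLS, that is, the value of $\beta$ that minimizes (4.3.5) subject to the constraint $h(\beta) = 0$. Also, define $\hat G = (\partial f/\partial\beta')_{\hat\beta}$ and $\tilde G = (\partial f/\partial\beta')_{\tilde\beta}$. Then the three test statistics are defined as
$$\mathrm{LRT} = T[\log T^{-1}S_T(\tilde\beta) - \log T^{-1}S_T(\hat\beta)], \tag{4.5.20}$$
$$\mathrm{Wald} = \frac{T\, h(\hat\beta)'\left[\left.\dfrac{\partial h}{\partial\beta'}\right|_{\hat\beta}(\hat G'\hat G)^{-1}\left.\dfrac{\partial h'}{\partial\beta}\right|_{\hat\beta}\right]^{-1} h(\hat\beta)}{S_T(\hat\beta)}, \tag{4.5.21}$$
and
$$\mathrm{Rao} = \frac{T[y - f(\tilde\beta)]'\tilde G(\tilde G'\tilde G)^{-1}\tilde G'[y - f(\tilde\beta)]}{S_T(\tilde\beta)}. \tag{4.5.22}$$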
Because (4.5.20), (4.5.21), and (4.5.22) are special cases of (4.5.3), (4.5.4), and (4.5.5), all three statistics are asymptotically distributed as $\chi^2(q)$ under the null hypothesis if $u$ is normal. Furthermore, we can show that statistics (4.5.20), (4.5.21), and (4.5.22) are asymptotically distributed as $\chi^2(q)$ under the null even if $u$ is not normal. Thus these statistics can be used to test a nonlinear hypothesis under a nonnormal situation.
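In the linear model with linear hypothesis $Q'\beta = 0$, statistics (4.5.20)-(4.5.22) are further reduced to
$$\mathrm{LRT} = T \log[S_T(\tilde\beta)/S_T(\hat\beta)], \tag{4.5.23}$$
$$\mathrm{Wald} = T[S_T(\tilde\beta) - S_T(\hat\beta)]/S_T(\hat\beta), \tag{4.5.24}$$
$$\mathrm{Rao} = T[S_T(\tilde\beta) - S_T(\hat\beta)]/S_T(\tilde\beta). \tag{4.5.25}$$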
Thus we can easily show $\mathrm{Wald} \ge \mathrm{LRT} \ge \mathrm{Rao}$. The inequalities hold also in the multiequation linear model, as shown by Berndt and Savin (1977). Although the inequalities do not always hold for the nonlinear model, Mizon (1977) found $\mathrm{Wald} \ge \mathrm{LRT}$ most of the time in his samples.
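The ordering of the three statistics in the linear model can be checked numerically. The sketch below is an illustration under assumptions, not part of the original text: the function name `linear_model_tests`, the simulated data, and the seed are hypothetical. It computes the unconstrained and constrained least squares residual sums of squares and then the three statistics in their linear-model forms. Writing $x = S_T(\tilde\beta)/S_T(\hat\beta) \ge 1$, the ordering is just $x - 1 \ge \log x \ge 1 - 1/x$.

```python
import numpy as np


def linear_model_tests(y, X, Q):
    """LRT, Wald and Rao for H0: Q'beta = 0 in the linear model y = X beta + u."""
    T = y.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b_hat = XtX_inv @ X.T @ y  # unconstrained least squares
    # Constrained least squares under Q'beta = 0.
    A = XtX_inv @ Q
    b_tilde = b_hat - A @ np.linalg.solve(Q.T @ A, Q.T @ b_hat)
    S_hat = float(np.sum((y - X @ b_hat) ** 2))
    S_tilde = float(np.sum((y - X @ b_tilde) ** 2))
    lrt = T * np.log(S_tilde / S_hat)
    wald = T * (S_tilde - S_hat) / S_hat
    rao = T * (S_tilde - S_hat) / S_tilde
    return lrt, wald, rao


# Simulated example (hypothetical data): test that the third coefficient is zero.
rng = np.random.default_rng(0)
T, K = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=T)
Q = np.array([[0.0], [0.0], [1.0]])
lrt, wald, rao = linear_model_tests(y, X, Q)
```

Because all three are monotone functions of the same ratio, they always agree on the ordering of samples by evidence against the null; they can still lead to different decisions when compared with a common critical value.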
Gallant and Holly (1980) obtained the asymptotic distribution of the three statistics under local alternative hypotheses in a nonlinear simultaneous equations model. Translated into the nonlinear regression model, their results can be stated as follows: If there exists a sequence of true values $\{\beta_0^T\}$ such that $\lim \beta_0^T = \beta_0$ and $\delta = \lim T^{1/2}(\beta_0^T - \operatorname{plim} \tilde\beta)$ is finite, statistics (4.5.20), (4.5.21), and (4.5.22) converge to chi-square with $q$ degrees of freedom and noncentrality parameter $\lambda$, where
(4.5.26)
Note that if $\xi$ is distributed as a $q$-vector $N(0, V)$, then $(\xi + \mu)'V^{-1}(\xi + \mu)$ is distributed as chi-square with $q$ degrees of freedom and noncentrality parameter $\mu'V^{-1}\mu$. In other words, the asymptotic local power of the tests based on the three statistics is the same.
There appear to be only a few studies of the small sample properties of the three tests, some of which are quoted in Breusch and Pagan (1980). No clear-cut ranking of the tests emerged from these studies.
A generalization of the Wald statistic can be used to test the hypothesis (4.5.1), even in a situation where the likelihood function is unspecified, as long as an asymptotically normal estimator $\hat\beta$ of $\beta$ is available. Suppose $\hat\beta$ is asymptotically distributed as $N(\beta, V)$ under the null hypothesis, with $V$ estimated consistently by $\hat V$. Then the generalized Wald statistic is defined by
$$\text{Generalized Wald} = h(\hat\beta)'\left[\left.\frac{\partial h}{\partial\beta'}\right|_{\hat\beta}\hat V\left.\frac{\partial h'}{\partial\beta}\right|_{\hat\beta}\right]^{-1} h(\hat\beta) \tag{4.5.27}$$
and is asymptotically distributed as $\chi^2(q)$ under the null hypothesis. Note that (4.5.21) is a special case of (4.5.27).
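A minimal sketch of the generalized Wald statistic follows. It takes any estimate, a consistent estimate of its variance-covariance matrix, and the constraint function, and forms the quadratic form in the constraint values. The function name `generalized_wald`, its arguments, and the use of a central-difference Jacobian are assumptions of this illustration; an analytic Jacobian can be substituted where available.

```python
import numpy as np


def generalized_wald(h, beta_hat, V_hat, eps=1e-6):
    """h(b)' [H V H']^{-1} h(b), with H = dh/dbeta' approximated numerically."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    V_hat = np.asarray(V_hat, dtype=float)
    h0 = np.atleast_1d(h(beta_hat))
    # Central-difference Jacobian of h at beta_hat, shape (q, K).
    H = np.empty((h0.size, beta_hat.size))
    for j in range(beta_hat.size):
        step = np.zeros_like(beta_hat)
        step[j] = eps
        h_plus = np.atleast_1d(h(beta_hat + step))
        h_minus = np.atleast_1d(h(beta_hat - step))
        H[:, j] = (h_plus - h_minus) / (2.0 * eps)
    return float(h0 @ np.linalg.solve(H @ V_hat @ H.T, h0))


# Hypothetical example: test the nonlinear restriction beta_1 * beta_2 = 1.
stat = generalized_wald(
    lambda b: b[0] * b[1] - 1.0,
    beta_hat=[2.0, 0.6],
    V_hat=np.diag([0.04, 0.01]),
)
```

Here `V_hat` is the estimated variance-covariance matrix of the estimator itself, not of its $\sqrt{T}$-scaled version, which matches the convention under which (4.5.21) is a special case.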
Another related asymptotic test is the specification test of Hausman (1978). It can be used to test a more general hypothesis than (4.5.1). The only requirement of the test is that we have an estimator, usually a maximum likelihood estimator, that is asymptotically efficient under the null hypothesis but loses consistency under an alternative hypothesis and another estimator that is asymptotically less efficient than the first under the null hypothesis but remains consistent under an alternative hypothesis. If we denote the first estimator by $\hat\theta$ and the second by $\tilde\theta$, the Hausman test statistic is defined by $(\hat\theta - \tilde\theta)'\hat V^{-1}(\hat\theta - \tilde\theta)$, where $\hat V$ is a consistent estimator of the asymptotic variance-covariance matrix of $(\hat\theta - \tilde\theta)$. Under the null hypothesis it is asymptotically distributed as chi-square with degrees of freedom equal to the dimension of the vector $\theta$.
If we denote the asymptotic variance-covariance matrix by $V$, it is well known that $V(\hat\theta - \tilde\theta) = V(\tilde\theta) - V(\hat\theta)$. This equality follows from $V(\hat\theta) = V_{12}$, where $V_{12}$ is the asymptotic covariance between $\hat\theta$ and $\tilde\theta$. To verify this equality, note that if it did not hold, we could define a new estimator $\hat\theta + [V(\hat\theta) - V_{12}][V(\hat\theta - \tilde\theta)]^{-1}(\tilde\theta - \hat\theta)$, the asymptotic variance-covariance matrix of which is $V(\hat\theta) - [V(\hat\theta) - V_{12}][V(\hat\theta - \tilde\theta)]^{-1}[V(\hat\theta) - V_{12}]'$, which is smaller (in the matrix sense) than $V(\hat\theta)$. But this is a contradiction because $\hat\theta$ is asymptotically efficient by assumption.
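The variance identity makes the Hausman statistic simple to compute from the two estimates and their estimated variance-covariance matrices. The sketch below assumes that both matrices are supplied by the user; the function name `hausman_statistic`, its arguments, and the numbers in the example are hypothetical.

```python
import numpy as np


def hausman_statistic(theta_eff, theta_cons, V_eff, V_cons):
    """Hausman statistic, using V(diff) = V(consistent) - V(efficient).

    theta_eff  : estimate that is efficient under the null hypothesis
    theta_cons : estimate that stays consistent under the alternative
    V_eff, V_cons : their estimated variance-covariance matrices
    """
    d = np.asarray(theta_eff, dtype=float) - np.asarray(theta_cons, dtype=float)
    V_diff = np.asarray(V_cons, dtype=float) - np.asarray(V_eff, dtype=float)
    return float(d @ np.linalg.solve(V_diff, d))


# Hypothetical numbers, for illustration only.
stat = hausman_statistic(
    theta_eff=[1.0, 2.0],
    theta_cons=[1.2, 1.9],
    V_eff=np.diag([0.01, 0.02]),
    V_cons=np.diag([0.05, 0.03]),
)
```

In finite samples the estimated difference `V_cons - V_eff` need not be positive definite even though its population counterpart is, so the result should be checked before it is compared with a chi-square critical value.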