# Likelihood Ratio, Wald and Lagrange Multiplier Tests

Before we go into the derivations of these three tests we start by giving an intuitive graphical explanation that will hopefully emphasize the differences among these tests. This intuitive explanation is based on the article by Buse (1982).

Consider a quadratic log-likelihood function in a parameter of interest, say $\mu$. Figure 2.6 plots this log-likelihood $\log L(\mu)$, which attains its maximum at $\hat{\mu}$. The Likelihood Ratio test examines the null hypothesis $H_0: \mu = \mu_0$ by looking at the ratio of the likelihoods $\lambda = L(\mu_0)/L(\hat{\mu})$.

*(Figure 2.6: Wald Test)*

The statistic $-2\log\lambda$, twice the difference in log-likelihoods, is asymptotically distributed as $\chi^2_1$ under $H_0$. This test distinguishes between the top of the hill and a preassigned point on the hill by evaluating the height at both points; it therefore needs both the restricted and the unrestricted maximum of the likelihood. The ratio depends on the distance of $\mu_0$ from $\hat{\mu}$ and on the curvature of the log-likelihood, $C(\hat{\mu}) = -\partial^2 \log L(\mu)/\partial\mu^2$ evaluated at $\hat{\mu}$. In fact, for a fixed $(\hat{\mu} - \mu_0)$, the larger $C(\hat{\mu})$, the larger the difference between the two heights. Similarly, for a given curvature at $\hat{\mu}$, the larger $(\hat{\mu} - \mu_0)$, the larger the difference between the heights.

The Wald test works from the top of the hill, i.e., it needs only the unrestricted maximum likelihood. It gauges the distance to $\mu_0$ by looking at the horizontal distance $(\hat{\mu} - \mu_0)$ and the curvature at $\hat{\mu}$. In fact, the Wald statistic is $W = (\hat{\mu} - \mu_0)^2 C(\hat{\mu})$, which is asymptotically distributed as $\chi^2_1$ under $H_0$. The usual form of $W$ uses $I(\mu) = -E[\partial^2 \log L(\mu)/\partial\mu^2]$, the information evaluated at $\hat{\mu}$, rather than $C(\hat{\mu})$, but the latter is a consistent estimator of $I(\mu)$. The information matrix will be studied in detail in Chapter 7. It will be shown there, under fairly general conditions, that $\hat{\mu}$, the MLE of $\mu$, has $\mathrm{var}(\hat{\mu}) = I^{-1}(\mu)$. Hence $W = (\hat{\mu} - \mu_0)^2/\mathrm{var}(\hat{\mu})$, all evaluated at the unrestricted MLE.

The Lagrange Multiplier (LM) test, on the other hand, goes to the preassigned point $\mu_0$, i.e., it needs only the restricted maximum likelihood, and tries to determine how far that point is from the top of the hill by considering the slope of the tangent to the log-likelihood, $S(\mu) = \partial \log L(\mu)/\partial\mu$, at $\mu_0$, and the rate at which this slope is changing, i.e., the curvature at $\mu_0$. As Figure 2.7 shows, for two log-likelihoods with the same $S(\mu_0)$, the one closer to the top of the hill is the one with the larger curvature at $\mu_0$. This suggests the statistic $LM = S^2(\mu_0)\{C(\mu_0)\}^{-1}$, where the curvature appears in inverse form.
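For an exactly quadratic log-likelihood, the three geometric constructions above (drop in height, horizontal distance times curvature, squared slope over curvature) measure the same thing and coincide. The following minimal sketch illustrates this; the particular values of $\hat{\mu}$, $\mu_0$, and the curvature are illustrative assumptions, not from the text.

```python
# Sketch: for a quadratic log-likelihood logL(mu) = const - (c/2)*(mu - mu_hat)**2,
# the LR, Wald, and LM constructions described above coincide exactly.

mu_hat = 1.5   # unrestricted maximizer (illustrative value)
mu0 = 1.0      # hypothesized value under H0 (illustrative value)
c = 4.0        # curvature C(mu) = -d2 logL / d mu^2, constant for a quadratic

def logL(mu, const=0.0):
    """Quadratic log-likelihood with maximum at mu_hat and curvature c."""
    return const - 0.5 * c * (mu - mu_hat) ** 2

def score(mu):
    """Score S(mu) = d logL / d mu = -c * (mu - mu_hat)."""
    return -c * (mu - mu_hat)

LR = -2.0 * (logL(mu0) - logL(mu_hat))   # twice the drop in height
W = (mu_hat - mu0) ** 2 * c              # horizontal distance times curvature
LM = score(mu0) ** 2 / c                 # squared slope over curvature

print(LR, W, LM)  # all three equal c*(mu_hat - mu0)**2 = 1.0
```

For a log-likelihood that is only approximately quadratic, the three statistics differ in finite samples, which is the point of the examples that follow.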
In the Appendix to this chapter, we show that $E[S(\mu)] = 0$ and $\mathrm{var}[S(\mu)] = I(\mu)$. Hence $LM = S^2(\mu_0) I^{-1}(\mu_0) = S^2(\mu_0)/\mathrm{var}[S(\mu_0)]$, all evaluated at the restricted MLE. Another interpretation of the LM test is that it measures the failure of the restricted estimator, in this case $\mu_0$, to satisfy the first-order conditions for maximizing the unrestricted likelihood. We know that $S(\hat{\mu}) = 0$; the question is to what extent $S(\mu_0)$ differs from zero. $S(\mu)$ is known in the statistics literature as the score, and the LM test is also referred to as the score test.

For a more formal treatment of these tests, let us reconsider Example 3 of a random sample $x_1,\ldots,x_n$ from a $N(\mu, 4)$, where we are interested in testing $H_0: \mu = 2$ versus $H_1: \mu \neq 2$. The likelihood function $L(\mu)$ as well as $LR = -2\log\lambda = n(\bar{x} - 2)^2/4$ were given in Example 3. In this case, $I(\mu) = n/4$, so that $W = (\bar{x} - 2)^2 I(\hat{\mu}) = n(\bar{x} - 2)^2/4$, and under $H_0$ the score is $S(2) = n(\bar{x} - 2)/4$. The LM statistic is based on $LM = S^2(2) I^{-1}(2) = n(\bar{x} - 2)^2/4$.

Therefore, $W = LM = LR$ for this example with known variance $\sigma^2 = 4$. These tests are all based on the critical region $|\bar{x} - 2| > k$, where $k$ is chosen so that the size of the test is $\alpha$. In general, however, these test statistics are not always equal, as the next example shows.

Example 4: For a random sample $x_1,\ldots,x_n$ drawn from a $N(\mu, \sigma^2)$ with unknown $\sigma^2$, test the hypothesis $H_0: \mu = 2$ versus $H_1: \mu \neq 2$. Problem 5, part (c), asks the reader to verify that
$$W = \frac{n(\bar{x} - 2)^2}{\hat{\sigma}^2}, \qquad LR = n\log\!\left(\frac{\tilde{\sigma}^2}{\hat{\sigma}^2}\right), \qquad LM = \frac{n(\bar{x} - 2)^2}{\tilde{\sigma}^2},$$

where $\hat{\sigma}^2 = \sum_{i=1}^n (x_i - \bar{x})^2/n$ is the unrestricted MLE of $\sigma^2$ and $\tilde{\sigma}^2 = \sum_{i=1}^n (x_i - 2)^2/n$ is the restricted MLE.
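Pending that derivation, the standard closed forms for this example can be checked numerically. The sketch below assumes a simulated sample (the true mean, variance, and sample size are illustrative) and uses the unrestricted and restricted variance MLEs.

```python
import math
import random

# Sketch for Example 4: N(mu, sigma^2) with sigma^2 unknown, H0: mu = 2.
# Standard results for this example:
#   s2_u = sum((x_i - xbar)^2)/n   (unrestricted MLE of sigma^2)
#   s2_r = sum((x_i - 2)^2)/n      (restricted MLE of sigma^2)
#   W  = n*(xbar - 2)^2 / s2_u
#   LR = n*log(s2_r / s2_u)
#   LM = n*(xbar - 2)^2 / s2_r
random.seed(0)                                   # reproducible illustration
n = 50
x = [random.gauss(2.3, 1.0) for _ in range(n)]   # illustrative simulated sample
xbar = sum(x) / n
s2_u = sum((xi - xbar) ** 2 for xi in x) / n
s2_r = sum((xi - 2.0) ** 2 for xi in x) / n

W = n * (xbar - 2.0) ** 2 / s2_u
LR = n * math.log(s2_r / s2_u)
LM = n * (xbar - 2.0) ** 2 / s2_r

print(W, LR, LM)  # W >= LR >= LM for any sample, per Berndt and Savin (1977)
```

Because $\tilde{\sigma}^2 = \hat{\sigma}^2 + (\bar{x} - 2)^2$, the identities $LM/n = (W/n)/[1 + (W/n)]$ and $LR/n = \log[1 + (W/n)]$ hold exactly for any sample, which is what drives the ordering.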
One can easily show that $LM/n = (W/n)/[1 + (W/n)]$ and $LR/n = \log[1 + (W/n)]$. Let $y = W/n$; then, using the inequality $y > \log(1 + y) > y/(1 + y)$, one can conclude that $W > LR > LM$. This inequality was derived by Berndt and Savin (1977) and will be considered again when we study tests of hypotheses in the general linear model. Note, however, that all three test statistics are based on $|\bar{x} - 2| > k$, and for finite $n$ the same exact critical value could be obtained from the normally distributed $\bar{x}$.

This section introduced the $W$, $LR$ and $LM$ test statistics, all of which have the same asymptotic distribution. In addition, we showed that using the normal distribution, when $\sigma^2$ is known, $W = LR = LM$ for testing $H_0: \mu = 2$ versus $H_1: \mu \neq 2$. However, when $\sigma^2$ is unknown, we showed that $W > LR > LM$ for the same hypothesis.

Example 5: For a random sample $x_1,\ldots,x_n$ drawn from a Bernoulli distribution with parameter $\theta$, test the hypothesis $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$, where $\theta_0$ is a known positive fraction. This example is based on Engle (1984). Problem 4, part (i), asks the reader to derive $LR$, $W$ and $LM$ for $H_0: \theta = 0.2$ versus $H_1: \theta \neq 0.2$. The likelihood $L(\theta)$ and the score $S(\theta)$ were derived in section 2.2. One can easily verify that

$$\frac{\partial^2 \log L(\theta)}{\partial\theta^2} = -\frac{\sum_{i=1}^n x_i}{\theta^2} - \frac{n - \sum_{i=1}^n x_i}{(1 - \theta)^2}$$

and

$$I(\theta) = -E\!\left[\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right] = \frac{n}{\theta(1 - \theta)}.$$

The Wald statistic is based on

$$W = (\hat{\theta}_{MLE} - \theta_0)^2\, I(\hat{\theta}_{MLE}) = \frac{n(\bar{x} - \theta_0)^2}{\bar{x}(1 - \bar{x})},$$

using the fact that $\hat{\theta}_{MLE} = \bar{x}$. The LM statistic is based on

$$LM = S^2(\theta_0)\, I^{-1}(\theta_0) = \frac{n(\bar{x} - \theta_0)^2}{\theta_0(1 - \theta_0)}.$$
Note that the numerators of $W$ and $LM$ are the same; it is the denominator, $\mathrm{var}(\bar{x}) = \theta(1 - \theta)/n$, that differs. For Wald, this $\mathrm{var}(\bar{x})$ is evaluated at $\hat{\theta}_{MLE}$, whereas for LM it is evaluated at $\theta_0$.
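This distinction is easy to see numerically. In the sketch below, the sample proportion, the sample size, and $\theta_0$ are illustrative assumptions; the only difference between the two statistics is the point at which $\mathrm{var}(\bar{x})$ is evaluated.

```python
# Sketch: W and LM for the Bernoulli example share the numerator (xbar - theta0)^2;
# only the variance estimate theta*(1-theta)/n in the denominator differs,
# evaluated at the MLE xbar for W and at theta0 for LM. Values are illustrative.

n = 100
xbar = 0.3       # sample proportion, i.e., the MLE of theta (illustrative)
theta0 = 0.2     # hypothesized value under H0

var_at_mle = xbar * (1 - xbar) / n        # var(xbar) evaluated at theta_hat = xbar
var_at_null = theta0 * (1 - theta0) / n   # var(xbar) evaluated at theta0

W = (xbar - theta0) ** 2 / var_at_mle
LM = (xbar - theta0) ** 2 / var_at_null

print(W, LM)  # same numerator, different denominators
```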

The LR statistic is based on

$$\log L(\hat{\theta}_{MLE}) = \sum_{i=1}^n x_i \log\bar{x} + \left(n - \sum_{i=1}^n x_i\right)\log(1 - \bar{x})$$

and

$$\log L(\theta_0) = \sum_{i=1}^n x_i \log\theta_0 + \left(n - \sum_{i=1}^n x_i\right)\log(1 - \theta_0),$$

so that

$$LR = -2\log L(\theta_0) + 2\log L(\hat{\theta}_{MLE}) = -2\left[\sum_{i=1}^n x_i(\log\theta_0 - \log\bar{x}) + \left(n - \sum_{i=1}^n x_i\right)\left(\log(1 - \theta_0) - \log(1 - \bar{x})\right)\right].$$

For this example, $LR$ looks different from $W$ and $LM$. However, a second-order Taylor series expansion of $LR$ around $\theta_0 = \bar{x}$ yields the same statistic. Also, as $n \to \infty$, $\mathrm{plim}\,\bar{x} = \theta$, and if $H_0$ is true, then all three statistics are asymptotically equivalent. Note also that all three test statistics are based on $|\bar{x} - \theta_0| > k$, and for finite $n$ the same exact critical value could be obtained from the binomial distribution. See Problem 19 for more examples of conflicts in tests of hypotheses using the $W$, $LR$ and $LM$ test statistics.
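The closeness of the three statistics in moderate samples can be checked directly; the sketch below assumes an illustrative sample proportion and sample size, computes $LR$ from the two log-likelihoods, and compares it with $W$ and $LM$.

```python
import math

# Sketch for Example 5: LR for the Bernoulli case computed from the two
# log-likelihoods, compared with W and LM. Sample values are illustrative.

n = 1000
xbar = 0.22          # sample proportion (illustrative); sum of x_i = n * xbar
theta0 = 0.2         # hypothesized value under H0
s = n * xbar         # number of successes

logL_mle = s * math.log(xbar) + (n - s) * math.log(1 - xbar)
logL_0 = s * math.log(theta0) + (n - s) * math.log(1 - theta0)

LR = -2.0 * (logL_0 - logL_mle)
W = n * (xbar - theta0) ** 2 / (xbar * (1 - xbar))
LM = n * (xbar - theta0) ** 2 / (theta0 * (1 - theta0))

print(W, LR, LM)  # close but not equal; asymptotically equivalent under H0
```

For this particular sample the three values differ only in the second decimal place, illustrating the asymptotic equivalence; with small $n$ or a proportion far from $\theta_0$ the conflicts mentioned in Problem 19 can arise.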

Bera and Premaratne (2001, p. 58) tell the following amusing story that can bring home the interrelationship among the three tests: “Once around 1946 Ronald Fisher invited Jerzy Neyman, Abraham Wald, and C. R. Rao to his lodge for afternoon tea. During their conversation, Fisher mentioned the problem of deciding whether his dog, who had been going to an “obedience school” for some time, was disciplined enough. Neyman quickly came up with an idea: leave the dog free for some time and then put him on his leash. If there is not much difference in his behavior, the dog can be thought of as having completed the course successfully. Wald, who lost his family in the concentration camps, was averse to any restrictions and simply suggested leaving the dog free and seeing whether it behaved properly. Rao, who had observed the nuisances of stray dogs in Calcutta streets, did not like the idea of letting the dog roam freely and suggested keeping the dog on a leash at all times and observing how hard it pulls on the leash. If it pulled too much, it needed more training. That night when Rao was back in his Cambridge dormitory after tending Fisher’s mice at the genetics laboratory, he suddenly realized the connection of Neyman and Wald’s recommendations to the Neyman-Pearson LR and Wald tests. He got an idea and the rest is history.”