Neyman-Pearson generalized lemma and its applications

The lemma can be stated as follows:

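Let $g_1, g_2, \ldots, g_m, g_{m+1}$ be integrable functions and $\phi$ be a test function over $S$ such that $0 \le \phi \le 1$, and

$$\int_S \phi\, g_i\, dy = c_i, \qquad i = 1, 2, \ldots, m, \tag{2.6}$$

where $c_1, c_2, \ldots, c_m$ are given constants. Further, let there exist a $\phi^*$ and constants $k_1, k_2, \ldots, k_m$ such that $\phi^*$ satisfies (2.6), and

$$\phi^* = \begin{cases} 1 & \text{if } g_{m+1} > \sum_{i=1}^{m} k_i g_i \\ 0 & \text{if } g_{m+1} < \sum_{i=1}^{m} k_i g_i, \end{cases} \tag{2.7}$$

then,

$$\int_S \phi^* g_{m+1}\, dy \ge \int_S \phi\, g_{m+1}\, dy. \tag{2.8}$$

For testing a simple null $H_0: \theta = \theta_0$ against a simple alternative $H_1: \theta = \theta_1$, put $m = 1$, $g_1 = L(\theta_0)$, $g_2 = L(\theta_1)$, $c_1 = \alpha$ and $k_1 = k$. Then the function $\phi^*(y)$ defined as

$$\phi^*(y) = \begin{cases} 1 & \text{when } L(\theta_1) > kL(\theta_0) \\ 0 & \text{when } L(\theta_1) < kL(\theta_0) \end{cases} \tag{2.9}$$

satisfies

$$\int_S \phi^*(y) L(\theta_1)\, dy \ge \int_S \phi(y) L(\theta_1)\, dy, \tag{2.10}$$

that is, $\phi^*(y)$ will provide the MP test. Therefore, in terms of critical region,

$$\omega = \{ y \mid L(\theta_1) > kL(\theta_0) \}, \tag{2.11}$$

where $k$ is such that $\Pr\{\omega \mid H_0\} = \alpha$, is the MP critical region.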

The N-P lemma also provides the logical basis for the LR test. To see this, consider a general form of null hypothesis, $H_0: h(\theta) = c$, where $h(\theta)$ is an $r \times 1$ vector function of $\theta$ with $r \le p$ and $c$ a known constant vector. It is assumed that $H(\theta) = \partial h(\theta)/\partial\theta$ has full column rank, that is, $\mathrm{rank}[H(\theta)] = r$. We denote the maximum likelihood estimator (MLE) of $\theta$ by $\hat\theta$, and by $\tilde\theta$ the restricted MLE of $\theta$, that is, $\tilde\theta$ is obtained by maximizing the loglikelihood function $l(\theta) = \ln L(\theta)$ subject to the restriction $h(\theta) = c$. Neyman and Pearson (1928) suggested their LR test as

$$LR = -2 \ln \frac{L(\tilde\theta)}{L(\hat\theta)} = 2[l(\hat\theta) - l(\tilde\theta)]. \tag{2.12}$$

Their suggestion did not result from any search procedure satisfying an optimality criterion; it was based purely on intuitive grounds and Fisher’s (1922) likelihood principle.1 Comparing the MP critical region in (2.11) with (2.12), we can see the logical basis of the LR test.

Locally MP (LMP) and Rao’s (1948) score tests

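Let us consider a simple case, say $p = 1$, and test $H_0: \theta = \theta_0$. Assuming that the power function $\gamma(\theta)$ in (2.3) admits a Taylor series expansion, we have

$$\gamma(\theta) = \gamma(\theta_0) + (\theta - \theta_0)\gamma'(\theta_0) + \frac{(\theta - \theta_0)^2}{2}\gamma''(\theta^*), \tag{2.13}$$

where $\theta^*$ is a value in between $\theta$ and $\theta_0$. If we consider local alternatives of the form $\theta = \theta_0 + \delta/\sqrt{n}$, $0 < \delta < \infty$, the third term will be of order $O(n^{-1})$. To obtain highest power, we need to maximize

$$\gamma'(\theta_0) = \frac{\partial}{\partial\theta}\gamma(\theta)\Big|_{\theta = \theta_0} = \int \phi(y)\, \frac{\partial}{\partial\theta} L(\theta_0)\, dy \tag{2.14}$$

for $\theta > \theta_0$. Therefore, for an LMP test of size $\alpha$ we should have

$$\int \phi(y) L(\theta_0)\, dy = \alpha$$

and maximize $\int \phi(y)\, \frac{\partial}{\partial\theta} L(\theta_0)\, dy$. In the N-P generalized lemma, let us put $m = 1$, $g_1 = L(\theta_0)$, $g_2 = \frac{\partial}{\partial\theta} L(\theta_0)$, $c_1 = \alpha$ and $k_1 = k$. Then from (2.7) and (2.8), the LMP test will have critical region

$$\frac{\partial}{\partial\theta} L(\theta_0) > kL(\theta_0)$$

or

$$\frac{\partial \ln L(\theta)}{\partial\theta}\Big|_{\theta = \theta_0} > k. \tag{2.15}$$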

The quantity $s(\theta) = \partial l(\theta)/\partial\theta$ is known as the score function. The above result was first discussed in Rao and Poti (1946), who stated that an LMP test for $H_0: \theta = \theta_0$ is given by

$$l_1 s(\theta_0) > l_2, \tag{2.16}$$

where $l_2$ is so determined that the size of the test is equal to a preassigned value $\alpha$, with $l_1$ as $+1$ or $-1$, respectively, for alternatives $\theta > \theta_0$ and $\theta < \theta_0$. Test criterion (2.16) is a precursor to Rao’s score (RS) or the Lagrange multiplier (LM) test that has been very useful to econometrics for developing various model diagnostic procedures, as we will discuss later.

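The LMP test can also be obtained directly from the N-P lemma (2.11), by expanding $L(\theta_1)$ around $\theta_0$ as

$$L(\theta_1) = L(\theta_0) + (\theta_1 - \theta_0)\, \frac{\partial}{\partial\theta} L(\theta^*), \tag{2.17}$$

where $\theta^*$ is in between $\theta_0$ and $\theta_1$. Therefore, according to (2.11) we reject $H_0$ if

$$1 + (\theta_1 - \theta_0)\, \frac{\frac{\partial}{\partial\theta} L(\theta^*)}{L(\theta_0)} > k. \tag{2.18}$$

Now as $\theta_1 \to \theta_0$, it is clear that this critical region reduces to that of (2.15) [see Gourieroux and Monfort, 1995, p. 32].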

Example 1. As an example of an LMP test consider testing for the median of a Cauchy distribution with probability density

$$f(y; \theta) = \frac{1}{\pi}\, \frac{1}{1 + (y - \theta)^2}, \qquad -\infty < y < \infty. \tag{2.19}$$

We test $H_0: \theta = 0$ against $H_1: \theta > 0$. For simplicity, take $n = 1$, and therefore, we reject $H_0$ for large values of

$$\frac{\partial \ln f(y; \theta)}{\partial\theta}\Big|_{\theta = 0} = \frac{2y}{1 + y^2}. \tag{2.20}$$

As constructed, this will provide an optimal test for $\theta$ close to zero (local alternatives). Now suppose $\theta \gg 0$. As $\theta \to \infty$ the observation $y$ is large with probability approaching one, and

$$\frac{2y}{1 + y^2} \to 0 \quad \text{as } y \to \infty.$$

Therefore, for distant alternatives the power of the test will be zero.

Therefore, what works for local alternatives may not work at all for not-so-local alternatives. The situation, however, is not so grim universally. Consider the following standard example.
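The loss of power in Example 1 can be checked numerically. What follows is a minimal sketch, assuming $n = 1$ and a size of $\alpha = 0.05$; the function names are illustrative and do not come from the text. It uses the fact that, for a cut-off $0 < k < 1$, the region $2y/(1 + y^2) > k$ is an interval $(a, 1/a)$, so the size under $H_0$ has a closed form in terms of the Cauchy distribution function.

```python
import math

def cauchy_cdf(y, theta=0.0):
    """Distribution function of the Cauchy density (2.19) with median theta."""
    return 0.5 + math.atan(y - theta) / math.pi

def rejection_interval(alpha):
    """Interval (a, b) on which 2y/(1+y^2) exceeds the size-alpha cut-off.

    The end points satisfy a*b = 1, so under H0 the size equals
    1/2 - 2*atan(a)/pi; solving for a gives the expression below.
    Assumes 0 < alpha < 1/2.
    """
    a = math.tan(math.pi * (0.25 - alpha / 2.0))
    return a, 1.0 / a

def lmp_power(theta, alpha=0.05):
    """Probability of rejecting H0: theta = 0 when the true median is theta."""
    a, b = rejection_interval(alpha)
    return cauchy_cdf(b, theta) - cauchy_cdf(a, theta)

for theta in (0.0, 0.1, 1.0, 10.0, 100.0):
    print(f"theta = {theta:6.1f}  power = {lmp_power(theta):.5f}")
```

Under these assumptions the power rises above $\alpha$ for small positive $\theta$ and then falls towards zero as $\theta$ becomes large.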

Example 2. Let $Y \sim N(\mu, 1)$ and test $H_0: \mu = 0$ against $H_1: \mu > 0$ based on a sample of size 1. We have

$$\frac{\partial \ln f(y; \mu)}{\partial\mu}\Big|_{\mu = 0} = y.$$

Therefore, we reject $H_0$ if $y > k$, where $k = Z_\alpha$, the upper $\alpha$ percent cut-off point of the standard normal. The power of this test is $1 - \Phi(Z_\alpha - \mu)$, where $\Phi(\cdot)$ is the distribution function of the standard normal density. And as $\mu \to \infty$, the power of the test goes to 1. Therefore, the test $y > Z_\alpha$ is not only LMP, it is also uniformly most powerful (UMP) for all $\mu > 0$.

Now let us consider what happens to the power of this test when $\mu < 0$. The power function $\Pr(y > Z_\alpha \mid \mu < 0)$ still remains $1 - \Phi(Z_\alpha - \mu)$, but it is now less than $\alpha$, the size of the test. Therefore, the test is not MP for all $\mu \ne 0$. To get an MP test for two-sided alternatives, we need to add unbiasedness as an extra condition in our requirements.
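The behaviour of the power function in Example 2 can be tabulated directly. This is a small sketch using Python's standard `statistics.NormalDist`; the function name and the choice $\alpha = 0.05$ are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1

def one_sided_power(mu, alpha=0.05):
    """Power 1 - Phi(Z_alpha - mu) of the test that rejects when y > Z_alpha."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - mu)

for mu in (-1.0, 0.0, 0.5, 2.0, 5.0):
    print(f"mu = {mu:5.1f}  power = {one_sided_power(mu):.4f}")
```

The values for negative $\mu$ fall below the size, which is the sense in which the one-sided test fails against two-sided alternatives.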

Locally most powerful unbiased (LMPU) test

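A test $\phi(y)$ of size $\alpha$ is unbiased for $H_0: \theta \in \Omega_0$ against $H_1: \theta \in \Omega_1$ if $E_\theta[\phi(y)] \le \alpha$ for $\theta \in \Omega_0$ and $E_\theta[\phi(y)] \ge \alpha$ for $\theta \in \Omega_1$. Suppose we want to find an LMPU test for testing $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$. By expanding the power function $\gamma(\theta)$ in (2.3) around $\theta = \theta_0$ for local alternatives, we have

$$\gamma(\theta) = \gamma(\theta_0) + (\theta - \theta_0)\gamma'(\theta_0) + \frac{(\theta - \theta_0)^2}{2}\gamma''(\theta_0) + o(n^{-1}) = \alpha + \frac{(\theta - \theta_0)^2}{2}\gamma''(\theta_0) + o(n^{-1}). \tag{2.22}$$

Unbiasedness requires that the "power" should be minimum at $\theta = \theta_0$, and, hence, $\gamma'(\theta_0) = 0$. To maximize the local power, we, therefore, need to maximize $\gamma''(\theta_0)$ for both $\theta > \theta_0$ and $\theta < \theta_0$, and this leads to the LMPU test. Neyman and Pearson (1936, p. 9) called the corresponding critical region "type-A region," and this requires maximization of $\gamma''(\theta_0)$ subject to two side-conditions $\gamma(\theta_0) = \alpha$ and $\gamma'(\theta_0) = 0$. In the N-P generalized lemma, let us put $m = 2$, $c_1 = 0$, $c_2 = \alpha$, $g_1 = \frac{\partial L(\theta_0)}{\partial\theta}$, $g_2 = L(\theta_0)$ and $g_3 = \frac{\partial^2 L(\theta_0)}{\partial\theta^2}$; then from (2.7) and (2.8), the optimal test function $\phi^* = 1$ if

$$\frac{\partial^2 L(\theta_0)}{\partial\theta^2} > k_1 \frac{\partial L(\theta_0)}{\partial\theta} + k_2 L(\theta_0) \tag{2.23}$$

and $\phi^* = 0$, otherwise. Critical region (2.23) can be expressed in terms of the derivatives of the loglikelihood function as

$$\frac{\partial^2 l(\theta_0)}{\partial\theta^2} + \left[\frac{\partial l(\theta_0)}{\partial\theta}\right]^2 > k_1 \frac{\partial l(\theta_0)}{\partial\theta} + k_2. \tag{2.24}$$

In terms of the score function $s(\theta) = \partial l(\theta)/\partial\theta$ and its derivative $s'(\theta)$, (2.24) can be written as

$$s'(\theta_0) + [s(\theta_0)]^2 > k_1 s(\theta_0) + k_2. \tag{2.25}$$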

Example 2. (continued) For this example, consider now testing $H_0: \mu = 0$ against $H_1: \mu \ne 0$. It is easy to see that $s(\theta_0) = y$ and $s'(\theta_0) = -1$ (with $\theta = \mu$ and $\theta_0 = 0$). Therefore, a uniformly most powerful unbiased (UMPU) test will reject $H_0$ if

$$y^2 + k_1 y + k_2 > 0$$

or, equivalently, if

$$y < k_1'' \quad \text{or} \quad y > k_2'',$$

where $k_1$, $k_2$, $k_1''$, and $k_2''$ are some constants determined from the size and unbiasedness conditions. After some simplification, the LMPU principle leads to a symmetric critical region of the form $y < -Z_{\alpha/2}$ or $y > Z_{\alpha/2}$.
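The unbiasedness of the symmetric two-sided region can be illustrated with a short computation. The sketch below again relies on `statistics.NormalDist`; the function name and $\alpha = 0.05$ are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1

def two_sided_power(mu, alpha=0.05):
    """Power of the test rejecting when y < -Z_{alpha/2} or y > Z_{alpha/2}."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    lower_tail = STD_NORMAL.cdf(-z - mu)        # Pr(y < -z) when y ~ N(mu, 1)
    upper_tail = 1.0 - STD_NORMAL.cdf(z - mu)   # Pr(y > z) when y ~ N(mu, 1)
    return lower_tail + upper_tail

for mu in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"mu = {mu:5.1f}  power = {two_sided_power(mu):.4f}")
```

The power is symmetric in $\mu$ and takes its minimum, equal to the size, at $\mu = 0$, as unbiasedness requires.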

In many situations, $s'(\theta)$ can be expressed as a linear function of the score $s(\theta)$. For those cases, LMPU tests will be based on the score function only, just like the LMP test in (2.15). Also, for certain test problems $s(\theta_0)$ vanishes; then from (2.25) we see that an LMPU test can be constructed using the second derivative of the loglikelihood function.

Example 3. (Godfrey, 1988, p. 92). Let $y_i \sim N(0, (1 + \theta^2 z_i))$, $i = 1, 2, \ldots, n$, where the $z_i$s are given positive constants. We are interested in testing $H_0: \theta = 0$, that is, $y_i$ has constant variance. The loglikelihood function and the score function are, respectively, given by

$$l(\theta) = \text{const} - \frac{1}{2}\sum_{i=1}^{n} \ln(1 + \theta^2 z_i) - \frac{1}{2}\sum_{i=1}^{n} \frac{y_i^2}{1 + \theta^2 z_i} \tag{2.26}$$

and

$$s(\theta) = -\theta \sum_{i=1}^{n} \left[ \frac{z_i}{1 + \theta^2 z_i} - \frac{z_i y_i^2}{(1 + \theta^2 z_i)^2} \right]. \tag{2.27}$$

It is clear that $s(\theta) = 0$ at $H_0: \theta = 0$. However,

$$s'(\theta)\Big|_{\theta = 0} = \sum_{i=1}^{n} z_i (y_i^2 - 1), \tag{2.28}$$

and from (2.25), the LMPU test could be based on the above quantity. In fact, it can be shown that (Godfrey, 1988, p. 92)

$$\frac{\sum_{i=1}^{n} z_i (y_i^2 - 1)}{\sqrt{2 \sum_{i=1}^{n} z_i^2}} \xrightarrow{d} N(0, 1), \tag{2.29}$$

where $\xrightarrow{d}$ denotes convergence in distribution.
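As a concrete illustration, the standardized quantity in (2.29) is easy to compute. The sketch below is a minimal version with an illustrative function name; it assumes the observations have unit variance under $H_0$, as in the example.

```python
import math

def variance_score_statistic(y, z):
    """Standardized second-derivative statistic of (2.29).

    y: observations, assumed N(0, 1) under H0.
    z: given positive constants, one per observation.
    """
    if len(y) != len(z):
        raise ValueError("y and z must have the same length")
    numerator = sum(zi * (yi ** 2 - 1.0) for yi, zi in zip(y, z))
    denominator = math.sqrt(2.0 * sum(zi ** 2 for zi in z))
    return numerator / denominator

# Toy data, made up for illustration only.
print(variance_score_statistic([0.3, -1.2, 2.1, 0.4], [1.0, 2.0, 3.0, 4.0]))
```

In large samples the returned value would be compared with standard normal critical values.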

Neyman’s smooth test

Pearson (1900) suggested his goodness-of-fit test to see whether an assumed probability model adequately described the data at hand. Suppose we divide data into $p$ classes and the probability of the $j$th class is $\theta_j$, $j = 1, 2, \ldots, p$, with $\sum_{j=1}^{p} \theta_j = 1$. Suppose according to the assumed probability model $\theta_j = \theta_{j0}$; therefore, our null hypothesis could be stated as $H_0: \theta_j = \theta_{j0}$, $j = 1, 2, \ldots, p$. Let $n_j$ denote the observed frequency of the $j$th class, with $\sum_{j=1}^{p} n_j = n$. Pearson (1900) suggested the goodness-of-fit statistic

$$P = \sum_{j=1}^{p} \frac{(n_j - n\theta_{j0})^2}{n\theta_{j0}} = \sum_{j=1}^{p} \frac{(O_j - E_j)^2}{E_j}, \tag{2.30}$$

where $O_j$ and $E_j$ denote, respectively, the observed and expected frequencies for the $j$th class.

Neyman’s (1937) criticism of Pearson’s test was that (2.30) does not depend on the order of positive and negative differences $(O_j - E_j)$. Neyman (1980) gives an extreme example represented by two cases. In the first, the signs of the consecutive differences $(O_j - E_j)$ are not the same, and in the other, there is a run of, say, a number of "negative" differences, followed by a sequence of "positive" differences. These two possibilities might lead to similar values of $P$, but Neyman (1937, 1980) argued that in the second case the goodness-of-fit should be more in doubt, even if the value of $P$ happens to be small.
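Neyman's point about ordering can be seen with a toy calculation. The frequencies below are made up for illustration; the sketch simply evaluates (2.30) for two arrangements of the same set of differences.

```python
def pearson_p(observed, expected):
    """Pearson's goodness-of-fit statistic (2.30)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [10.0] * 6
alternating = [12, 8, 12, 8, 12, 8]   # signs of O - E alternate
one_run = [8, 8, 8, 12, 12, 12]       # a run of negatives, then a run of positives

print(pearson_p(alternating, expected), pearson_p(one_run, expected))
```

Both arrangements give the same value of $P$, although the second shows a systematic pattern in the differences.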

Suppose we want to test the null hypothesis ($H_0$) that $f(y; \theta)$ is the true density function for the random variable $Y$. The specification of $f(y; \theta)$ will be different depending on the problem at hand. Let us denote the alternative hypothesis as $H_1: Y \sim g(y)$. Neyman (1937) transformed any hypothesis-testing problem of this type to testing only one kind of hypothesis. Let $z = F(y)$ denote the distribution function of $Y$ under $H_0$, so that $dz/dy = f(y; \theta)$; then the density of the random variable $Z$ is given by

$$h(z) = g(y)\frac{dy}{dz} = \frac{g(y)}{f(y; \theta)}. \tag{2.31}$$

When $H_0: Y \sim f(y; \theta)$ holds, then

$$h(z) = 1, \qquad 0 < z < 1. \tag{2.32}$$

Therefore, testing $H_0$ is equivalent to testing whether $Z$ has a uniform distribution in the interval (0, 1), irrespective of the specification of $f(y; \theta)$. As for the specific alternative to the uniform distribution, Neyman (1937) suggested a smooth class. By smooth alternatives Neyman meant those densities that have few intersections with the null density function and that are close to the null. He specified the alternative density as

$$h(z) = C(\delta) \exp\left[ \sum_{j=1}^{r} \delta_j \pi_j(z) \right], \qquad 0 < z < 1, \tag{2.33}$$

where $C(\delta)$ is the constant of integration that depends on the $\delta_j$ values, and $\pi_j(z)$ are orthogonal polynomials satisfying

$$\int_0^1 \pi_j(z)\pi_k(z)\, dz = \begin{cases} 1 & \text{for } j = k \\ 0 & \text{for } j \ne k. \end{cases} \tag{2.34}$$

Under the hypothesis $H_0: \delta_1 = \delta_2 = \cdots = \delta_r = 0$, $C(\delta) = 1$ and $h(z)$ in (2.33) reduces to the uniform density (2.32). Using the generalized N-P lemma, Neyman (1937) derived a locally most powerful symmetric unbiased test for $H_0$, and the test statistic is given by

$$\Psi_r^2 = \sum_{j=1}^{r} \left[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \pi_j(z_i) \right]^2. \tag{2.35}$$

The test is symmetric in the sense that the asymptotic power of the test depends only on the "distance" $\sum_{j=1}^{r} \delta_j^2$ between the null and alternative hypotheses.
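A smooth-test statistic of this form can be sketched in a few lines. The version below assumes that the $\pi_j$ are the normalized shifted Legendre polynomials on (0, 1), which satisfy the orthonormality condition (2.34), and that the transformed values $z_i = F(y_i)$ have already been computed; the function names and the restriction to $r \le 3$ are choices made for this illustration.

```python
import math

def legendre_orthonormal(z):
    """First three normalized shifted Legendre polynomials evaluated at z in (0, 1)."""
    return (
        math.sqrt(3.0) * (2.0 * z - 1.0),
        math.sqrt(5.0) * (6.0 * z ** 2 - 6.0 * z + 1.0),
        math.sqrt(7.0) * (20.0 * z ** 3 - 30.0 * z ** 2 + 12.0 * z - 1.0),
    )

def smooth_statistic(z_values, r=2):
    """Sum over j = 1..r of the squared, root-n scaled sums of pi_j(z_i)."""
    if not 1 <= r <= 3:
        raise ValueError("this sketch only implements r = 1, 2, 3")
    n = len(z_values)
    total = 0.0
    for j in range(r):
        u_j = sum(legendre_orthonormal(z)[j] for z in z_values) / math.sqrt(n)
        total += u_j ** 2
    return total

# Toy probability integral transforms, made up for illustration.
print(smooth_statistic([0.12, 0.35, 0.41, 0.58, 0.77, 0.93], r=2))
```

Under $H_0$ a statistic of this kind is usually referred to a chi-squared distribution with $r$ degrees of freedom.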

2.2 Tests based on score function and Wald’s test

We have already discussed Rao’s (1948) score principle of testing as an LMP test in (2.15) for the scalar parameter $\theta$ ($p = 1$). For the $p \ge 2$ case, there will be scores for each individual parameter, and the problem is to combine them in an "optimal" way. Let $H_0: \theta = \theta_0$, where now $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$ and $\theta_0 = (\theta_{10}, \theta_{20}, \ldots, \theta_{p0})'$, and let the (local) alternative hypothesis be $H_1: \theta = \theta_\delta$, where $\theta_\delta = (\theta_{10} + \delta_1, \theta_{20} + \delta_2, \ldots, \theta_{p0} + \delta_p)'$. The proportionate change in the loglikelihood function for moving from $\theta_0$ to $\theta_\delta$ is given by $\delta' s(\theta_0)$, where $\delta = (\delta_1, \delta_2, \ldots, \delta_p)'$ and $s(\theta_0)$ is the score function evaluated at $\theta = \theta_0$. Let us define the information matrix as

$$I(\theta) = -E\left[ \frac{\partial^2 l(\theta)}{\partial\theta\, \partial\theta'} \right]. \tag{2.36}$$

Then, the asymptotic variance of $\delta' s(\theta_0)$ is $\delta' I(\theta_0) \delta$; and, if $\delta$ were known, a test could be based on

$$\frac{[\delta' s(\theta_0)]^2}{\delta' I(\theta_0) \delta}, \tag{2.37}$$

which under $H_0$ will be asymptotically distributed as $\chi_1^2$. To eliminate the $\delta$'s and to obtain a linear function that would yield maximum discrimination, Rao (1948) maximized (2.37) with respect to $\delta$ and obtained2

$$\sup_{\delta} \frac{[\delta' s(\theta_0)]^2}{\delta' I(\theta_0) \delta} = s(\theta_0)' I(\theta_0)^{-1} s(\theta_0), \tag{2.38}$$

with optimal value $\delta = I(\theta_0)^{-1} s(\theta_0)$. In a sense, $\delta = I(\theta_0)^{-1} s(\theta_0)$ signals the optimal direction of the alternative hypothesis that we should consider. For example, when $p = 1$, $\delta = +1$ or $-1$, as we have seen in (2.16). Asymptotically, under the null, the statistic in (2.38) follows a $\chi_p^2$ distribution in contrast to (2.37), which follows $\chi_1^2$. When the null hypothesis is composite, like $H_0: h(\theta) = c$ with $r \le p$ restrictions, the general form of Rao’s score (RS) statistic is

$$RS = s(\tilde\theta)' I(\tilde\theta)^{-1} s(\tilde\theta), \tag{2.39}$$

where $\tilde\theta$ is the restricted MLE of $\theta$. Under $H_0$, $RS \xrightarrow{d} \chi_r^2$. Therefore, we observe two optimality principles behind the RS test: first, in terms of the LMP test as given in (2.15), and second, in deriving the "optimal" direction for the multiparameter case.
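The maximization behind the multiparameter score statistic can be verified numerically in a two-parameter case. The score vector and information matrix below are hypothetical numbers chosen for illustration, and the function names are not from the text; the sketch scans directions $\delta$ on the unit circle and compares the largest ratio with the quadratic form $s' I^{-1} s$.

```python
import math

def quadratic_form_inverse(s, info):
    """Return s' I^{-1} s for a 2-vector s and a symmetric 2x2 matrix I."""
    (a, b), (_, d) = info
    det = a * d - b * b
    return (d * s[0] ** 2 - 2.0 * b * s[0] * s[1] + a * s[1] ** 2) / det

def direction_ratio(delta, s, info):
    """Return [delta' s]^2 / (delta' I delta) for a given direction delta."""
    (a, b), (_, d) = info
    num = (delta[0] * s[0] + delta[1] * s[1]) ** 2
    den = a * delta[0] ** 2 + 2.0 * b * delta[0] * delta[1] + d * delta[1] ** 2
    return num / den

score = (1.0, 2.0)                 # hypothetical score vector at theta_0
info = ((2.0, 1.0), (1.0, 3.0))    # hypothetical information matrix at theta_0

# The ratio is invariant to the scale of delta, so a scan over angles suffices.
best = max(
    direction_ratio((math.cos(t), math.sin(t)), score, info)
    for t in (k * math.pi / 3600.0 for k in range(3600))
)
print(best, quadratic_form_inverse(score, info))
```

The scanned maximum agrees with the quadratic form up to the resolution of the grid, and it is attained in the direction $I^{-1} s$.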

Rao (1948) suggested the score test as an alternative to the Wald (1943) statistic, which for testing $H_0: h(\theta) = c$ is given by

$$W = (h(\hat\theta) - c)' [H(\hat\theta)' I(\hat\theta)^{-1} H(\hat\theta)]^{-1} (h(\hat\theta) - c). \tag{2.40}$$

Rao (1948, p. 53) stated that his test "besides being simpler than Wald’s has some theoretical advantages," such as invariance under transformation of parameters. Rao (2000) recollects the motivation and background behind the development of the score test.

The three statistics LR, W, and RS given, respectively, in (2.12), (2.40), and (2.39) are referred to as the "holy trinity." We can look at these statistics in terms of different measures of distance between the null and alternative hypotheses. When the null hypothesis is true, we would expect the restricted and unrestricted MLEs of $\theta$, $\tilde\theta$ and $\hat\theta$, to be close, and likewise the loglikelihood functions. Therefore the LR statistic measures the distance through the loglikelihood function and is based on the difference $l(\hat\theta) - l(\tilde\theta)$. To see the intuitive basis of the score test, note that $s(\hat\theta)$ is zero by construction, and we should expect $s(\tilde\theta)$ to be close to zero if $H_0$ is true. Hence the RS test exploits the distance through the score function $s(\theta)$ and can be viewed as being based on $s(\tilde\theta) - s(\hat\theta)$. Lastly, the W test considers the distance directly in terms of $h(\theta)$ and is based on $[h(\hat\theta) - c] - [h(\tilde\theta) - c]$, where by construction $h(\tilde\theta) = c$. This reveals a duality between the Wald and score tests. At the unrestricted MLE $\hat\theta$, $s(\hat\theta) = 0$, and the Wald test checks whether $h(\hat\theta)$ is away from $c$. On the other hand, at the restricted MLE $\tilde\theta$, $h(\tilde\theta) = c$ by construction, and the score test verifies whether $s(\tilde\theta)$ is far from a null vector.3
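To make the three distance measures concrete, consider a one-parameter illustration that is not taken from the text: an i.i.d. Poisson sample with mean $\lambda$ and the simple null $H_0: \lambda = \lambda_0$. Here the unrestricted MLE is the sample mean, the score is $n(\bar y/\lambda - 1)$ and the information is $n/\lambda$, so all three statistics have closed forms. The function name and the data are assumptions of this sketch.

```python
import math

def poisson_trinity(y, lam0):
    """LR, Wald and score statistics for H0: lambda = lam0 in an i.i.d. Poisson sample.

    Assumes the sample mean is positive, so that the logarithm and the
    Wald variance estimate are defined.
    """
    n = len(y)
    lam_hat = sum(y) / n                                   # unrestricted MLE
    lr = 2.0 * n * (lam_hat * math.log(lam_hat / lam0) - (lam_hat - lam0))
    wald = n * (lam_hat - lam0) ** 2 / lam_hat             # information evaluated at the MLE
    score = n * (lam_hat - lam0) ** 2 / lam0               # score and information evaluated at lam0
    return lr, wald, score

print(poisson_trinity([3, 1, 2, 2], 1.0))
```

The Wald statistic uses only the unrestricted estimate, the score statistic uses only quantities evaluated under the null, and the LR statistic needs both.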

Example 4. Consider a multinomial distribution with $p$ classes and let the probability of an observation belonging to the $j$th class be $\theta_j$, so that $\sum_{j=1}^{p} \theta_j = 1$. Denote the frequency of the $j$th class by $n_j$ with $\sum_{j=1}^{p} n_j = n$. We are interested in testing $H_0: \theta_j = \theta_{j0}$, $j = 1, 2, \ldots, p$, where the $\theta_{j0}$s are known constants. It can be shown that for this problem the score statistic is given by

$$s(\theta_0)' I(\theta_0)^{-1} s(\theta_0) = \sum_{j=1}^{p} \frac{(n_j - n\theta_{j0})^2}{n\theta_{j0}}, \tag{2.41}$$

where $\theta_0 = (\theta_{10}, \ldots, \theta_{p0})'$. Therefore, the RS statistic is the same as Pearson’s $P$ given in (2.30). It is quite a coincidence that Pearson (1900) suggested a score test mostly based on intuitive grounds almost 50 years before Rao (1948). For this problem, the other two test statistics LR and W are given by

$$LR = 2\sum_{j=1}^{p} n_j \ln\left( \frac{n_j}{n\theta_{j0}} \right) = 2\sum_{j=1}^{p} O_j \ln\left( \frac{O_j}{E_j} \right) \tag{2.42}$$

and

$$W = \sum_{j=1}^{p} \frac{(n_j - n\theta_{j0})^2}{n_j} = \sum_{j=1}^{p} \frac{(O_j - E_j)^2}{O_j}. \tag{2.43}$$

The equivalence of the score and Pearson’s tests and their local optimality has not been fully recognized in the statistics literature. Many researchers considered the LR statistic to be superior to P. Asymptotically, both statistics are locally optimal and equivalent, and, in terms of finite sample performance, P performs better [see for example Rayner and Best, 1989, pp. 26-7].
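The three multinomial statistics of Example 4 differ only in how the squared or logarithmic discrepancies are weighted, which a short computation makes visible. The sketch below uses made-up counts and an illustrative function name, and it assumes every observed count is positive.

```python
import math

def multinomial_statistics(counts, probs0):
    """Return (RS, LR, W) for observed counts and null class probabilities.

    RS is Pearson's statistic, with expected frequencies in the denominator;
    W puts observed frequencies in the denominator. Assumes all counts > 0.
    """
    n = sum(counts)
    expected = [n * p for p in probs0]
    rs = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    lr = 2.0 * sum(o * math.log(o / e) for o, e in zip(counts, expected))
    w = sum((o - e) ** 2 / o for o, e in zip(counts, expected))
    return rs, lr, w

print(multinomial_statistics([30, 20, 50], [0.25, 0.25, 0.5]))
```

For counts close to their expected values the three numbers are close to one another, in line with their asymptotic equivalence under the null.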

The three tests LR, W, and RS are based on the (efficient) maximum likelihood estimates. When consistent (rather than efficient) estimators are used there is another attractive way to construct a score-type test, which is due to Neyman (1954, 1959). In the literature this is known as the $C(\alpha)$, or effective score, or Neyman-Rao test. To follow Neyman (1959), let us partition $\theta$ as $\theta = [\theta_1', \theta_2]'$, where $\theta_2$ is a scalar, and test $H_0: \theta_2 = \theta_{20}$. Therefore, $\theta_1$ is the nuisance parameter with dimension $(p - 1) \times 1$. Neyman’s fundamental contribution is the derivation of an asymptotically optimal test using consistent estimators of the nuisance parameters. He achieved this in two steps. First he started with a class of functions $g(y; \theta_1, \theta_2)$ satisfying the regularity conditions of Cramér (1946, p. 500).4

For simplicity let us start with a normed Cramér function, that is, $g(y; \theta_1, \theta_2)$ has zero mean and unit variance. We denote a $\sqrt{n}$-consistent estimator of $\theta$ under $H_0$ by $\theta^+ = (\theta_1^{+\prime}, \theta_{20})'$. Neyman asked what the property of $g(\cdot)$ should be such that replacing $\theta$ by $\theta^+$ in the test statistic would not make any difference asymptotically, and his Theorem 1 proved that $g(\cdot)$ must satisfy

$$\mathrm{Cov}[g(y; \theta_1, \theta_{20}), s_{1j}(y; \theta_1, \theta_{20})] = 0, \tag{2.44}$$

where $s_{1j} = \partial l(\theta)/\partial\theta_{1j}$, i.e. the score for the $j$th component of $\theta_1$, $j = 1, 2, \ldots, p - 1$. In other words, the function $g(y; \theta)$ should be orthogonal to $s_1 = \partial l(\theta)/\partial\theta_1$. Starting from a normed Cramér function let us construct

$$\bar g(y; \theta_1, \theta_{20}) = g(y; \theta_1, \theta_{20}) - \sum_{j=1}^{p-1} b_j s_{1j}(\theta_1, \theta_{20}), \tag{2.45}$$

where $b_j$, $j = 1, 2, \ldots, p - 1$, are the regression coefficients of regressing $g(y; \theta_1, \theta_{20})$ on $s_{11}, s_{12}, \ldots, s_{1p-1}$. Denote by $\sigma^2(\theta_1, \theta_{20})$ the minimum variance of $\bar g(y; \theta_1, \theta_{20})$, and define

$$g^*(y; \theta_1, \theta_{20}) = \frac{\bar g(y; \theta_1, \theta_{20})}{\sigma(\theta_1, \theta_{20})}. \tag{2.46}$$

Note that $g^*(y; \theta_1, \theta_{20})$ is also a normed Cramér function, and the covariance between $g^*(y; \theta_1, \theta_{20})$ and $s_{1j}(\theta_1, \theta_{20})$ is also zero, $j = 1, 2, \ldots, p - 1$. Therefore, a class of $C(\alpha)$ tests can be based on $Z_n(\theta_1^+, \theta_{20}) = n^{-1/2}\sum_{i=1}^{n} g^*(y_i; \theta_1^+, \theta_{20})$. Condition (2.44) ensures that $Z_n(\theta_1, \theta_{20}) - Z_n(\theta_1^+, \theta_{20}) = o_p(1)$. The second step of Neyman was to find the starting function $g(y; \theta)$ itself. Theorem 2 of Neyman (1959) states that under the sequence of local alternatives $H_{1n}: \theta_2 = \theta_{20} + \delta/\sqrt{n}$, $0 < \delta < \infty$, $Z_n(\theta_1^+, \theta_{20})$ is asymptotically distributed as normal with mean $\delta\rho\sigma_2$ and variance unity, where $\rho$ and $\sigma_2$ are defined in (2.47).

[Equation (2.47), defining $\sigma_2$ and $\rho$: image not available.]

The asymptotic power of the test will be purely guided by $\rho$, and to maximize the power we should select the function $g(y; \theta)$ so that $\rho = 1$, that is, the optimal choice should be $g(y; \theta) = \partial l(\theta)/\partial\theta_2 = s_2(\theta_1, \theta_{20})$, say, the score for the testing parameter $\theta_2$.

Therefore, from (2.45), an asymptotically and locally optimal test should be based on the part of the score for the parameter tested that is orthogonal to the score for the nuisance parameter, namely,

$$s_2(\theta_1^+, \theta_{20}) - \sum_{j=1}^{p-1} b_j s_{1j}(\theta_1^+, \theta_{20}). \tag{2.48}$$

In (2.48) $b_j$, $j = 1, 2, \ldots, p - 1$, are now regression coefficients of regressing $s_2(\theta_1^+, \theta_{20})$ on $s_{11}, s_{12}, \ldots, s_{1p-1}$, and we can express (2.48) as

$$s_2(\theta^+) - I_{21}(\theta^+) I_{11}^{-1}(\theta^+) s_1(\theta^+) = s_2^*(\theta^+), \text{ say}, \tag{2.49}$$

where $I_{ij}(\theta)$ are the appropriate blocks of the information matrix $I(\theta)$ corresponding to $\theta_1$ and $\theta_2$. $s_2^*(\theta)$ is called the effective score for $\theta_2$ and its variance, $I_{22}^*(\theta) = I_{22}(\theta) - I_{21}(\theta) I_{11}^{-1}(\theta) I_{12}(\theta)$, is termed effective information. Note that, since $s_2^*(\theta)$ is the residual score obtained from running a regression of $s_2(\theta)$ on $s_1(\theta)$, it will be orthogonal to the score for $\theta_1$. The operational form of Neyman’s $C(\alpha)$ test is

$$C(\alpha) = s_2^*(\theta^+)' I_{22}^*(\theta^+)^{-1} s_2^*(\theta^+). \tag{2.50}$$

Bera and Bilias (2000) derived this test using the Rao (1948) framework [see equations (2.38) and (2.39)]. If we replace the $\sqrt{n}$-consistent estimator $\theta^+$ by the restricted MLE $\tilde\theta$, then $s_1(\tilde\theta) = 0$, so the effective score reduces to $s_2(\tilde\theta)$ with the effective information evaluated at $\tilde\theta$, and the $C(\alpha)$ test becomes the standard RS test.
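The effective score and effective information are simple to compute once the scores and information blocks are available. The sketch below treats the special case in which both the nuisance parameter and the tested parameter are scalars; the function name and the numbers in the usage line are hypothetical.

```python
def c_alpha_scalar(s1, s2, i11, i12, i22):
    """C(alpha)-type statistic when theta_1 and theta_2 are both scalars.

    s1, s2: scores for the nuisance and tested parameters.
    i11, i12, i22: elements of the information matrix.
    All arguments are assumed to be evaluated at a root-n consistent
    estimator under H0.
    """
    effective_score = s2 - (i12 / i11) * s1      # residual of s2 after regressing on s1
    effective_info = i22 - i12 ** 2 / i11        # variance of the effective score
    return effective_score ** 2 / effective_info

print(c_alpha_scalar(s1=1.0, s2=2.0, i11=2.0, i12=1.0, i22=3.0))
```

When `s1` is zero, as it is at the restricted MLE, the value coincides with the full quadratic form $s' I^{-1} s$ of the score test.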

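Example 5. (Neyman, 1959) Let us consider testing $H_0: \theta_2 = 0$ in the following Cauchy density

$$f(y; \theta_1, \theta_2) = \frac{1}{\pi}\, \frac{\theta_1}{\theta_1^2 + (y - \theta_2)^2}, \qquad -\infty < y < \infty, \tag{2.51}$$

where $\theta_1 > 0$ and $-\infty < \theta_2 < \infty$. It is easy to see that

$$s_2(\theta) = \frac{\partial \ln f(y; \theta_1, \theta_2)}{\partial\theta_2}\Big|_{\theta_2 = 0} = \frac{2y}{\theta_1^2 + y^2}, \tag{2.52}$$

$I_{12}(\theta) = 0$ and $I_{22}(\theta) = 1/(2\theta_1^2)$ per observation under $H_0$. Therefore, $s_2^*(\theta) = s_2(\theta)$ and $I_{22}^*(\theta) = I_{22}(\theta)$. Hence the $C(\alpha)$ statistic (2.50) based on a sample $y = (y_1, y_2, \ldots, y_n)'$ is given by

$$C(\alpha) = \frac{2\theta_1^{+2}}{n} \left[ \sum_{i=1}^{n} \frac{2y_i}{\theta_1^{+2} + y_i^2} \right]^2. \tag{2.53}$$

For $\theta_1^+$, we can use any $\sqrt{n}$-consistent estimator such as half the difference between the third and first sample quartiles (under (2.51) the quartiles lie at $\theta_2 \pm \theta_1$). Since $I_{21}(\theta) = 0$ under $H_0$, the RS test will have the same algebraic form (2.53), but for $\theta_1$ we need to use the restricted MLE, $\tilde\theta_1$.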
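The Cauchy example can be turned into a short computation. The sketch below is one possible implementation, not Neyman's own: it estimates the scale $\theta_1$ by half the sample interquartile range (for density (2.51) the quartiles lie at $\theta_2 \pm \theta_1$), uses the per-observation information $1/(2\theta_1^2)$ for the location parameter, and takes the quartiles from Python's `statistics.quantiles`. The function name and the data are illustrative.

```python
import statistics

def c_alpha_cauchy(y):
    """C(alpha)-type statistic for H0: theta_2 = 0 in the Cauchy model (2.51).

    The nuisance scale theta_1 is replaced by half the sample
    interquartile range, a simple consistent estimator.
    """
    q1, _, q3 = statistics.quantiles(y, n=4)
    theta1_plus = (q3 - q1) / 2.0
    n = len(y)
    score = sum(2.0 * yi / (theta1_plus ** 2 + yi ** 2) for yi in y)
    information = n / (2.0 * theta1_plus ** 2)
    return score ** 2 / information

# Toy sample, made up for illustration.
print(c_alpha_cauchy([-2.3, -0.4, 0.1, 0.7, 1.5, 3.8, 0.9, -1.1]))
```

For a sample that is symmetric about zero the score sums to zero and the statistic vanishes, while a shifted sample produces a positive value.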

Neyman’s $C(\alpha)$ approach provides an attractive way to take into account the nuisance parameter $\theta_1$. Bera and Yoon (1993) applied this approach to develop tests that are valid under a locally misspecified model. They showed that replacing $\theta_1$, even by a null vector, in the final form of the test would lead to a valid test procedure.

This ends our discussion of the test principles proposed in the statistics literature. We have covered only those tests that have some relevance to testing and evaluating econometric models. In the next section we discuss some of their applications.

John Maynard Keynes was skeptical about applying statistical techniques to economic data, as can be seen in his review of Tinbergen’s book. It was left to Haavelmo (1944) to successfully defend the application of statistical methodologies to economic data within the framework of the joint probability distribution of variables. Trygve Haavelmo was clearly influenced by Jerzy Neyman,6 and Haavelmo (1944) contains a seven-page account of the Neyman-Pearson theory. He clearly stated the limitation of the standard hypothesis testing approach and explicitly mentioned that a test is, in general, constructed on the basis of a given fixed set of possible alternatives that he called a priori admissible hypotheses. Whenever this a priori admissible set deviates from the data generating process, the test loses its optimality [for more on this see Bera and Yoon (1993) and Bera (2000)]. Haavelmo, however, did not himself formally apply the Neyman-Pearson theory to econometric testing problems. That was left to Anderson (1948) and Durbin and Watson (1950).
