# Hypothesis Testing

The best way to proceed is with an example.

Example 1: The Economics Department instituted a new program to teach micro-principles. We would like to test the null hypothesis that 80% of economics undergraduate students will pass the micro-principles course versus the alternative hypothesis that only 50% will pass. We draw a random sample of size 20 from the large undergraduate micro-principles class and, as a simple rule, we accept the null if $x$, the number of passing students, is larger than or equal to 13; otherwise the alternative hypothesis is accepted. Note that the distribution we are drawing from is Bernoulli with probability of success $\theta$, and we have chosen only two states of the world, $H_0;\ \theta_0 = 0.80$ and $H_1;\ \theta_1 = 0.5$. This situation is known as testing a simple hypothesis versus another simple hypothesis because the distribution is completely specified under the null or the alternative hypothesis. One would expect $E(x) = n\theta_0 = 16$ students to pass the micro-principles exams under $H_0$ and $n\theta_1 = 10$ students under $H_1$. It then seems logical to take $x \geq 13$ as the cut-off point distinguishing $H_0$ from $H_1$. No theoretical justification is given at this stage for this arbitrary choice except to say that it is the mid-point of [10, 16]. Figure 2.3 shows that one can make two types of errors. The first is rejecting $H_0$ when in fact it is true; this is known as type I error, and the probability of committing this error is denoted by $\alpha$. The second is accepting $H_0$ when it is false; this is known as type II error, and the corresponding probability is denoted by $\beta$. For this example

$$
\begin{aligned}
\alpha &= \Pr[\text{rejecting } H_0 / H_0 \text{ is true}] = \Pr[x < 13 / \theta = 0.8] \\
&= b(n = 20; x = 0; \theta = 0.8) + \dots + b(n = 20; x = 12; \theta = 0.8) \\
&= b(n = 20; x = 20; \theta = 0.2) + \dots + b(n = 20; x = 8; \theta = 0.2) \\
&= 0 + \dots + 0 + 0.0001 + 0.0005 + 0.0020 + 0.0074 + 0.0222 = 0.0322
\end{aligned}
$$

where we have used the fact that $b(n; x; \theta) = b(n; n - x; 1 - \theta)$, and $b(n; x; \theta) = \binom{n}{x}\theta^x(1 - \theta)^{n-x}$ denotes the binomial distribution for $x = 0, 1, \dots, n$; see problem 4.

| Decision | True World: $\theta_0 = 0.80$ | True World: $\theta_1 = 0.50$ |
|---|---|---|
| $\theta_0$ | No error | Type II error |
| $\theta_1$ | Type I error | No error |

Figure 2.3 Type I and II Error

$$
\begin{aligned}
\beta &= \Pr[\text{accepting } H_0 / H_0 \text{ is false}] = \Pr[x \geq 13 / \theta = 0.5] \\
&= b(n = 20; x = 13; \theta = 0.5) + \dots + b(n = 20; x = 20; \theta = 0.5) \\
&= 0.0739 + 0.0370 + 0.0148 + 0.0046 + 0.0011 + 0.0002 + 0 + 0 = 0.1316
\end{aligned}
$$
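The two error probabilities for the cut-off of 13 can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the function names `binom_pmf` and `error_probabilities` are illustrative choices, not notation from the text.

```python
from math import comb


def binom_pmf(n: int, x: int, theta: float) -> float:
    """b(n; x; theta): probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def error_probabilities(n: int, cutoff: int, theta0: float, theta1: float):
    """Return (alpha, beta) for the rule 'accept H0 when x >= cutoff'."""
    # Type I error: x falls below the cut-off although theta = theta0.
    alpha = sum(binom_pmf(n, x, theta0) for x in range(cutoff))
    # Type II error: x reaches the cut-off although theta = theta1.
    beta = sum(binom_pmf(n, x, theta1) for x in range(cutoff, n + 1))
    return alpha, beta


alpha, beta = error_probabilities(n=20, cutoff=13, theta0=0.8, theta1=0.5)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Because the figures in the text are sums of table entries already rounded to four decimals, the exact computation may differ from them in the last digit.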

The rejection region for $H_0$, $x < 13$, is known as the critical region of the test, and $\alpha = \Pr[\text{falling in the critical region} / H_0 \text{ is true}]$ is also known as the size of the critical region. A good test is one which minimizes both types of errors $\alpha$ and $\beta$. For the above example, $\alpha$ is low but $\beta$ is high, with more than a 13% chance of happening. This $\beta$ can be reduced by changing the critical region from $x < 13$ to $x < 14$, so that $H_0$ is accepted only if $x \geq 14$. In this case, one can easily verify that

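$$
\begin{aligned}
\alpha &= \Pr[x < 14 / \theta = 0.8] = b(n = 20; x = 0; \theta = 0.8) + \dots + b(n = 20; x = 13; \theta = 0.8) \\
&= 0.0322 + b(n = 20; x = 13; \theta = 0.8) = 0.0322 + 0.0545 = 0.0867
\end{aligned}
$$

and

$$
\begin{aligned}
\beta &= \Pr[x \geq 14 / \theta = 0.5] = b(n = 20; x = 14; \theta = 0.5) + \dots + b(n = 20; x = 20; \theta = 0.5) \\
&= 0.1316 - b(n = 20; x = 13; \theta = 0.5) = 0.1316 - 0.0739 = 0.0577
\end{aligned}
$$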

By becoming more conservative on accepting $H_0$ and more liberal on accepting $H_1$, one reduces $\beta$ from 0.1316 to 0.0577, but the price paid is the increase in $\alpha$ from 0.0322 to 0.0867. The only way to reduce both $\alpha$ and $\beta$ is by increasing $n$. For a fixed $n$, there is a trade-off between $\alpha$ and $\beta$ as we change the critical region. To understand this clearly, consider the real-life situation of trial by jury, in which the defendant can be innocent or guilty. The decision of incarceration or release implies two types of errors. One can make $\alpha = \Pr[\text{incarcerating} / \text{innocence}] = 0$ and $\beta$ equal to its maximum by releasing every defendant. Or one can make $\beta = \Pr[\text{release} / \text{guilty}] = 0$ and $\alpha$ equal to its maximum by incarcerating every defendant. These are extreme cases, but hopefully they demonstrate the trade-off between $\alpha$ and $\beta$.
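The trade-off can be tabulated directly. The sketch below sweeps the cut-off for $n = 20$ and then lets $n$ grow while keeping the cut-off at the mid-point between $n\theta_0$ and $n\theta_1$. That scaling rule ($0.65n$) is an assumption made here for illustration, extending the mid-point choice of Example 1, and the helper name `tail_probs` is hypothetical.

```python
from math import comb


def tail_probs(n, cutoff, theta0=0.8, theta1=0.5):
    """(alpha, beta) for the rule 'accept H0 when x >= cutoff'."""
    def pmf(x, theta):
        return comb(n, x) * theta**x * (1 - theta) ** (n - x)

    alpha = sum(pmf(x, theta0) for x in range(cutoff))
    beta = sum(pmf(x, theta1) for x in range(cutoff, n + 1))
    return alpha, beta


# Fixed n = 20: moving the cut-off trades alpha against beta.
fixed_n = {c: tail_probs(20, c) for c in range(11, 17)}
for c, (a, b) in fixed_n.items():
    print(f"n = 20, accept H0 if x >= {c}: alpha = {a:.4f}, beta = {b:.4f}")

# Growing n with the cut-off at 0.65 n (assumed mid-point rule): both errors fall.
growing_n = {n: tail_probs(n, (13 * n) // 20) for n in (20, 40, 80)}
for n, (a, b) in growing_n.items():
    print(f"n = {n}, accept H0 if x >= {(13 * n) // 20}: alpha = {a:.4f}, beta = {b:.4f}")
```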

## The Neyman-Pearson Theory

The classical theory of hypothesis testing, known as the Neyman-Pearson theory, fixes $\alpha = \Pr(\text{type I error}) \leq$ a constant and minimizes $\beta$ or maximizes $(1 - \beta)$. The latter is known as the power of the test under the alternative.

The Neyman-Pearson Lemma: If $C$ is a critical region of size $\alpha$ and $k$ is a constant such that

$$(L_0/L_1) \leq k \quad \text{inside } C$$

and

$$(L_0/L_1) \geq k \quad \text{outside } C$$

then $C$ is a most powerful critical region of size $\alpha$ for testing $H_0;\ \theta = \theta_0$ against $H_1;\ \theta = \theta_1$.

Note that the likelihood has to be completely specified under the null and alternative. Hence, this lemma applies only to testing a simple versus another simple hypothesis. The proof of this lemma is given in Freund (1992). Intuitively, $L_0$ is the likelihood function under the null $H_0$ and $L_1$ is the corresponding likelihood function under $H_1$. Therefore, $(L_0/L_1)$ should be small for points inside the critical region $C$ and large for points outside the critical region $C$. The proof of the theorem shows that any other critical region, say $D$, of size $\alpha$ cannot have a smaller probability of type II error than $C$. Therefore, $C$ is the best or most powerful critical region of size $\alpha$. Its power $(1 - \beta)$ is maximum at $H_1$. Let us demonstrate this lemma with an example.
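As a brief numerical aside before the next example, the lemma's intuition can be checked on the binomial setting of Example 1. The sketch below (an illustration added here, not part of the original derivation) evaluates $L_0/L_1$ at every possible $x$ and confirms that the ratio is increasing in $x$, so a region of the form $L_0/L_1 \leq k$ is the same as a region of the form $x \leq c$.

```python
from math import comb

n, theta0, theta1 = 20, 0.8, 0.5


def likelihood(x: int, theta: float) -> float:
    """Binomial likelihood of observing x passing students out of n."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


# L0/L1 for every possible outcome x = 0, 1, ..., n.
ratios = [likelihood(x, theta0) / likelihood(x, theta1) for x in range(n + 1)]

# A strictly increasing ratio means small ratios correspond to small x.
is_increasing = all(r1 < r2 for r1, r2 in zip(ratios, ratios[1:]))

# Choosing k as the ratio at x = 12 recovers the critical region x < 13.
k = ratios[12]
critical_region = [x for x in range(n + 1) if ratios[x] <= k]
print(is_increasing, critical_region)
```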

Example 2: Given a random sample of size $n$ from $N(\mu, \sigma^2 = 4)$, use the Neyman-Pearson lemma to find the most powerful critical region of size $\alpha = 0.05$ for testing $H_0;\ \mu_0 = 2$ against the alternative $H_1;\ \mu_1 = 4$.

Note that this is a simple versus simple hypothesis as required by the lemma, since $\sigma^2 = 4$ is known and $\mu$ is specified by $H_0$ and $H_1$. The likelihood function for the $N(\mu, 4)$ density is given by

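$$L(\mu) = f(x_1, \dots, x_n; \mu, 4) = \left(1/2\sqrt{2\pi}\right)^n \exp\left\{-\sum_{i=1}^n (x_i - \mu)^2/8\right\}$$

so that

$$L_0 = L(\mu_0) = \left(1/2\sqrt{2\pi}\right)^n \exp\left\{-\sum_{i=1}^n (x_i - 2)^2/8\right\}$$

and

$$L_1 = L(\mu_1) = \left(1/2\sqrt{2\pi}\right)^n \exp\left\{-\sum_{i=1}^n (x_i - 4)^2/8\right\}$$

Therefore

$$L_0/L_1 = \exp\left\{-\left[\sum_{i=1}^n (x_i - 2)^2 - \sum_{i=1}^n (x_i - 4)^2\right]/8\right\} = \exp\left\{-\sum_{i=1}^n x_i/2 + 3n/2\right\}$$

and the critical region is defined by

$$\exp\left\{-\sum_{i=1}^n x_i/2 + 3n/2\right\} \leq k \quad \text{inside } C$$

Taking logarithms of both sides, subtracting $(3/2)n$ and dividing by $(-1/2)n$, one gets

$$\bar{x} \geq K \quad \text{inside } C$$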
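The algebra above can be checked numerically. The following sketch draws a simulated sample (the seed, the sample size of 9 and the value of $k$ are arbitrary choices for illustration), compares the log of $L_0/L_1$ computed from the two likelihoods with the closed form $-\sum x_i/2 + 3n/2$, and then confirms that $L_0/L_1 \leq k$ is the same event as $\bar{x} \geq K$, where $K = 3 - 2\log k/n$ follows from the steps just described.

```python
import math
import random

random.seed(0)  # arbitrary seed so the sketch is reproducible
n = 9
sample = [random.gauss(2.0, 2.0) for _ in range(n)]  # simulated draws from N(2, 4)


def log_likelihood(xs, mu, sigma2=4.0):
    """Log of the N(mu, sigma2) likelihood of an i.i.d. sample."""
    m = len(xs)
    return -0.5 * m * math.log(2 * math.pi * sigma2) - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)


# log(L0/L1) from the definition and from the simplified expression.
log_ratio_direct = log_likelihood(sample, 2.0) - log_likelihood(sample, 4.0)
log_ratio_closed = -sum(sample) / 2 + 3 * n / 2

# For any k > 0, L0/L1 <= k is equivalent to xbar >= K with K = 3 - 2 log(k) / n.
k = 0.5  # arbitrary illustrative constant
K = 3 - 2 * math.log(k) / n
xbar = sum(sample) / n
same_decision = (log_ratio_direct <= math.log(k)) == (xbar >= K)
print(log_ratio_direct, log_ratio_closed, same_decision)
```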

In practice, one need not keep track of $K$ as long as one keeps track of the direction of the inequality. $K$ can be determined by making the size of $C$ equal to $\alpha = 0.05$. In this case

$$\alpha = \Pr[\bar{x} \geq K / \mu = 2] = \Pr[z \geq (K - 2)/(2/\sqrt{n})]$$

where $z = (\bar{x} - 2)/(2/\sqrt{n})$ is distributed $N(0,1)$ under $H_0$. From the $N(0,1)$ tables, we have

$$\frac{K - 2}{2/\sqrt{n}} = 1.645$$

Hence,

$$K = 2 + 1.645(2/\sqrt{n})$$

and $\bar{x} \geq 2 + 1.645(2/\sqrt{n})$ defines the most powerful critical region of size $\alpha = 0.05$ for testing $H_0;\ \mu_0 = 2$ versus $H_1;\ \mu_1 = 4$. Note that, in this case

$$
\begin{aligned}
\beta &= \Pr[\bar{x} < 2 + 1.645(2/\sqrt{n}) / \mu = 4] \\
&= \Pr[z < [-2 + 1.645(2/\sqrt{n})]/(2/\sqrt{n})] = \Pr[z < 1.645 - \sqrt{n}]
\end{aligned}
$$

For $n = 4$, $\beta = \Pr[z < -0.355] = 0.3613$, shown by the shaded region in Figure 2.4. For $n = 9$, $\beta = \Pr[z < -1.355] = 0.0877$, and for $n = 16$, $\beta = \Pr[z < -2.355] = 0.00925$.
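The critical value $K$ and the resulting $\beta$ can be computed without tables. This sketch relies on `statistics.NormalDist` from the Python standard library; the function names are illustrative. Since it uses the unrounded normal quantile rather than the tabulated 1.645, results may differ from the text in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)
SIGMA, MU0, MU1, ALPHA = 2.0, 2.0, 4.0, 0.05


def critical_value(n: int) -> float:
    """K such that Pr[xbar >= K / mu = MU0] = ALPHA."""
    z_alpha = std_normal.inv_cdf(1 - ALPHA)  # roughly 1.645
    return MU0 + z_alpha * SIGMA / sqrt(n)


def type_two_error(n: int) -> float:
    """beta = Pr[xbar < K / mu = MU1]."""
    return std_normal.cdf((critical_value(n) - MU1) / (SIGMA / sqrt(n)))


for n in (4, 9, 16):
    print(f"n = {n:2d}: K = {critical_value(n):.4f}, beta = {type_two_error(n):.5f}")
```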

This gives us an idea of how, for a fixed $\alpha = 0.05$, the minimum $\beta$ decreases with larger sample size $n$. As $n$ increases from 4 to 9 to 16, $\text{var}(\bar{x}) = \sigma^2/n$ decreases and the two distributions shown in Figure 2.4 shrink in dispersion, still centered around $\mu_0 = 2$ and $\mu_1 = 4$, respectively. This allows better decision making (based on larger sample size), as reflected by the critical region shrinking from $\bar{x} \geq 3.65$ for $n = 4$ to $\bar{x} \geq 2.8225$ for $n = 16$, and the power $(1 - \beta)$ rising from 0.6387 to 0.9908, respectively, for a fixed $\alpha \leq 0.05$. The power function is the probability of rejecting $H_0$. It is equal to $\alpha$ under $H_0$ and $1 - \beta$ under $H_1$. The ideal power function is zero at $H_0$ and one at $H_1$. The Neyman-Pearson lemma allows us to fix $\alpha$, say at 0.05, and find the test with the best power at $H_1$.
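The power function of this test can be written as $\Pr[\bar{x} \geq K / \mu]$ and evaluated at any value of $\mu$. The sketch below is an illustration under the assumptions of Example 2 (known $\sigma = 2$, $\mu_0 = 2$); the function name `power` and the grid of $\mu$ values are choices made here.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def power(mu: float, n: int, alpha: float = 0.05, mu0: float = 2.0, sigma: float = 2.0) -> float:
    """Probability of rejecting H0 (xbar >= K) when the true mean is mu."""
    se = sigma / sqrt(n)                             # standard deviation of xbar
    K = mu0 + std_normal.inv_cdf(1 - alpha) * se     # size-alpha critical value
    return 1 - std_normal.cdf((K - mu) / se)


for n in (4, 9, 16):
    row = ", ".join(f"mu = {mu}: {power(mu, n):.4f}" for mu in (2.0, 3.0, 4.0))
    print(f"n = {n:2d} -> {row}")
```

At $\mu = 2$ the function returns the size $\alpha$, and at $\mu = 4$ it returns $1 - \beta$, so the printed rows trace how the power at $H_1$ approaches one as $n$ grows.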

In Example 2, both the null and alternative hypotheses are simple. In real life, one is more likely to be faced with testing $H_0;\ \mu = 2$ versus $H_1;\ \mu \neq 2$. Under the alternative hypothesis, the distribution is not completely specified, since the mean $\mu$ is not known, and this is referred to as a composite hypothesis. In this case, one cannot compute the probability of type II error since the distribution is not known under the alternative. Also, the Neyman-Pearson lemma cannot be applied. However, a simple generalization allows us to compute a Likelihood Ratio test which has satisfactory properties but is no longer uniformly most powerful of size $\alpha$. In this case, one replaces $L_1$, which is not known since $H_1$ is a composite hypothesis, by the maximum value of the likelihood, i.e.,

$$\lambda = \frac{\max L_0}{\max L}$$

Since $\max L_0$ is the maximum value of the likelihood under the null while $\max L$ is the maximum value of the likelihood over the whole parameter space, it follows that $\max L_0 \leq \max L$ and $\lambda \leq 1$. Hence, if $H_0$ is true, $\lambda$ is close to 1; otherwise it is smaller than 1. Therefore, $\lambda \leq k$ defines the critical region for the Likelihood Ratio test, and $k$ is determined such that the size of this test is $\alpha$.

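Example 3: For a random sample $x_1, \dots, x_n$ drawn from a Normal distribution with mean $\mu$ and variance $\sigma^2 = 4$, derive the Likelihood Ratio test for $H_0;\ \mu = 2$ versus $H_1;\ \mu \neq 2$. In this case

$$\max L_0 = \left(1/2\sqrt{2\pi}\right)^n \exp\left\{-\sum_{i=1}^n (x_i - 2)^2/8\right\} = L_0$$

and

$$\max L = \left(1/2\sqrt{2\pi}\right)^n \exp\left\{-\sum_{i=1}^n (x_i - \bar{x})^2/8\right\} = L(\hat{\mu}_{MLE})$$

where use is made of the fact that $\hat{\mu}_{MLE} = \bar{x}$. Therefore,

$$\lambda = \exp\left\{\left[-\sum_{i=1}^n (x_i - 2)^2 + \sum_{i=1}^n (x_i - \bar{x})^2\right]/8\right\} = \exp\left\{-n(\bar{x} - 2)^2/8\right\}$$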

Hence, the region for which $\lambda \leq k$ is equivalent, after some simple algebra, to the following region:

$$(\bar{x} - 2)^2 \geq K \quad \text{or} \quad |\bar{x} - 2| \geq K^{1/2}$$

where $K$ is determined such that

$$\Pr[|\bar{x} - 2| \geq K^{1/2} / \mu = 2] = \alpha$$

We know that $\bar{x} \sim N(2, 4/n)$ under $H_0$. Hence, $z = (\bar{x} - 2)/(2/\sqrt{n})$ is $N(0,1)$ under $H_0$, and the critical region of size $\alpha$ will be based upon $|z| \geq z_{\alpha/2}$, where $z_{\alpha/2}$ is given in Figure 2.5 and is the value of a $N(0,1)$ random variable such that the probability of exceeding it is $\alpha/2$. For $\alpha = 0.05$, $z_{\alpha/2} = 1.96$, and for $\alpha = 0.10$, $z_{\alpha/2} = 1.645$. This is a two-tailed test with rejection of $H_0$ obtained in case $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$.
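The two-tailed rule is short enough to sketch in code. The function name `two_sided_z_test` and the nine data values are made up purely for illustration; only the known variance $\sigma^2 = 4$ and the null value $\mu = 2$ come from Example 3.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(xs, mu0=2.0, sigma=2.0, alpha=0.05):
    """Return (z, z_crit, reject) for H0; mu = mu0 against H1; mu != mu0, sigma known."""
    n = len(xs)
    xbar = sum(xs) / n
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z, z_crit, abs(z) >= z_crit


# Hypothetical sample, invented for this sketch only.
hypothetical_sample = [3.1, 4.7, 2.2, 5.0, 3.9, 1.8, 4.4, 3.6, 2.9]

z, z_crit, reject = two_sided_z_test(hypothetical_sample, alpha=0.05)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```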

Note that in this case

$$LR \equiv -2\log\lambda = (\bar{x} - 2)^2/(4/n) = z^2$$

which is distributed as $\chi^2_1$ under $H_0$. This is because it is the square of a $N(0,1)$ random variable under $H_0$. This is a finite sample result holding for any $n$. In general, other examples may lead to more complicated $\lambda$ statistics for which it is difficult to find the corresponding distributions and hence the corresponding critical values.

Figure 2.5 Critical Values

For these cases, we have an asymptotic result which states that, for large $n$, $LR = -2\log\lambda$ will be asymptotically distributed as $\chi^2_\nu$, where $\nu$ denotes the number of restrictions that are tested by $H_0$. For Example 3, $\nu = 1$ and hence $LR$ is asymptotically distributed as $\chi^2_1$. Note that we did not need this result, as we found that $LR$ is exactly distributed as $\chi^2_1$ for any $n$. If one is testing $H_0;\ \mu = 2$ and $\sigma^2 = 4$ against the alternative $H_1;\ \mu \neq 2$ or $\sigma^2 \neq 4$, then the corresponding $LR$ will be asymptotically distributed as $\chi^2_2$; see problem 5, part (f).
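The identity $-2\log\lambda = z^2$ can be verified numerically by computing the statistic from the two maximized likelihoods. The sketch below uses made-up data (the values in `hypothetical_sample` are invented for illustration) and obtains the 5% critical value of $\chi^2_1$ as the square of the corresponding $N(0,1)$ value, which is valid because $\chi^2_1$ is the distribution of a squared standard normal.

```python
import math
from statistics import NormalDist


def log_likelihood(xs, mu, sigma2=4.0):
    """Log of the N(mu, sigma2) likelihood of an i.i.d. sample."""
    m = len(xs)
    return -0.5 * m * math.log(2 * math.pi * sigma2) - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)


def lr_statistic(xs, mu0=2.0, sigma2=4.0):
    """LR = -2 log(lambda), with lambda = max L0 / max L and mu_MLE = xbar."""
    xbar = sum(xs) / len(xs)
    return -2 * (log_likelihood(xs, mu0, sigma2) - log_likelihood(xs, xbar, sigma2))


# Hypothetical sample, invented for this sketch only.
hypothetical_sample = [3.1, 4.7, 2.2, 5.0, 3.9, 1.8, 4.4, 3.6, 2.9]
n = len(hypothetical_sample)
xbar = sum(hypothetical_sample) / n
z = (xbar - 2.0) / (2.0 / math.sqrt(n))

lr = lr_statistic(hypothetical_sample)
chi2_1_crit = NormalDist().inv_cdf(0.975) ** 2  # 5% critical value of chi-squared(1)
print(f"LR = {lr:.4f}, z^2 = {z**2:.4f}, reject H0 at 5%: {lr >= chi2_1_crit}")
```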