# Variance-Covariance Matrix Assumed Known

Consider the case of $K = 2$. We can write $\theta = (\theta_1, \theta_2)'$ and $\theta_0 = (\theta_{10}, \theta_{20})'$. It is intuitively reasonable that an optimal critical region should be outside some enclosure containing $\theta_0$, as depicted in Figure 9.9. What should be the specific shape of the enclosure?

An obvious first choice would be a circle with $\theta_0$ at its center. That would amount to the test:

(9.7.1) Reject $H_0$ if $(\hat\theta_1 - \theta_{10})^2 + (\hat\theta_2 - \theta_{20})^2 > c$

for some $c$, where $c$ is chosen so as to make the probability of Type I error equal to a given value $\alpha$. An undesirable feature of this choice can be demonstrated as follows. Suppose $V\hat\theta_1$ is much larger than $V\hat\theta_2$. Then a large value of $|\hat\theta_2 - \theta_{20}|$ should be more cause for rejecting $H_0$ than an equally large value of $|\hat\theta_1 - \theta_{10}|$, for the latter could be a result of the large variability of $\hat\theta_1$ rather than the falseness of the null hypothesis.

This weakness is alleviated by the following strategy:

(9.7.2) Reject $H_0$ if $\dfrac{(\hat\theta_1 - \theta_{10})^2}{\sigma_1^2} + \dfrac{(\hat\theta_2 - \theta_{20})^2}{\sigma_2^2} > c$

where $\sigma_1^2 = V\hat\theta_1$ and $\sigma_2^2 = V\hat\theta_2$. Geometrically, the inequality in (9.7.2) represents the region outside an ellipse with $\theta_0$ at its center, elongated horizontally (because $\sigma_1^2 > \sigma_2^2$ in the case just considered). We should not be completely satisfied with this solution either: the fact that this critical region does not depend on the covariance $\sigma_{12} = \mathrm{Cov}(\hat\theta_1, \hat\theta_2)$ suggests its deficiency.


We shall now proceed on the intuitively reasonable premise that if $\sigma_1^2 = \sigma_2^2$ and $\sigma_{12} = 0$, the optimal test should be defined by (9.7.1). Suppose that $\Sigma$ is a positive definite matrix, not necessarily diagonal nor identity. Then by Theorem 11.5.1 we can find a matrix $A$ such that $A\Sigma A' = I$. By this transformation the original testing problem can be paraphrased as testing $H_0: A\theta = A\theta_0$ against $A\theta \ne A\theta_0$ using $A\hat\theta$, which is distributed as $N(A\theta_0, I)$ under $H_0$, as the test statistic. Thus, by our premise, we should

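(9.7.3) Reject $H_0$ if $(A\hat\theta - A\theta_0)'(A\hat\theta - A\theta_0) > c$.

But $A\Sigma A' = I$ implies $\Sigma = A^{-1}(A')^{-1}$, which implies $\Sigma^{-1} = A'A$. Therefore, using

$$(A\hat\theta - A\theta_0)'(A\hat\theta - A\theta_0) = (\hat\theta - \theta_0)'A'A(\hat\theta - \theta_0) = (\hat\theta - \theta_0)'\Sigma^{-1}(\hat\theta - \theta_0),$$

(9.7.3) can be written as

(9.7.4) Reject $H_0$ if $(\hat\theta - \theta_0)'\Sigma^{-1}(\hat\theta - \theta_0) > c$.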
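To see concretely that (9.7.3) and (9.7.4) give the same number, one can pick a particular $A$. The sketch below assumes the $2\times 2$ case and takes $A = L^{-1}$, where $\Sigma = LL'$ is the Cholesky factorization; this is one valid choice of $A$ and not necessarily the construction of Theorem 11.5.1. The function names are hypothetical.

```python
import math

def cholesky_whitener_2x2(s11, s12, s22):
    """Return A = L^{-1}, where Sigma = L L' is the Cholesky factorization.

    Then A Sigma A' = L^{-1} L L' (L')^{-1} = I, so A is one valid choice
    of the matrix A used in (9.7.3).
    """
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)  # real and positive when Sigma is positive definite
    return [[1.0 / l11, 0.0],
            [-l21 / (l11 * l22), 1.0 / l22]]

def whitened_statistic(theta_hat, theta0, sigma):
    """(A theta_hat - A theta0)'(A theta_hat - A theta0), the statistic in (9.7.3)."""
    (s11, s12), (_, s22) = sigma
    a = cholesky_whitener_2x2(s11, s12, s22)
    d = [theta_hat[0] - theta0[0], theta_hat[1] - theta0[1]]
    z = [a[0][0] * d[0] + a[0][1] * d[1],
         a[1][0] * d[0] + a[1][1] * d[1]]
    return z[0] ** 2 + z[1] ** 2

def inverse_form_statistic(theta_hat, theta0, sigma):
    """(theta_hat - theta0)' Sigma^{-1} (theta_hat - theta0), the statistic in (9.7.4)."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 ** 2
    inv = [[s22 / det, -s12 / det],
           [-s12 / det, s11 / det]]
    d = [theta_hat[0] - theta0[0], theta_hat[1] - theta0[1]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

sigma = [[4.0, 1.2], [1.2, 1.0]]
print(whitened_statistic([2.5, -0.4], [0.0, 0.0], sigma))
print(inverse_form_statistic([2.5, -0.4], [0.0, 0.0], sigma))  # same value up to rounding
```

Any other $A$ with $A\Sigma A' = I$ (for instance the symmetric $\Sigma^{-1/2}$ used in the proof of Theorem 9.7.1) gives the same statistic, since only $A'A = \Sigma^{-1}$ enters.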

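In the two-dimensional case, where

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$$

(9.7.4) becomes

(9.7.5) Reject $H_0$ if $\dfrac{\sigma_2^2(\hat\theta_1 - \theta_{10})^2 + \sigma_1^2(\hat\theta_2 - \theta_{20})^2 - 2\sigma_{12}(\hat\theta_1 - \theta_{10})(\hat\theta_2 - \theta_{20})}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} > c.$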
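A small sketch of the left-hand side of (9.7.5) makes the two criticisms above concrete. The function name `wald_2d` is hypothetical, and the deviations $d_1 = \hat\theta_1 - \theta_{10}$, $d_2 = \hat\theta_2 - \theta_{20}$ are passed in directly. With $\sigma_{12} = 0$ it reduces to (9.7.2); with a positive covariance, deviations of opposite sign count as much stronger evidence against $H_0$ than deviations of the same sign, which neither (9.7.1) nor (9.7.2) can capture.

```python
def wald_2d(d1, d2, s1sq, s2sq, s12):
    """Left-hand side of (9.7.5) for deviations d1, d2 from theta_0."""
    det = s1sq * s2sq - s12 ** 2
    if s1sq <= 0 or det <= 0:
        raise ValueError("Sigma must be positive definite")
    return (s2sq * d1 ** 2 + s1sq * d2 ** 2 - 2 * s12 * d1 * d2) / det

# Unequal variances, no covariance: the same deviation of 2 counts less
# in the noisier coordinate (sigma_1^2 = 9) than in the precise one.
print(wald_2d(2.0, 0.0, 9.0, 1.0, 0.0), wald_2d(0.0, 2.0, 9.0, 1.0, 0.0))

# Equal variances, positive covariance: same-sign deviations are less
# surprising than opposite-sign deviations of the same size.
print(wald_2d(1.0, 1.0, 1.0, 1.0, 0.8), wald_2d(1.0, -1.0, 1.0, 1.0, 0.8))
```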


Another attractive feature of the test (9.7.4) is the fact that

(9.7.6) $(\hat\theta - \theta_0)'\Sigma^{-1}(\hat\theta - \theta_0) \sim \chi^2_K$

under the null hypothesis, so that $c$ can be computed to conform to a specified value of $\alpha$. This result is a consequence of the following important theorem.

THEOREM 9.7.1 Suppose $x$ is an $n$-vector distributed as $N(\mu, A)$, where $A$ is a positive definite matrix. Then $(x - \mu)'A^{-1}(x - \mu) \sim \chi^2_n$.

Proof. Let $H$ be the orthogonal matrix which diagonalizes $A$, that is, $H'AH = \Lambda$, where $\Lambda$ is the diagonal matrix of the characteristic roots of $A$ (see Theorem 11.5.1). Following (11.5.4), define

$$A^{-1/2} = H\Lambda^{-1/2}H',$$

where $\Lambda^{-1/2}$ is the diagonal matrix obtained by taking the $(-\tfrac12)$th power of each diagonal element of $\Lambda$. Then we can easily show that

$$A^{-1/2}AA^{-1/2} = I \quad\text{and}\quad A^{-1/2}A^{-1/2} = A^{-1}.$$

Therefore, we obtain $A^{-1/2}(x - \mu) \sim N(0, I)$. By Definition 1 of the Appendix, $(x - \mu)'A^{-1/2}A^{-1/2}(x - \mu) \sim \chi^2_n$. □
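The construction in the proof can be checked numerically. The sketch below assumes $n = 2$, builds $A^{-1/2} = H\Lambda^{-1/2}H'$ from the closed-form eigendecomposition of a symmetric $2\times 2$ matrix, confirms $A^{-1/2}AA^{-1/2} = I$, and then estimates $P\{(x-\mu)'A^{-1}(x-\mu) > c\}$ by simulation. For $\chi^2_2$ the tail is $e^{-c/2}$, so $c = -2\log 0.05$ should give a rejection rate near 5%. Function names, the example matrix, and the simulation size are illustrative assumptions.

```python
import math
import random

def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv_sqrt_2x2(a, b, d):
    """A^{-1/2} = H Lambda^{-1/2} H' for symmetric positive definite A = [[a, b], [b, d]]."""
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    roots = (half_trace + radius, half_trace - radius)  # characteristic roots of A
    if b != 0:
        vectors = [(root - d, b) for root in roots]     # solves (A - root*I) v = 0
    elif a >= d:
        vectors = [(1.0, 0.0), (0.0, 1.0)]
    else:
        vectors = [(0.0, 1.0), (1.0, 0.0)]
    out = [[0.0, 0.0], [0.0, 0.0]]
    for root, (v1, v2) in zip(roots, vectors):
        norm = math.hypot(v1, v2)
        h = (v1 / norm, v2 / norm)                      # one column of H
        for i in range(2):
            for j in range(2):
                out[i][j] += h[i] * h[j] / math.sqrt(root)
    return out

def simulated_rejection_rate(a, b, d, c, draws=50_000, seed=0):
    """Monte Carlo share of x ~ N(0, A) with x' A^{-1} x > c (mu = 0 without loss of generality)."""
    rng = random.Random(seed)
    w = inv_sqrt_2x2(a, b, d)
    l11 = math.sqrt(a)                                  # Cholesky factor, used only to draw x
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    hits = 0
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1, x2 = l11 * z1, l21 * z1 + l22 * z2
        y1 = w[0][0] * x1 + w[0][1] * x2                # y = A^{-1/2} x
        y2 = w[1][0] * x1 + w[1][1] * x2
        if y1 * y1 + y2 * y2 > c:                       # y'y = x' A^{-1} x
            hits += 1
    return hits / draws

a, b, d = 2.0, 0.9, 1.0
w = inv_sqrt_2x2(a, b, d)
print(matmul2(matmul2(w, [[a, b], [b, d]]), w))  # close to the identity matrix
c = -2 * math.log(0.05)  # P(chi^2_2 > c) = exp(-c/2) = 0.05
print(simulated_rejection_rate(a, b, d, c))      # close to 0.05 up to Monte Carlo error
```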

As an illustration of the above, consider a three-sided die (assume that such a die exists) which yields numbers 1, 2, and 3 with respective probabilities $p_1$, $p_2$, and $p_3$. We are to test the hypothesis that the die is not loaded versus the hypothesis that it is loaded on the basis of $n$ independent rolls. That is,

(9.7.7) Test $H_0: p_1 = p_2 = p_3 = 1/3$ versus $H_1$: not $H_0$.

[Figure 9.10: Critical region for testing the mean of a three-sided die]

If we should be constrained to use any of the univariate testing methods expounded in the preceding sections, we would somehow have to reduce the problem to one with a single parameter, but that would not be entirely satisfactory, as we shall show below. Suppose, for example, we decide to test the hypothesis that the expected value of the outcome of the roll is consistent with that of an unloaded die; namely,

(9.7.8) Test $H_0: p_1 + 2p_2 + 3p_3 = 2$ versus $H_1: p_1 + 2p_2 + 3p_3 \ne 2$.

Since $p_3 = 1 - p_1 - p_2$, the null hypothesis can be stated as $1 - 2p_1 - p_2 = 0$. If we define $\hat p_1$ and $\hat p_2$ as the relative frequencies of 1 and 2 in $n$ rolls, a reasonable test would be to

(9.7.9) Reject $H_0$ if $|1 - 2\hat p_1 - \hat p_2| > c$

for some $c$, which can be approximately determined from the standard normal table because of the asymptotic normality of $\hat p_1$ and $\hat p_2$. In Figure 9.10 the critical region of the test (9.7.9) is outside the parallel dashed lines and inside the triangle that defines the total feasible region. A weakness of the test (9.7.9) as a solution of the original testing problem (9.7.7) is obvious: an outcome such as $\hat p_1 = 0$ and $\hat p_2 = 1$, which is extremely unlikely under the original null hypothesis, will lead to acceptance by this test.
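A minimal sketch of how $c$ in (9.7.9) can be obtained. Under $H_0$ the multinomial moments are $V\hat p_j = 2/(9n)$ and $\mathrm{Cov}(\hat p_1, \hat p_2) = -1/(9n)$, so $V(1 - 2\hat p_1 - \hat p_2) = 2/(3n)$, and $c$ is the two-sided normal point times that standard deviation. The function names are hypothetical; `statistics.NormalDist` is the Python standard-library normal distribution. The usage lines reproduce the weakness noted above.

```python
import math
from statistics import NormalDist

def mean_test_critical_value(n, alpha=0.05):
    """Approximate c in (9.7.9) under H0: p1 = p2 = p3 = 1/3.

    Var(p_j_hat) = 2/(9n), Cov(p1_hat, p2_hat) = -1/(9n), hence
    Var(1 - 2*p1_hat - p2_hat) = (4*2 + 2 - 2*2*1) / (9n) = 2/(3n).
    """
    sd = math.sqrt(2 / (3 * n))
    return NormalDist().inv_cdf(1 - alpha / 2) * sd

def mean_test_rejects(p1_hat, p2_hat, n, alpha=0.05):
    """Test (9.7.9): reject H0 when |1 - 2*p1_hat - p2_hat| exceeds c."""
    return abs(1 - 2 * p1_hat - p2_hat) > mean_test_critical_value(n, alpha)

n = 50
print(mean_test_critical_value(n))
print(mean_test_rejects(0.0, 1.0, n))  # False: the implausible outcome is accepted
print(mean_test_rejects(0.6, 0.3, n))  # True
```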

Now we apply the test (9.7.5) to the original problem (9.7.7). We have, under the null hypothesis,

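(9.7.10) $\begin{bmatrix}\hat p_1 \\ \hat p_2\end{bmatrix} \sim N\left(\begin{bmatrix}1/3 \\ 1/3\end{bmatrix},\ \dfrac{1}{9n}\begin{bmatrix}2 & -1 \\ -1 & 2\end{bmatrix}\right)$ asymptotically.

Therefore (9.7.5) becomes

(9.7.11) Reject $H_0$ if $6n\left[\left(\hat p_1 - \tfrac13\right)^2 + \left(\hat p_2 - \tfrac13\right)^2 + \left(\hat p_1 - \tfrac13\right)\left(\hat p_2 - \tfrac13\right)\right] > c.$

The left-hand side of the above inequality is asymptotically distributed as $\chi^2_2$ under the null hypothesis.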
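The statistic in (9.7.11) is easy to compute from the counts $n_1, n_2, n_3$. Substituting $\hat p_3 - \tfrac13 = -(\hat p_1 - \tfrac13) - (\hat p_2 - \tfrac13)$ shows that it is algebraically identical to Pearson's chi-square statistic $\sum_j (n_j - n/3)^2/(n/3)$; the sketch below checks this. Because $P(\chi^2_2 > c) = e^{-c/2}$, the 5% critical value is $-2\log 0.05 \approx 5.99$. Function names and the example counts are illustrative.

```python
import math

def generalized_wald(n1, n2, n3):
    """Left-hand side of (9.7.11), with p_j_hat = n_j / n."""
    n = n1 + n2 + n3
    d1 = n1 / n - 1 / 3
    d2 = n2 / n - 1 / 3
    return 6 * n * (d1 ** 2 + d2 ** 2 + d1 * d2)

def pearson_chi_square(counts):
    """Sum of (observed - expected)^2 / expected, with expected = n/3 under H0."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi2_2_critical_value(alpha):
    """Upper-alpha point of chi^2 with 2 d.f., from P(chi^2_2 > c) = exp(-c/2)."""
    return -2 * math.log(alpha)

counts = (20, 12, 18)  # illustrative counts, n = 50
stat = generalized_wald(*counts)
print(stat, pearson_chi_square(counts))    # the two values agree
print(stat > chi2_2_critical_value(0.05))  # False: H0 is not rejected at 5%
```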

Since (9.7.10) holds only asymptotically, the test (9.7.11) is not identical with the likelihood ratio test. In such a case, (9.7.11) is called the generalized Wald test.

Next we derive the likelihood ratio test and compare it with the generalized Wald test. By Definition 9.4.4 the likelihood ratio test of the problem (9.7.7) is

(9.7.12) $\Lambda = \dfrac{(1/3)^{n}}{\hat p_1^{\,n_1}\,\hat p_2^{\,n_2}\,\hat p_3^{\,n_3}} < d,$

where $n_j$ is the number of times $j$ appears in $n$ rolls. In order to make use of Theorem 9.4.1 we transform the above inequality to

(9.7.13) $-2\log\Lambda = 2(n\log 3 + n_1\log\hat p_1 + n_2\log\hat p_2 + n_3\log\hat p_3) > -2\log d.$

Noting that $\hat p_j = n_j/n$ and defining $c = -2\log d$, we can write (9.7.13) equivalently as

(9.7.14) $2n(\log 3 + \hat p_1\log\hat p_1 + \hat p_2\log\hat p_2 + \hat p_3\log\hat p_3) > c.$
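A minimal sketch of the left-hand side of (9.7.14) computed from counts. The function name is hypothetical, and the sketch adopts the usual convention $0\log 0 = 0$ so that outcomes with an empty cell (such as $\hat p_1 = 0$, $\hat p_2 = 1$) give a finite value. For the counts $(20, 12, 18)$ it gives roughly 2.18, versus 2.08 for the statistic in (9.7.11), consistent with the approximate equality discussed next.

```python
import math

def lr_statistic(n1, n2, n3):
    """Left-hand side of (9.7.14): 2n(log 3 + sum_j p_j_hat log p_j_hat)."""
    n = n1 + n2 + n3
    total = math.log(3)
    for nj in (n1, n2, n3):
        p = nj / n
        if p > 0:  # use the convention 0 * log 0 = 0
            total += p * math.log(p)
    return 2 * n * total

print(lr_statistic(20, 12, 18))  # close to the Wald value 2.08 for the same counts
print(lr_statistic(0, 50, 0))    # finite, thanks to the 0 * log 0 = 0 convention
```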

[Figure 9.11: The 5% acceptance regions of the generalized Wald test and the likelihood ratio test]

To show the approximate equality of the left-hand sides of (9.7.11) and (9.7.14), use the Taylor expansion

$$x\log x \approx x_0\log x_0 + (1 + \log x_0)(x - x_0) + \frac{(x - x_0)^2}{2x_0}$$

with $x_0 = 1/3$, and apply it to the three similar terms within the parentheses of the left-hand side of (9.7.14). Figure 9.11 describes the acceptance regions of the two tests for the case of $n = 50$ and $c = 6$. Note that $P(\chi^2_2 > 6) \approx 0.05$.
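The approximation step can be written out as follows. This is a sketch of one way to carry it out, expanding each $\hat p_j\log\hat p_j$ to second order around $1/3$ and using $\sum_j(\hat p_j - \tfrac13) = 0$ together with $\hat p_3 - \tfrac13 = -(\hat p_1 - \tfrac13) - (\hat p_2 - \tfrac13)$. The last line records why $c = 6$ corresponds to a size of about 5%.

$$
\begin{aligned}
\hat p_j\log\hat p_j &\approx -\tfrac13\log 3 + (1 - \log 3)\bigl(\hat p_j - \tfrac13\bigr) + \tfrac32\bigl(\hat p_j - \tfrac13\bigr)^2, \qquad j = 1, 2, 3,\\
2n\Bigl(\log 3 + \sum_{j=1}^{3}\hat p_j\log\hat p_j\Bigr) &\approx 3n\sum_{j=1}^{3}\bigl(\hat p_j - \tfrac13\bigr)^2\\
&= 6n\Bigl[\bigl(\hat p_1 - \tfrac13\bigr)^2 + \bigl(\hat p_2 - \tfrac13\bigr)^2 + \bigl(\hat p_1 - \tfrac13\bigr)\bigl(\hat p_2 - \tfrac13\bigr)\Bigr],\\
P(\chi^2_2 > c) &= e^{-c/2}, \quad\text{so}\quad P(\chi^2_2 > 6) = e^{-3} \approx 0.0498.
\end{aligned}
$$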

# Variance-Covariance Matrix Known up to a Scalar Multiple

There is no optimal solution to our problem if $\Sigma$ is completely unknown. There is, however, sometimes a good solution if $\Sigma = \sigma^2 Q$, where $Q$ is a known positive definite matrix and $\sigma^2$ is an unknown scalar parameter. In this case it seems intuitively reasonable to reject $H_0$ if

(9.7.15) $\dfrac{(\hat\theta - \theta_0)'Q^{-1}(\hat\theta - \theta_0)}{\hat\sigma^2} > c,$

where $\hat\sigma^2$ is some reasonable estimator of $\sigma^2$. For what kind of estimator can we compute the distribution of the statistic above, so that we can determine $c$ so as to conform to a given size $\alpha$?

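One solution is presented below. We first note that, since $\Sigma = \sigma^2 Q$, (9.7.6) states that $(\hat\theta - \theta_0)'Q^{-1}(\hat\theta - \theta_0)/\sigma^2 \sim \chi^2_K$ under $H_0$. If we are given a statistic $W$ such that

(9.7.16) $\dfrac{W}{\sigma^2} \sim \chi^2_M,$

which is independent of the $\chi^2_K$ variable in (9.7.6), we obtain by Definition 3 of the Appendix

$$\frac{(\hat\theta - \theta_0)'Q^{-1}(\hat\theta - \theta_0)/K}{W/M} \sim F(K, M).$$

Therefore, defining $\hat\sigma^2 = W/M$ will enable us to determine $c$ appropriately: the left-hand side of (9.7.15) is then $K$ times an $F(K, M)$ variable, so $c$ is $K$ times the upper-$\alpha$ point of $F(K, M)$.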
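A minimal sketch of the $F$ statistic above for general $K$, assuming $Q$, $W$, and $M$ are supplied by the user. It solves $Qx = \hat\theta - \theta_0$ by Gaussian elimination rather than forming $Q^{-1}$. The function names are hypothetical, and the critical value still has to come from an $F(K, M)$ table or a library routine such as SciPy's `scipy.stats.f.ppf`.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r, c=col: abs(aug[r][c]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def scaled_wald_f(theta_hat, theta0, q, w, m):
    """[(d' Q^{-1} d) / K] / (W / M), with d = theta_hat - theta0.

    This equals the left-hand side of (9.7.15) divided by K when sigma2_hat = W / M.
    """
    d = [t - t0 for t, t0 in zip(theta_hat, theta0)]
    k = len(d)
    quad = sum(di * yi for di, yi in zip(d, solve(q, d)))
    return (quad / k) / (w / m)

print(scaled_wald_f([1.0, 0.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]], w=6.0, m=6))
```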

Assuming the availability of such W may seem arbitrary, so we shall give a couple of examples.

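EXAMPLE 9.7.1 Suppose $X \sim N(\mu_X, \sigma^2)$ and $Y \sim N(\mu_Y, \sigma^2)$ are independent of each other. We are to test

$H_0: \mu_X = \mu_{X0}$ and $\mu_Y = \mu_{Y0}$ versus $H_1$: not $H_0$

on the basis of $n_X$ and $n_Y$ independent observations on $X$ and $Y$, respectively. We assume that the common variance $\sigma^2$ is unknown. Suppose that $\bar X = n_X^{-1}\sum_{i=1}^{n_X} X_i$ and $\bar Y = n_Y^{-1}\sum_{i=1}^{n_Y} Y_i$. We have, from Definition 1 and Theorem 1 of the Appendix, under $H_0$,

$$\frac{n_X(\bar X - \mu_{X0})^2 + n_Y(\bar Y - \mu_{Y0})^2}{\sigma^2} \sim \chi^2_2,$$

and because of Theorems 1 and 3 of the Appendix,

$$\frac{\sum_{i=1}^{n_X}(X_i - \bar X)^2 + \sum_{i=1}^{n_Y}(Y_i - \bar Y)^2}{\sigma^2} \sim \chi^2_{n_X + n_Y - 2},$$

independently of the former. Therefore, by Definition 3 of the Appendix,

(9.7.20) $\dfrac{\left[n_X(\bar X - \mu_{X0})^2 + n_Y(\bar Y - \mu_{Y0})^2\right]/2}{\left[\sum_{i=1}^{n_X}(X_i - \bar X)^2 + \sum_{i=1}^{n_Y}(Y_i - \bar Y)^2\right]/(n_X + n_Y - 2)} \sim F(2,\ n_X + n_Y - 2).$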

We should reject HQ if the above statistic is larger than a certain value, which we can determine from (9.7.20) to conform to a preassigned size of the test.
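The statistic in (9.7.20) can be computed directly from the two samples. This is a minimal sketch with a hypothetical function name and made-up data; it returns the statistic together with its degrees of freedom $(2, n_X + n_Y - 2)$, and the critical value is then read from an $F$ table for the chosen size.

```python
def example_971_f(xs, ys, mu_x0, mu_y0):
    """Statistic in (9.7.20); under H0 it is distributed as F(2, n_X + n_Y - 2)."""
    nx, ny = len(xs), len(ys)
    xbar, ybar = sum(xs) / nx, sum(ys) / ny
    numerator = (nx * (xbar - mu_x0) ** 2 + ny * (ybar - mu_y0) ** 2) / 2
    within = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    dof = nx + ny - 2
    return numerator / (within / dof), (2, dof)

stat, df = example_971_f([1.0, 2.0, 3.0], [2.0, 4.0], mu_x0=0.0, mu_y0=0.0)
print(stat, df)  # compare stat with the upper-alpha point of F(2, 3)
```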

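EXAMPLE 9.7.2 Suppose that $X \sim N(\mu_X, \sigma^2)$, $Y \sim N(\mu_Y, \sigma^2)$, and $Z \sim N(\mu_Z, \sigma^2)$ are mutually independent. We are to test $H_0: \mu_X = \mu_Y = \mu_Z$ versus $H_1$: not $H_0$ on the basis of $n_X$, $n_Y$, and $n_Z$ independent observations on $X$, $Y$, and $Z$, respectively. Let $\bar X$, $\bar Y$, and $\bar Z$ be the sample averages based on $n_X$, $n_Y$, and $n_Z$ observations, respectively. Similarly, let $S_X^2$, $S_Y^2$, and $S_Z^2$ be the sample variances based on $n_X$, $n_Y$, and $n_Z$ observations, respectively. Define $\lambda_1 = \mu_X - \mu_Y$, $\lambda_2 = \mu_X - \mu_Z$, $\hat\lambda_1 = \bar X - \bar Y$, and $\hat\lambda_2 = \bar X - \bar Z$. Then we have

$$\begin{bmatrix}\hat\lambda_1 \\ \hat\lambda_2\end{bmatrix} \sim N\left(\begin{bmatrix}\lambda_1 \\ \lambda_2\end{bmatrix},\ \sigma^2 Q\right),$$

where

$$Q = \begin{bmatrix} \dfrac{1}{n_X} + \dfrac{1}{n_Y} & \dfrac{1}{n_X} \\[1ex] \dfrac{1}{n_X} & \dfrac{1}{n_X} + \dfrac{1}{n_Z} \end{bmatrix}.$$

Because of Theorem 9.7.1, we have under $H_0$ (that is, $\lambda_1 = \lambda_2 = 0$)

(9.7.22) $\dfrac{(\hat\lambda_1, \hat\lambda_2)\,Q^{-1}(\hat\lambda_1, \hat\lambda_2)'}{\sigma^2} \sim \chi^2_2.$

But, by Theorems 1 and 3 of the Appendix,

(9.7.23) $\dfrac{\sum_{i=1}^{n_X}(X_i - \bar X)^2 + \sum_{i=1}^{n_Y}(Y_i - \bar Y)^2 + \sum_{i=1}^{n_Z}(Z_i - \bar Z)^2}{\sigma^2} \sim \chi^2_{n_X + n_Y + n_Z - 3}.$

Since the chi-square variables in (9.7.22) and (9.7.23) are independent, we have

$$\frac{(\hat\lambda_1, \hat\lambda_2)\,Q^{-1}(\hat\lambda_1, \hat\lambda_2)'/2}{\left[\sum_{i}(X_i - \bar X)^2 + \sum_{i}(Y_i - \bar Y)^2 + \sum_{i}(Z_i - \bar Z)^2\right]/(n_X + n_Y + n_Z - 3)} \sim F(2,\ n_X + n_Y + n_Z - 3).$$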
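A sketch of the final statistic in Example 9.7.2, built exactly from $\hat\lambda_1$, $\hat\lambda_2$, and $Q$ as in the text. The quadratic form $(\hat\lambda_1, \hat\lambda_2)Q^{-1}(\hat\lambda_1, \hat\lambda_2)'$ equals the between-group sum of squares $\sum_g n_g(\bar y_g - \bar y)^2$, so the result coincides with the familiar one-way analysis-of-variance $F$ statistic; the second function is included only as that cross-check. Function names and the data in the usage line are illustrative assumptions.

```python
def _mean(v):
    return sum(v) / len(v)

def _ss(v):
    """Sum of squared deviations from the sample mean."""
    m = _mean(v)
    return sum((x - m) ** 2 for x in v)

def example_972_f(xs, ys, zs):
    """F statistic of Example 9.7.2 and its degrees of freedom."""
    nx, ny, nz = len(xs), len(ys), len(zs)
    l1 = _mean(xs) - _mean(ys)                       # lambda_1 hat
    l2 = _mean(xs) - _mean(zs)                       # lambda_2 hat
    q11, q12, q22 = 1 / nx + 1 / ny, 1 / nx, 1 / nx + 1 / nz
    det = q11 * q22 - q12 ** 2
    quad = (q22 * l1 ** 2 - 2 * q12 * l1 * l2 + q11 * l2 ** 2) / det  # (l1, l2) Q^{-1} (l1, l2)'
    dof = nx + ny + nz - 3
    return (quad / 2) / ((_ss(xs) + _ss(ys) + _ss(zs)) / dof), (2, dof)

def one_way_anova_f(*groups):
    """Textbook one-way ANOVA F statistic, used here only as a cross-check."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (_mean(g) - grand) ** 2 for g in groups)
    k = len(groups)
    return (between / (k - 1)) / (sum(_ss(g) for g in groups) / (n - k))

print(example_972_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]))
```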

EXERCISES

1. (Section 9.2)

Given the density $f(x) = 1/\theta$, $0 < x < \theta$, and 0 elsewhere, we are to test the hypothesis $H_0: \theta = 2$ against $H_1: \theta = 3$ by means of a single observed value of $X$. Find a critical region of $\alpha = 0.5$ which minimizes $\beta$ and compute the value of $\beta$. Is the region unique? If not, define the class of such regions.

2. (Section 9.2)

Suppose that X has the following probability distribution:

$X = 1$ with probability $\theta$,

… with probability $1 - 3\theta$,

where $0 < \theta < 1/3$. We are to test $H_0: \theta = 0.2$ against $H_1: \theta = 0.25$ on the basis of one observation on $X$.

(a) List all the nonrandomized admissible tests.

(b) Find the most powerful nonrandomized test of size 0.4.

(c) Find the most powerful randomized test of size 0.3.

$$\begin{bmatrix} 0 & 1 \\ e & 0 \end{bmatrix}$$

where $e$ is Euler's $e$ (= 2.71...). Assuming the prior probabilities $P(H_0) = P(H_1) = 0.5$, derive the Bayesian optimal critical region.

Calculate the probabilities of Type I and Type II errors for this critical region.

5. (Section 9.3)

Let $f(x) = \theta\exp(-\theta x)$, $x > 0$, $\theta > 0$. We want to test $H_0: \theta = 1$ against $H_1: \theta = 2$ on the basis of one observation on $X$. Derive:

(a) the Neyman-Pearson optimal critical region, assuming $\alpha = 0.05$;

(b) the Bayesian optimal critical region, assuming that $P(H_0) = P(H_1)$ and that the loss of Type I error is 2 and the loss of Type II error is 5.

6. (Section 9.3)

Supposing $f(x) = (1 + \theta)x^{\theta}$, $0 < x < 1$, $\theta > 0$, we are to test $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1 < \theta_0$. Find the Neyman-Pearson test based on a sample of size $n$. Indicate how to determine the critical region if the size of the test is $\alpha$.

7. (Section 9.3)

Let $X$ be the outcome of tossing a three-sided die with the numbers 1, 2, and 3 occurring with probabilities $p_1$, $p_2$, and $p_3$. Suppose that 100 independent tosses yielded $N_1$ ones, $N_2$ twos, and $N_3$ threes. Obtain a Neyman-Pearson test of $H_0: p_1 = p_2 = 1/3$ against $H_1: p_1 = 1/2$ and $p_2 = 1/5$. Choose $\alpha = 0.05$. You may use the normal approximation.

8. (Section 9.3)

We wish to test the null hypothesis that a die is fair against the alternative hypothesis that each of numbers 1, 2, and 3 occurs with probability 1/10, 4 and 5 each occur with probability 1/5, and 6 occurs with probability 3/10.

(a) If number $j$ appears $N_j$ times, $j = 1, 2, \dots, 6$, in $N$ throws of the die, define the Neyman-Pearson test.

(b) If $N = 2$, obtain the most powerful test of size 1/4 and compute its p value.

(c) If $N_1 = 16$, $N_2 = 13$, $N_3 = 14$, $N_4 = 22$, $N_5 = 17$, and $N_6 = 18$, should you reject the null hypothesis at the 5% significance level? What about at 10%? You may use the normal approximation.

9. (Section 9.4)

Given the density $f(x) = 1/\theta$, $0 < x < \theta$, and 0 elsewhere, we are to test the hypothesis $H_0: \theta = 2$ against $H_1: \theta > 2$ by means of a single observed value of $X$. Consider the test which rejects $H_0$ if $X > c$. Determine $c$ so that $\alpha = 1/4$ and draw the graph of its power function.

10. (Section 9.4)

Let $X$ be the number of trials needed before a success (with probability $p$) occurs. That is, $P(X = k) = p(1 - p)^{k-1}$, $k = 1, 2, \dots$. Find the power function for testing $H_0: p = 1/2$ if the critical region consists of the numbers $k = 1, 2, 3$. Compare it with the power function of the critical region consisting of the numbers $\{1, 2, 8, 9, \dots\}$.

11. (Section 9.4)

Random variables X and Y have a joint density

$f(x, y) = \theta^{-2}$, $0 < x < \theta$, $0 < y < \theta$, $0.1 < \theta < 1$.

Find the uniformly most powerful test of the hypothesis $\theta = 1$ of size $\alpha = 0.01$ based on a single observation of $X$ and $Y$. Derive its power function.

12. (Section 9.4)

Suppose that a bivariate random variable $(X, Y)$ is uniformly distributed over the square defined by $0 < x, y < \theta$, where we assume $0 < \theta < 1$. We are to test $H_0: \theta = 0.5$ against $H_1: \theta \ne 0.5$ on the basis of a single observation on $(X, Y)$ with $\alpha = 0.25$.

(a) Derive the likelihood ratio test. If you cannot, define the best test you can think of and justify it from either intuitive or logical consideration.

(b) Obtain the power function of the likelihood ratio test (or your alternative test) and sketch its graph.

(c) Prove that the likelihood ratio test of the problem is the uniformly most powerful test of size 0.25.

13. (Section 9.4)

Suppose $(X, Y)$ have density $f(x, y) = 1/(\mu\lambda)$, $0 \le x \le \mu$, $0 \le y \le \lambda$, $0 < \mu < \infty$, and $0 < \lambda < \infty$. We are to test $H_0: \mu = \lambda = 1$ versus $H_1$: not $H_0$ on the basis of one observation on $(X, Y)$.

(a) Find the likelihood ratio test of size $\alpha$, $0 < \alpha < 1$.

(b) Show that it is not the uniformly most powerful test of size $\alpha$.

14. (Section 9.4)

The density of X is given by

$f(x) = \theta(x - 0.5) + 1$ for $-2 \le \theta \le 2$ and $0 \le x \le 1$.

Obtain the likelihood ratio test of $H_0: \theta = 2$ against $H_1: \theta < 2$ on the basis of one observation of $X$ at $\alpha = 0.05$. Show that this test is the uniformly most powerful test of size 0.05.

15. (Section 9.4)

The joint density of X and Y is given by

$f(x, y) = 2\theta^{-2}$ for $x + y \le \theta$, $0 < x$, $0 < y$,

$= 0$ otherwise.

We test $H_0: \theta = 0.5$ against $H_1: \theta \ne 0.5$, where we assume $0 < \theta \le 1$, on the basis of one observation on $(X, Y)$.

(a) Derive the likelihood ratio test of size 0.25.

(b) Derive its power function and draw its graph.

(c) Show that it is the uniformly most powerful test of size 0.25.

16. (Section 9.5)

Let $X$ be uniformly distributed over $[0, \theta]$. Assuming that the prior density of $\theta$ is uniform over $[1, 2]$, find the Bayes test of $H_0: \theta \in [1, 1.5]$ versus $H_1: \theta \in (1.5, 2]$ on the basis of one observation on $X$. Assume that the loss matrix is given by

17. (Section 9.5)

Random variables X and Y have a joint density

$f(x, y \mid \theta) = \theta^{-2}$ for $0 < x < \theta$, $0 < y < \theta$, $0.1 < \theta < 1$.

Find the Bayesian test of $H_0: \theta > 1/2$ against $H_1: \theta < 1/2$ based on a single observation of each of $X$ and $Y$, assuming the prior density $f(\theta) = 1/0.9$ for $0.1 \le \theta \le 1$. Assume that the loss matrix is the same as in Exercise 16.

18. (Section 9.5)

Suppose that the density of $X$ given $\theta$ is $f(x \mid \theta) = 2x/\theta^2$, $0 \le x \le \theta$, and the prior density of $\theta$ is $f(\theta) = 2\theta$, $0 < \theta < 1$. Suppose that we are given a single observation $x$ of $X$.

(a) Derive the Bayes estimate of $\theta$.

(b) Assuming that the costs of the Type I and II errors are the same, show how a Bayesian tests $H_0: \theta \le 0.5$ against $H_1: \theta > 0.5$.

19. (Section 9.5)

Let p be the probability that a patient having a particular disease is cured by a new drug. Suppose that the net social utility from a commercial production of the drug is given by

$U(p) = -0.5$ for $0 < p < 0.5$,

$\phantom{U(p)} = 2(p - 0.5)$ for $0.5 < p < 1$.

Suppose that a prior density of p is uniform over the interval [0, 1] and that x patients out of n randomly chosen homogeneous patients have been observed to be cured by the drug. Formulate a Bayesian decision rule regarding whether or not the drug should be approved. If n = 2, how large should x be for the drug to be approved?

20. (Section 9.6)

One hundred randomly selected people are polled on their preference between George Bush and Bill Clinton. How large a percentage point difference must be observed for you to be able to conclude that Clinton is ahead of Bush at the significance level of 5%?

21. (Section 9.6)

Thirty races are run, in which one runner is given a stimulant and another is not. If twenty races are won by the stimulated runner, should you decide that the stimulant has an effect at the 1% significance level? What about at 5%?

22. (Section 9.6)

Suppose you roll a die 100 times and the average number showing on the face turns out to be 4. Is it reasonable to conclude that the die is loaded? Why?

23. (Section 9.6)

We throw a die 20 times; 1 comes up four times and 2 comes up seven times. Let $p_1$ be the probability that 1 comes up and $p_2$ be the probability that 2 comes up. On the basis of our experiment, test the hypothesis $p_1 = p_2 = 1/6$ against the negation of that hypothesis. Should we reject the hypothesis at 5%? What about at 10%?

24. (Section 9.6)

It is claimed that a new diet will reduce a person’s weight by an average of 10 pounds in two weeks. The weights of seven women who followed the diet, recorded before and after the two-week period of dieting, are given in the accompanying table. Would you accept the claim made for the diet?

| Participant | Weight before (lbs) | Weight after (lbs) |
|---|---|---|
| A | 128 | 126 |
| B | 130 | 125 |
| C | 135 | 129 |
| D | 142 | 131 |
| E | 137 | 125 |
| F | 148 | 138 |
| G | 154 | 130 |

25. (Section 9.6)

The price of a certain food item was sampled in various stores in two cities, and the results were as given below. Test the hypothesis that there is no difference between the mean prices of the particular food item in the two cities using the 5% and 10% significance levels. Assume that the prices are normally distributed with the same variance (unknown) in each city.

| | City A | City B |
|---|---|---|
| $n$ | 18 | 9 |
| $\bar x$ | 10 | 9 |
| $\sum (x - \bar x)^2$ | 2 | 2 |

26. (Section 9.6)

The following data are from an experiment to study the effect of training on the duration of unemployment. Let X be the duration of unemployment for those without training, and Y be the duration for those with training:

$x$: 35, 42, 17, 55, 24

$y$: 31, 37, 21, 10, 28

Assuming the two-sample normal model with equal variances, can we conclude that training has an effect at the 5% significance level? What about at 10%?

27. (Section 9.6)

The accompanying table shows the yields (tons per hectare) of a certain agricultural product in five experimental farms with and without an application of a certain fertilizer. Other things being equal, can we conclude that the fertilizer is effective at the 5% significance level? Is it at the 1% significance level? Assume that the yields are normally distributed.

| Farm | Yield without fertilizer (tons) | Yield with fertilizer (tons) |
|---|---|---|
| A | 5 | 7 |
| B | 6 | 8 |
| C | 7 | 7 |
| D | 8 | 10 |
| E | 9 | 10 |

28. (Section 9.6)

According to the Stanford Observer (October 1977), 1024 male students entered Stanford in the fall of 1972 and 885 graduated. Among the 1024 students were 84 athletes, of which 78 graduated. Would you conclude that the graduation record of athletes is superior to that of nonathletes at the 1% or 5% significance level?

29. (Section 9.6)

One pre-election poll, based on a sample of 5000 voters, showed Clinton ahead by 23 points, whereas another poll, based on a sample of 3000 voters, showed Clinton ahead by 20 points. Are the results significantly different at the 5% significance level? How about at 10%?

30. (Section 9.6)

Using the data of Exercise 26 above, test the equality of the variances at the 10% significance level.

31. (Section 9.6)

Using the data of Exercise 27 above, test the equality of the variances at the 10% significance level.

32. (Section 9.7)

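Test the hypothesis $\mu_1 = \mu_2 = \mu_3$ using the estimators $\hat\mu_1$, $\hat\mu_2$, and $\hat\mu_3$ having the joint distribution $\hat\mu \sim N(\mu, A)$, where $\mu' = (\mu_1, \mu_2, \mu_3)$, $\hat\mu' = (\hat\mu_1, \hat\mu_2, \hat\mu_3)$, and

$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

Assume that the observed values of $\hat\mu_1$, $\hat\mu_2$, and $\hat\mu_3$ are 4, 2, and 1, respectively. Choose the 5% significance level.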

33. (Section 9.7)

There are three classes of five students each. The students all took the same test, and their test scores were as shown in the accompanying table. Assuming that the test scores are independently distributed as $N(\mu_i, \sigma^2)$ for class $i = 1, 2, 3$, test $H_0: \mu_1 = \mu_2 = \mu_3$ against $H_1$: not $H_0$. Choose the size of the test to be 1% and 5%.

| Score in Class 1 | Score in Class 2 | Score in Class 3 |
|---|---|---|
| 8.3 | 7.8 | 7.0 |
| 8.1 | 7.3 | 6.8 |
| 7.3 | 7.0 | 6.7 |
| 7.3 | 6.6 | 5.8 |
| 7.0 | 6.3 | 5.7 |

34. (Section 9.7)

In Group 1, $r_1$ of $n_1$ students passed a test; in Group 2, $r_2$ of $n_2$ students passed the test. Students are homogeneous within each group. Let $p_1$ and $p_2$ be the probability that a student in Group 1 and in Group 2, respectively, passes the test. Assume that the test results across the students are independent. We are to test $H_0: p_1 = p_2 = 0.5$ against $H_1$: not $H_0$.

(a) Using the asymptotic normality of $\hat p_1 = r_1/n_1$ and $\hat p_2 = r_2/n_2$, derive the Wald test for the problem. Given $n_1 = 20$, $r_1 = 14$, $n_2 = 40$, and $r_2 = 16$, should you reject $H_0$ at $\alpha = 0.05$ or at $\alpha = 0.1$?

(b) Derive the likelihood ratio test for the problem. Use it to answer problem (a) above.

35. (Section 9.7)

In Exercise 25 above, add one more column as follows:

| | City C |
|---|---|
| $n$ | 9 |
| $\sum (x - \bar x)^2$ | 3 |

Test the hypothesis that the mean prices in the three cities are the same.
