Cramér-Rao Lower Bound
The Cramér-Rao lower bound gives a useful lower bound (in the matrix sense) for the variance-covariance matrix of an unbiased vector estimator.⁴ In this section we shall prove a general theorem that will be applied to Model 1 with normality in the next two subsections.
Theorem 1.3.1 (Cramér-Rao). Let $z$ be an $n$-component vector of random variables (not necessarily independent) the joint density of which is given by $L(z, \theta)$, where $\theta$ is a $K$-component vector of parameters in some parameter space $\Theta$. Let $\hat\theta(z)$ be an unbiased estimator of $\theta$ with a finite variance-covariance matrix. Furthermore, assume that $L(z, \theta)$ and $\hat\theta(z)$ satisfy
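(A) $$E\,\frac{\partial \log L}{\partial \theta} = 0,$$
(B) $$E\,\frac{\partial^2 \log L}{\partial \theta\,\partial \theta'} = -E\,\frac{\partial \log L}{\partial \theta}\,\frac{\partial \log L}{\partial \theta'},$$
(C) $$E\,\frac{\partial \log L}{\partial \theta}\,\frac{\partial \log L}{\partial \theta'} > 0,$$
(D) $$\int \frac{\partial L}{\partial \theta}\,\hat\theta'\,dz = I.$$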
Note: $\partial \log L/\partial \theta$ is a $K$-vector, so that 0 in assumption A is a vector of $K$ zeroes. Assumption C means that the left-hand side is a positive definite matrix. The integral in assumption D is an $n$-tuple integral over the whole domain of the Euclidean $n$-space because $z$ is an $n$-vector. Finally, $I$ in assumption D is the identity matrix of size $K$.
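Then we have for $\theta \in \Theta$
$$V(\hat\theta) \ge -\left[E\,\frac{\partial^2 \log L}{\partial \theta\,\partial \theta'}\right]^{-1}. \qquad (1.3.8)$$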
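As a concrete check of the theorem's hypotheses, the following minimal sketch evaluates assumptions A through D numerically in the simplest scalar case. The setting is an assumption made here for illustration and is not taken from the text: a single observation $z \sim N(\theta, 1)$, the estimator $\hat\theta(z) = z$, a hypothetical value $\theta = 0.7$, and a Riemann sum on a fine grid in place of the exact integral.

```python
import numpy as np

theta = 0.7                                   # hypothetical true parameter value
z = np.linspace(theta - 12.0, theta + 12.0, 200001)   # integration grid for z
dz = z[1] - z[0]

L = np.exp(-0.5 * (z - theta) ** 2) / np.sqrt(2.0 * np.pi)  # density L(z, theta)
score = z - theta                             # d log L / d theta
hess = -np.ones_like(z)                       # d^2 log L / d theta^2
dL = score * L                                # dL / d theta
theta_hat = z                                 # unbiased estimator theta_hat(z) = z


def integrate(values):
    """Riemann-sum approximation of the integral over z."""
    return float(np.sum(values) * dz)


A = integrate(score * L)             # assumption A: E[score], should be 0
B_lhs = integrate(hess * L)          # assumption B, left side: E[second derivative]
B_rhs = -integrate(score ** 2 * L)   # assumption B, right side: -E[score^2]
C = integrate(score ** 2 * L)        # assumption C: should be positive
D = integrate(dL * theta_hat)        # assumption D: should equal 1 (I for K = 1)

variance = integrate((theta_hat - theta) ** 2 * L)   # V(theta_hat)
bound = -1.0 / B_lhs                                  # right-hand side of (1.3.8)

print(A, B_lhs, B_rhs, C, D, variance, bound)
```

In this particular case the variance of $\hat\theta$ and the bound coincide (both equal 1 up to discretization error), so the inequality holds with equality.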
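Proof. Define $P = E(\hat\theta - \theta)(\hat\theta - \theta)'$, $Q = E(\hat\theta - \theta)(\partial \log L/\partial \theta')$, and $R = E(\partial \log L/\partial \theta)(\partial \log L/\partial \theta')$. Then
$$\begin{bmatrix} P & Q \\ Q' & R \end{bmatrix} \ge 0 \qquad (1.3.9)$$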
because the left-hand side is a variance-covariance matrix. Premultiply both sides of (1.3.9) by $[I, -QR^{-1}]$ and postmultiply by $[I, -QR^{-1}]'$. Then we get
$$P - QR^{-1}Q' \ge 0, \qquad (1.3.10)$$
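where $R^{-1}$ can be defined because of assumption C. But we have
$$\begin{aligned}
Q &= E\left[(\hat\theta - \theta)\,\frac{\partial \log L}{\partial \theta'}\right] \qquad (1.3.11) \\
&= E\left[\hat\theta\,\frac{\partial \log L}{\partial \theta'}\right] \quad \text{by assumption A} \\
&= E\left[\hat\theta\,\frac{1}{L}\,\frac{\partial L}{\partial \theta'}\right] \\
&= \int \left[\hat\theta\,\frac{1}{L}\,\frac{\partial L}{\partial \theta'}\right] L\,dz \\
&= \int \hat\theta\,\frac{\partial L}{\partial \theta'}\,dz \\
&= I \quad \text{by assumption D.}
\end{aligned}$$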
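Therefore (1.3.8) follows from (1.3.10), (1.3.11), and assumption B: substituting $Q = I$ into (1.3.10) gives $P \ge R^{-1}$, and assumption B gives $R = -E\,\partial^2 \log L/\partial \theta\,\partial \theta'$.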
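The only matrix-algebra step in the proof is the passage from the nonnegative definite block matrix to the difference $P - QR^{-1}Q'$. The sketch below checks that step numerically. Everything in it is an illustrative assumption rather than part of the text: the dimension $K = 3$, the random seed, and the randomly generated positive definite matrix standing in for the variance-covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # hypothetical number of parameters

# Random positive definite 2K x 2K matrix playing the role of [[P, Q], [Q', R]].
G = rng.standard_normal((2 * K, 2 * K + 2))
M = G @ G.T
P, Q, R = M[:K, :K], M[:K, K:], M[K:, K:]

R_inv = np.linalg.inv(R)
T = np.hstack([np.eye(K), -Q @ R_inv])       # the matrix [I, -Q R^{-1}]

sandwiched = T @ M @ T.T                     # premultiply by T, postmultiply by T'
direct = P - Q @ R_inv @ Q.T                 # P - Q R^{-1} Q'

# Smallest eigenvalue of the (symmetrized) difference; nonnegative means >= 0
# in the matrix sense.
min_eig = np.linalg.eigvalsh((direct + direct.T) / 2.0).min()

print(np.allclose(sandwiched, direct), min_eig)
```

The two computations agree because the cross terms cancel: $-QR^{-1}Q' - QR^{-1}Q' + QR^{-1}RR^{-1}Q' = -QR^{-1}Q'$.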
The inverse of the lower bound, namely, $-E\,\partial^2 \log L/\partial \theta\,\partial \theta'$, is called the information matrix. R. A. Fisher first introduced this term as a certain measure of the information contained in the sample (see Rao, 1973, p. 329).
The Cramér-Rao lower bound may not be attained by any unbiased estimator. If it is, we have found the best unbiased estimator. In the next section we shall show that the maximum likelihood estimator of the regression parameter attains the lower bound in Model 1 with normality.
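A standard illustration of a bound that is not attained, which is not drawn from this text, is the variance parameter of an i.i.d. normal sample with unknown mean. There the information matrix for $(\mu, \sigma^2)$ is diagonal, the bound for $\sigma^2$ is $2\sigma^4/n$, and the usual unbiased estimator $s^2$ has variance $2\sigma^4/(n-1)$. The Monte Carlo sketch below compares the two; the sample size, mean, seed, and replication count are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(12345)
n, sigma2, reps = 5, 1.0, 200_000            # hypothetical design of the experiment

# Each row is one sample of n i.i.d. N(2, sigma2) observations.
samples = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)             # unbiased estimator of sigma^2

mc_mean = s2.mean()                          # should be close to sigma2
mc_var = s2.var()                            # simulated variance of s^2
exact_var = 2.0 * sigma2 ** 2 / (n - 1)      # known variance of s^2 under normality
cr_bound = 2.0 * sigma2 ** 2 / n             # Cramér-Rao bound for sigma^2

print(mc_mean, mc_var, exact_var, cr_bound)
```

The simulated variance should land near `exact_var` and strictly above `cr_bound`, and the gap shrinks as `n` grows.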
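Assumptions A, B, and D seem arbitrary at first glance. We shall now inquire into the significance of these assumptions. Assumptions A and B are equivalent to
(A') $$\int \frac{\partial L}{\partial \theta}\,dz = 0,$$
(B') $$\int \frac{\partial^2 L}{\partial \theta\,\partial \theta'}\,dz = 0,$$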
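respectively. The equivalence between assumptions A and A' follows from
$$\begin{aligned}
E\,\frac{\partial \log L}{\partial \theta} &= E\left[\frac{1}{L}\,\frac{\partial L}{\partial \theta}\right] \qquad (1.3.12) \\
&= \int \left[\frac{1}{L}\,\frac{\partial L}{\partial \theta}\right] L\,dz \\
&= \int \frac{\partial L}{\partial \theta}\,dz.
\end{aligned}$$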
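The equivalence between assumptions B and B' follows from
$$\begin{aligned}
E\,\frac{\partial^2 \log L}{\partial \theta\,\partial \theta'} &= E\left\{\frac{\partial}{\partial \theta}\left[\frac{1}{L}\,\frac{\partial L}{\partial \theta'}\right]\right\} \qquad (1.3.13) \\
&= E\left[-\frac{1}{L^2}\,\frac{\partial L}{\partial \theta}\,\frac{\partial L}{\partial \theta'} + \frac{1}{L}\,\frac{\partial^2 L}{\partial \theta\,\partial \theta'}\right] \\
&= -E\left[\frac{\partial \log L}{\partial \theta}\,\frac{\partial \log L}{\partial \theta'}\right] + \int \frac{\partial^2 L}{\partial \theta\,\partial \theta'}\,dz.
\end{aligned}$$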
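Furthermore, assumptions A', B', and D are equivalent to
(A'') $$\int \frac{\partial L}{\partial \theta}\,dz = \frac{\partial}{\partial \theta}\int L\,dz,$$
(B'') $$\int \frac{\partial^2 L}{\partial \theta\,\partial \theta'}\,dz = \frac{\partial}{\partial \theta}\int \frac{\partial L}{\partial \theta'}\,dz,$$
(D') $$\int \frac{\partial L}{\partial \theta}\,\hat\theta'\,dz = \frac{\partial}{\partial \theta}\int L\,\hat\theta'\,dz,$$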
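because the right-hand sides of assumptions A'', B'', and D' are 0, 0, and $I$, respectively. Written thus, the three assumptions are all of the form
$$\frac{\partial}{\partial \theta}\int f(z, \theta)\,dz = \int \frac{\partial f(z, \theta)}{\partial \theta}\,dz; \qquad (1.3.14)$$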
in other words, the operations of differentiation and integration can be interchanged. In the next theorem we shall give sufficient conditions for (1.3.14) to hold in the case where $\theta$ is a scalar. The case for a vector $\theta$ can be treated in essentially the same manner, although the notation gets more complex.
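The interchange in (1.3.14) can be checked numerically in a simple scalar case. The sketch below is an illustration under assumptions not taken from the text: $f(z, \theta) = z\,\phi(z - \theta)$ with $\phi$ the standard normal density (the integrand that appears in assumption D' when $\hat\theta(z) = z$), a hypothetical $\theta = 0.7$, a central finite difference for the derivative of the integral, and a Riemann sum for each integral.

```python
import numpy as np

z = np.linspace(-15.0, 15.0, 300001)         # integration grid for z
dz = z[1] - z[0]


def f(theta):
    """f(z, theta) = z * N(theta, 1) density, evaluated on the grid."""
    return z * np.exp(-0.5 * (z - theta) ** 2) / np.sqrt(2.0 * np.pi)


def df_dtheta(theta):
    """Analytic partial derivative of f(z, theta) with respect to theta."""
    return (z - theta) * f(theta)


def integral(values):
    """Riemann-sum approximation of the integral over z."""
    return float(np.sum(values) * dz)


theta, h = 0.7, 1e-4                         # hypothetical point and step size

# Left side of (1.3.14): differentiate the integral (central difference).
lhs = (integral(f(theta + h)) - integral(f(theta - h))) / (2.0 * h)
# Right side of (1.3.14): integrate the derivative.
rhs = integral(df_dtheta(theta))

print(lhs, rhs)
```

Both sides should be close to 1, since $\int z\,\phi(z - \theta)\,dz = \theta$ for this choice of $f$.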
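Theorem 1.3.2. If (i) $\partial f(z, \theta)/\partial \theta$ is continuous in $\theta \in \Theta$ and $z$, where $\Theta$ is an open set, (ii) $\int f(z, \theta)\,dz$ exists, and (iii) $\int |\partial f(z, \theta)/\partial \theta|\,dz < M < \infty$ for all $\theta \in \Theta$, then (1.3.14) holds.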
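Proof. We have, using assumptions (i) and (ii),
$$\begin{aligned}
\left|\int \left[\frac{f(z, \theta + h) - f(z, \theta)}{h} - \frac{\partial f}{\partial \theta}(z, \theta)\right] dz\right|
&\le \int \left|\frac{f(z, \theta + h) - f(z, \theta)}{h} - \frac{\partial f}{\partial \theta}(z, \theta)\right| dz \\
&= \int \left|\frac{\partial f}{\partial \theta}(z, \theta^*) - \frac{\partial f}{\partial \theta}(z, \theta)\right| dz,
\end{aligned}$$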
where $\theta^*$ is between $\theta$ and $\theta + h$. Next, we can write the last integral as $\int = \int_A + \int_{\bar A}$, where $A$ is a sufficiently large compact set in the domain of $z$ and $\bar A$ is its complement. But we can make $\int_A$ sufficiently small because of (i) and $\int_{\bar A}$ sufficiently small because of (iii).