# Best Linear Unbiased Estimator

Neither of the two strategies discussed in Section 7.2.4 is the primary strategy of classical statisticians, although the second is less objectionable to them. Their primary strategy is that of defining a certain class of estimators within which we can find the best estimator in the sense of Definition 7.2.1. For example, in Example 7.2.1, if we eliminate W and Z from our consideration, T is the best estimator within the class consisting of only T and S. A certain degree of arbitrariness is unavoidable in this strategy. One of the classes most commonly considered is that of linear unbiased estimators. We first define

DEFINITION 7.2.4 θ̂ is said to be an unbiased estimator of θ if Eθ̂ = θ for all θ ∈ Θ. We call Eθ̂ − θ the bias.

Among the three estimators in Example 7.2.1, T and S are unbiased and W and Z are biased. Although unbiasedness is a desirable property of an estimator, it should not be regarded as an absolutely necessary condition. In many practical situations the statistician prefers a biased estimator with a small mean squared error to an unbiased estimator with a large mean squared error.

Theorem 7.2.10 gives a formula which relates the bias to the mean squared error. This formula is convenient when we calculate the mean squared error of an estimator.

THEOREM 7.2.10 The mean squared error is the sum of the variance and the bias squared. That is, for any estimator θ̂ of θ,

(7.2.6) E(θ̂ − θ)² = Vθ̂ + (Eθ̂ − θ)².

Proof. It follows from the identity

(7.2.7) E(θ̂ − θ)² = E[(θ̂ − Eθ̂) + (Eθ̂ − θ)]²

= E(θ̂ − Eθ̂)² + (Eθ̂ − θ)².

Note that the second equality above holds because E[(θ̂ − Eθ̂)(Eθ̂ − θ)] = (Eθ̂ − θ)E(θ̂ − Eθ̂) = 0. □
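The decomposition in Theorem 7.2.10 can be checked numerically by enumerating a small sample space exactly. The sketch below (Python; the parameter value and the biased estimator (X_1 + X_2 + 1)/4 of a Bernoulli p are illustrative choices) computes the mean squared error, the variance, and the bias and confirms that MSE equals variance plus bias squared.

```python
import itertools

# Exact check of Theorem 7.2.10 for a finite sample space.
# Estimator (illustrative): Z = (X1 + X2 + 1)/4 from two Bernoulli(p) draws.
p = 0.3
theta = p  # the parameter being estimated

# Enumerate all outcomes of (X1, X2) with their probabilities.
outcomes = []
for x1, x2 in itertools.product([0, 1], repeat=2):
    prob = (p if x1 else 1 - p) * (p if x2 else 1 - p)
    z = (x1 + x2 + 1) / 4
    outcomes.append((prob, z))

mse = sum(pr * (z - theta) ** 2 for pr, z in outcomes)   # E(Z - theta)^2
ez = sum(pr * z for pr, z in outcomes)                    # EZ
var = sum(pr * (z - ez) ** 2 for pr, z in outcomes)       # VZ
bias = ez - theta                                         # EZ - theta

# Theorem 7.2.10: MSE = variance + bias^2, exactly.
assert abs(mse - (var + bias ** 2)) < 1e-12
```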

In the following example we shall generalize Example 7.2.1 to the case of a general sample of size n and compare the mean squared errors of the generalized versions of the estimators T and Z using Theorem 7.2.10.

EXAMPLE 7.2.2

Population: X = 1 with probability p,
X = 0 with probability 1 − p.

Sample: (X_1, X_2, . . . , X_n).

Estimators: T = X̄,

Z = (nT + 1)/(n + 2).

Since ET = p, we have

(7.2.8) MSE(T) = VT = p(1 − p)/n,

where MSE stands for mean squared error. Since EZ = (np + 1)/(n + 2), the bias of Z is

(7.2.9) EZ − p = (1 − 2p)/(n + 2).

We also have

(7.2.10) VZ = n²VT/(n + 2)² = np(1 − p)/(n + 2)².

Therefore, using Theorem 7.2.10, we obtain

(7.2.11) MSE(Z) = [np(1 − p) + (1 − 2p)²]/(n + 2)².

From (7.2.8) and (7.2.11) we conclude that MSE(Z) < MSE(T) if and only if

(7.2.12) 1/2 − (1/2)√[(n + 1)/(2n + 1)] < p < 1/2 + (1/2)√[(n + 1)/(2n + 1)].
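The comparison of the two estimators can be verified numerically. The sketch below (Python; the function names, the choice n = 10, and the grid of p values are my own) checks that MSE(Z) < MSE(T) holds exactly when p lies within the symmetric interval around 1/2 derived above.

```python
import math

# Closed-form MSEs from the derivation above (function names are mine).
def mse_T(p, n):
    # T = sample mean, unbiased: MSE = variance = p(1-p)/n
    return p * (1 - p) / n

def mse_Z(p, n):
    # Z = (n*T + 1)/(n + 2), biased: variance + bias^2
    return (n * p * (1 - p) + (1 - 2 * p) ** 2) / (n + 2) ** 2

n = 10
half_width = 0.5 * math.sqrt((n + 1) / (2 * n + 1))

# Z beats T exactly when |p - 1/2| < half_width.
for p in [i / 1000 for i in range(1, 1000)]:
    inside = abs(p - 0.5) < half_width
    assert (mse_Z(p, n) < mse_T(p, n)) == inside
```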

As we stated in Section 7.1.1, the sample mean is generally an unbiased estimator of the population mean. The same cannot necessarily be said of all the other moments defined in that section. For example, the sample variance defined there is biased, as we show in (7.2.13). We have

(7.2.13) E Σ_{i=1}^n (X_i − X̄)² = E Σ_{i=1}^n [(X_i − μ) − (X̄ − μ)]²

= Σ_{i=1}^n E[(X_i − μ)² + (X̄ − μ)² − 2(X_i − μ)(X̄ − μ)]

= n(σ² + σ²/n − 2σ²/n)

= (n − 1)σ².

Therefore ES² = (n − 1)σ²/n, where S² denotes the sample variance. For this reason some authors define the sample variance by dividing the sum of squares by n − 1 instead of n to produce an unbiased estimator of the population variance.
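The bias of the sample variance can be confirmed by exact enumeration over a small Bernoulli sample. A sketch (Python; the values of p and n are illustrative):

```python
import itertools

# Exact check that E[sum (X_i - Xbar)^2] = (n - 1) sigma^2
# for n independent Bernoulli(p) draws.
p, n = 0.4, 3
sigma2 = p * (1 - p)  # population variance of Bernoulli(p)

expected_ss = 0.0
for sample in itertools.product([0, 1], repeat=n):
    prob = 1.0
    for x in sample:
        prob *= p if x else 1 - p
    xbar = sum(sample) / n
    expected_ss += prob * sum((x - xbar) ** 2 for x in sample)

# (7.2.13): the expected sum of squares is (n-1) sigma^2, so dividing
# by n is biased while dividing by n-1 is unbiased.
assert abs(expected_ss - (n - 1) * sigma2) < 1e-12
```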

The class of linear estimators consists of estimators which can be expressed as Σ_{i=1}^n a_i X_i, where {a_i} are arbitrary constants. The following theorem shows that the sample mean is the best linear unbiased estimator (BLUE) of the population mean.

THEOREM 7.2.11 Let X_i, i = 1, 2, . . . , n, be mutually independent with mean μ and variance σ². Then X̄ has the minimum variance among all the linear unbiased estimators of μ.

Proof. The unbiasedness of Σ_{i=1}^n a_i X_i means that

(7.2.14) E Σ_{i=1}^n a_i X_i = μ for all μ,

which implies

(7.2.15) Σ_{i=1}^n a_i = 1.

The variances of the two estimators are

(7.2.16) V(Σ_{i=1}^n a_i X_i) = σ² Σ_{i=1}^n a_i²

and

(7.2.17) VX̄ = σ²/n.

Consider the identity

(7.2.18) Σ_{i=1}^n (a_i − 1/n)² = Σ_{i=1}^n a_i² − (2/n) Σ_{i=1}^n a_i + 1/n.

By (7.2.15), Σ_{i=1}^n a_i = 1. Therefore, noting that the left-hand side of (7.2.18) is a sum of squared terms and hence nonnegative, we obtain

(7.2.19) Σ_{i=1}^n a_i² ≥ 1/n.

The equality in (7.2.19) clearly holds if and only if a_i = 1/n. Therefore the theorem follows from (7.2.16), (7.2.17), and (7.2.19). □ (Note that we could define the class of linear estimators as a_0 + Σ_{i=1}^n a_i X_i with a constant term. This would not change the theorem, because the unbiasedness condition (7.2.14) would ensure that a_0 = 0.) We now know that the dominance of T over S in Example 7.2.1 is merely a special case of this theorem.
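The inequality (7.2.19) at the heart of the proof is easy to spot-check: among weights that sum to 1, the sum of squared weights, which is proportional to the estimator's variance, is smallest at equal weights. A sketch (Python; the trial weight vectors are arbitrary):

```python
# Theorem 7.2.11 in miniature: for weights a_i summing to 1,
# sum a_i^2 >= 1/n, with equality at a_i = 1/n.
n = 5
trials = [
    [1, 0, 0, 0, 0],
    [0.5, 0.5, 0, 0, 0],
    [0.4, 0.3, 0.2, 0.05, 0.05],
    [1 / n] * n,
]
for a in trials:
    assert abs(sum(a) - 1) < 1e-12                 # unbiasedness: weights sum to 1
    assert sum(x * x for x in a) >= 1 / n - 1e-12  # (7.2.19)

# Equality holds exactly at the sample-mean weights a_i = 1/n.
assert abs(sum(x * x for x in [1 / n] * n) - 1 / n) < 1e-12
```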

From a purely mathematical standpoint, Theorem 7.2.11 provides the solution to the problem of minimizing Σ_{i=1}^n a_i² with respect to {a_i} subject to the condition Σ_{i=1}^n a_i = 1. We shall prove a slightly more general minimization result, which has wide applicability.

THEOREM 7.2.12 Consider the problem of minimizing Σ_{i=1}^n a_i² with respect to {a_i} subject to the condition Σ_{i=1}^n a_i b_i = 1, where {b_i} are known constants. The solution is given by a_i = b_i / Σ_{j=1}^n b_j².

Proof. Consider the identity

(7.2.20) Σ_{i=1}^n (a_i − b_i/Σ_{j=1}^n b_j²)² = Σ_{i=1}^n a_i² − 2 Σ_{i=1}^n a_i b_i / Σ_{j=1}^n b_j² + 1/Σ_{j=1}^n b_j²

= Σ_{i=1}^n a_i² − 1/Σ_{j=1}^n b_j²,

where we used the condition Σ_{i=1}^n a_i b_i = 1 to obtain the second equality. The theorem follows by noting that the left-hand side of the first equality of (7.2.20) is a sum of squares and hence nonnegative. □
(Theorem 7.2.11 follows from Theorem 7.2.12 by putting b_i = 1 for all i.) We shall give two examples of the application of Theorem 7.2.12.
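Theorem 7.2.12 can also be verified numerically: the closed-form minimizer satisfies the constraint, and no feasible perturbation of it lowers the objective. A sketch (Python; the vector b and the random perturbations are illustrative):

```python
import random

# Minimize sum a_i^2 subject to sum a_i b_i = 1.
# Closed-form solution (Theorem 7.2.12): a_i = b_i / sum b_j^2.
random.seed(0)
b = [2.0, -1.0, 0.5, 3.0]
sb2 = sum(x * x for x in b)
a_star = [x / sb2 for x in b]

# The candidate is feasible and attains objective value 1 / sum b_j^2.
assert abs(sum(a * x for a, x in zip(a_star, b)) - 1) < 1e-12
best = sum(a * a for a in a_star)
assert abs(best - 1 / sb2) < 1e-12

# Any feasible point a* + d (with sum d_i b_i = 0) does no better.
for _ in range(100):
    d = [random.uniform(-1, 1) for _ in b]
    # Project d so the perturbed point stays on the constraint surface.
    s = sum(di * bi for di, bi in zip(d, b)) / sb2
    d = [di - s * bi for di, bi in zip(d, b)]
    a = [ai + di for ai, di in zip(a_star, d)]
    assert sum(x * x for x in a) >= best - 1e-12
```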

EXAMPLE 7.2.3 Let X_i be the return per share of the i-th stock, i = 1, 2, . . . , n, and let c_i be the number of shares of the i-th stock to purchase. Put EX_i = μ_i and VX_i = σ_i². Determine {c_i} so as to minimize V(Σ_{i=1}^n c_i X_i) subject to M = Σ_{i=1}^n c_i μ_i, where M is a known constant. Assume that the X_i are uncorrelated.

If we put a_i = c_i σ_i and b_i = μ_i/(M σ_i), this problem is reduced to the minimization problem of Theorem 7.2.12. Therefore the solution is

(7.2.21) c_i = M (μ_i/σ_i²) / Σ_{j=1}^n (μ_j²/σ_j²).

That is, c_i is proportional to μ_i/σ_i².
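The portfolio solution can be checked numerically. A sketch (Python; the means, variances, and target return M are made-up numbers):

```python
# Example 7.2.3 numerically: the minimum-variance allocation has
# c_i proportional to mu_i / sigma_i^2 (all numbers illustrative).
mu = [0.05, 0.10, 0.08]     # expected returns per share
sig2 = [0.01, 0.04, 0.02]   # variances per share (uncorrelated stocks)
M = 1.0                     # target expected portfolio return

denom = sum(m * m / s for m, s in zip(mu, sig2))
c = [M * (m / s) / denom for m, s in zip(mu, sig2)]

# The constraint sum c_i mu_i = M holds exactly.
assert abs(sum(ci * mi for ci, mi in zip(c, mu)) - M) < 1e-9

# Any other feasible allocation has at least this variance, e.g.
# loading the whole target return onto a single stock.
var_opt = sum(ci * ci * si for ci, si in zip(c, sig2))
for k in range(len(mu)):
    alt = [0.0] * len(mu)
    alt[k] = M / mu[k]
    assert sum(a * a * s for a, s in zip(alt, sig2)) >= var_opt - 1e-12
```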

EXAMPLE 7.2.4 Let θ̂_i, i = 1, 2, . . . , n, be unbiased estimators of θ with variances σ_i², i = 1, 2, . . . , n. Choose {c_i} so that Σ_{i=1}^n c_i θ̂_i is unbiased and has minimum variance. Assume that the θ̂_i are uncorrelated.

Since the unbiasedness condition is equivalent to the condition Σ_{i=1}^n c_i = 1, the problem is that of minimizing Σ_{i=1}^n c_i² σ_i² subject to Σ_{i=1}^n c_i = 1. Thus it is a special case of Example 7.2.3, where μ_i = 1 and M = 1. Therefore the solution is c_i = σ_i^{−2} / Σ_{j=1}^n σ_j^{−2}.
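This inverse-variance weighting is easy to verify: the optimal weights sum to 1, and the resulting variance is no larger than that of any other unbiased combination, such as equal weighting. A sketch (Python; the variances are illustrative):

```python
# Example 7.2.4: combine unbiased estimators with weights
# c_i = sigma_i^{-2} / sum_j sigma_j^{-2} (variances illustrative).
sig2 = [1.0, 4.0, 0.25]
w = [1 / s for s in sig2]
total = sum(w)
c = [wi / total for wi in w]

assert abs(sum(c) - 1) < 1e-12  # combined estimator stays unbiased

# Variance of the optimal combination vs. equal weighting.
var_opt = sum(ci * ci * si for ci, si in zip(c, sig2))
n = len(sig2)
var_eq = sum((1 / n) ** 2 * s for s in sig2)
assert var_opt <= var_eq + 1e-12  # inverse-variance weights do no worse
```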

Theorem 7.2.11 shows that the sample mean has a minimum variance (and hence minimum mean squared error) among all the linear unbiased estimators. We have already seen that a biased estimator, such as W and Z of Example 7.2.1, can have a smaller mean squared error than the sample mean for some values of the parameter. Example 7.2.5 provides a case in which the sample mean is dominated by an unbiased, nonlinear estimator.

EXAMPLE 7.2.5

Population: f(x) = 1/θ for 0 < x < θ,
f(x) = 0 otherwise.

Sample: (X_1, X_2, . . . , X_n).

Parameter to estimate: μ = θ/2.

Estimators: μ̂_1 = X̄,

μ̂_2 = [(n + 1)/(2n)] Z, where Z = max(X_1, X_2, . . . , X_n).
An intuitive motivation for the second estimator is as follows. Since θ is the upper bound of X, we know that Z ≤ θ and Z approaches θ as n increases. Therefore it makes sense to multiply Z by a factor which is greater than 1 but decreases monotonically to 1 in order to estimate θ. More rigorously, we shall show in Example 7.4.5 that μ̂_2 is the bias-corrected maximum likelihood estimator.

We have EX² = ∫_0^θ θ^{−1} x² dx = θ²/3. Therefore VX = θ²/3 − (θ/2)² = θ²/12. Hence

(7.2.22) MSE(μ̂_1) = VX̄ = θ²/(12n).

Let G(z) and g(z) be the distribution and density function of Z, respectively. Then we have, for any 0 < z < θ,

(7.2.23) G(z) = P(Z ≤ z) = P(X_1 ≤ z, X_2 ≤ z, . . . , X_n ≤ z) = (z/θ)^n.

Differentiating (7.2.23) with respect to z, we obtain

(7.2.24) g(z) = n z^{n−1}/θ^n.

Using (7.2.24), we can calculate

(7.2.25) EZ = ∫_0^θ z g(z) dz = nθ/(n + 1)

and

(7.2.26) EZ² = ∫_0^θ z² g(z) dz = nθ²/(n + 2).

Therefore

(7.2.27) VZ = nθ²/(n + 2) − n²θ²/(n + 1)² = nθ²/[(n + 1)²(n + 2)].

Since (7.2.25) shows that μ̂_2 is an unbiased estimator, we have, using (7.2.27),

(7.2.28) MSE(μ̂_2) = Vμ̂_2 = [(n + 1)/(2n)]² VZ = θ²/[4n(n + 2)].

Comparing (7.2.22) and (7.2.28), we conclude that MSE(μ̂_2) ≤ MSE(μ̂_1), with equality holding if and only if n = 1.
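The comparison of the two estimators can be confirmed both from the closed-form mean squared errors and by simulation. A sketch (Python; the values of θ, n, and the replication count are illustrative):

```python
import random

theta = 2.0  # illustrative upper bound of the uniform population

def mse1(n):
    # MSE of the sample mean: theta^2 / (12 n)
    return theta ** 2 / (12 * n)

def mse2(n):
    # MSE of (n+1)/(2n) * max X_i: theta^2 / (4 n (n+2))
    return theta ** 2 / (4 * n * (n + 2))

# Equal mean squared errors when n = 1, strict dominance for n >= 2.
assert abs(mse1(1) - mse2(1)) < 1e-12
for n in range(2, 50):
    assert mse2(n) < mse1(n)

# Monte Carlo check that the second estimator is unbiased for theta/2.
random.seed(1)
n, reps = 5, 20000
total = 0.0
for _ in range(reps):
    z = max(random.uniform(0, theta) for _ in range(n))
    total += (n + 1) / (2 * n) * z
mean2 = total / reps
assert abs(mean2 - theta / 2) < 0.01  # within simulation error of theta/2
```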