Various Measures of Closeness
The ambiguity of the first kind is resolved once we decide on a measure of closeness between the estimator and the parameter. There are many reasonable measures of closeness, however, and it is not easy to choose a particular one. In this section we shall consider six measures of closeness and establish relationships among them. In the following discussion we shall denote two competing estimators by X and Y and the parameter by θ. Note that θ is always a fixed number in the present analysis. Each of the six statements below gives the condition under which estimator X is preferred to estimator Y. (We allow for the possibility of a tie. If X is preferred to Y and Y is not preferred to X, we say X is strictly preferred to Y.) Or, we might say, X is “better” than Y. Adopting a particular measure of closeness is thus equivalent to defining the term better. (The term strictly better is defined analogously.)
(1) P(|X − θ| ≤ |Y − θ|) = 1.
(2) Eg(X − θ) ≤ Eg(Y − θ) for every continuous function g(·) which is nonincreasing for x < 0 and nondecreasing for x > 0.
(3) Eg(|X − θ|) ≤ Eg(|Y − θ|) for every continuous and nondecreasing function g.
(4) P(|X − θ| > ε) ≤ P(|Y − θ| > ε) for every ε.
(5) E(X − θ)² ≤ E(Y − θ)².
(6) P(|X − θ| < |Y − θ|) ≥ P(|X − θ| > |Y − θ|).
Criteria (1) through (5) are transitive; (6) is not. The reader should verify this. Criteria (3) and (4) are sometimes referred to as universal dominance and stochastic dominance, respectively; see Hwang (1985). The idea of stochastic dominance is also used in the finance literature; see, for example, Huang and Litzenberger (1988).

FIGURE 7.2 Illustration for Theorem 7.2.4
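The non-transitivity of criterion (6) can be verified with a small computation. The three estimators below are hypothetical (they do not appear in the text): each takes three equally likely values, the estimators are mutually independent, and θ = 0, so each estimator's value is its own absolute error. A sketch in Python:

```python
from itertools import product

# Hypothetical estimators of theta = 0; each value has probability 1/3.
# Since every value is positive, the absolute error equals the value itself.
A = [2, 4, 9]
B = [1, 6, 8]
C = [3, 5, 7]

def preferred_by_6(X, Y):
    """X is preferred to Y by criterion (6):
    P(|X - theta| < |Y - theta|) >= P(|X - theta| > |Y - theta|)."""
    closer = sum(1 for x, y in product(X, Y) if x < y)
    farther = sum(1 for x, y in product(X, Y) if x > y)
    return closer >= farther

# B is preferred to A, C to B, and A to C -- a cycle, so (6) is not transitive.
print(preferred_by_6(B, A), preferred_by_6(C, B), preferred_by_6(A, C))
```

This is the familiar phenomenon of non-transitive dice: each estimator in the cycle is closer to θ more than half the time compared with the next one.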
THEOREM 7.2.1 (2) ⇒ (3) and (3) ⇏ (2). (Obvious.)

THEOREM 7.2.2 (3) ⇒ (5) and (5) ⇏ (3). (Obvious.)

THEOREM 7.2.3 (3) ⇔ (4).
Sketch of Proof. Define

h_ε(z) = 1 if z ≥ ε,
       = 0 otherwise.

Then Eh_ε(|X − θ|) = P(|X − θ| ≥ ε). Therefore, (4) is equivalent to stating that Eh_ε(|X − θ|) ≤ Eh_ε(|Y − θ|) for every ε. The theorem follows from the fact that a continuous function can be approximated to any desired degree of accuracy by a linear combination of step functions. (See Hwang, 1985, for a rigorous proof.) □
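The role of the step functions can be seen concretely in a discrete case. For a nonnegative integer-valued error Z = |X − θ| and a nondecreasing g with g(0) = 0, Eg(Z) = Σₖ [g(k) − g(k−1)] P(Z ≥ k): a nonnegative combination of exactly the tail probabilities that criterion (4) controls. A sketch in Python, using a hypothetical distribution of Z:

```python
# Hypothetical distribution of the absolute error Z = |X - theta|.
dist = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}

def g(z):
    """Any nondecreasing g with g(0) = 0 will do; here g(z) = z^2."""
    return z * z

# Direct expectation of g(Z).
E_g = sum(pr * g(z) for z, pr in dist.items())

# The same expectation as a nonnegative combination of tail probabilities
# P(Z >= k), which is what criterion (4) compares across estimators.
tail_form = sum((g(k) - g(k - 1)) * sum(pr for z, pr in dist.items() if z >= k)
                for k in range(1, max(dist) + 1))

print(E_g, tail_form)   # the two agree
```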
THEOREM 7.2.4 (4) ⇎ (6), meaning that one does not imply the other.

Proof. Consider Figure 7.2. Here X (solid line) and Y (dashed line) are two random variables defined over the sample space [0, 1]. The probability distribution defined over the sample space is assumed to be such that the probability of any interval is equal to its length. We also assume that θ = 0. Then, by our construction, X is strictly preferred to Y by criterion (4), whereas Y is strictly preferred to X by criterion (6). □
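The construction can be imitated numerically. The functional forms below are hypothetical stand-ins chosen to have the same two properties as the curves in Figure 7.2; they are not the figure's actual curves. The sample space [0, 1] is discretized so that the probability of an interval is approximately its length, and θ = 0:

```python
# Discretize the sample space [0, 1]: each midpoint carries probability 1/n,
# so the probability of an interval is approximately its length.
n = 20_000
omega = [(k + 0.5) / n for k in range(n)]

theta = 0.0
Y = list(omega)                                              # stand-in for the dashed line
X = [w + 0.1 if w < 0.6 else (w - 0.6) / 4 for w in omega]   # stand-in for the solid line

def prob(event):
    """Probability of an event, given as one boolean per point of omega."""
    return sum(event) / n

# Criterion (4): P(|X - theta| > eps) <= P(|Y - theta| > eps) for every eps
# (strict for some eps), so X is strictly preferred to Y.
eps_grid = [k / 100 for k in range(101)]
x_wins_by_4 = all(
    prob([abs(x - theta) > e for x in X]) <= prob([abs(y - theta) > e for y in Y])
    for e in eps_grid
)

# Criterion (6): P(|Y - theta| < |X - theta|) > P(|X - theta| < |Y - theta|),
# so Y is strictly preferred to X.
y_wins_by_6 = (prob([abs(y - theta) < abs(x - theta) for x, y in zip(X, Y)])
               > prob([abs(x - theta) < abs(y - theta) for x, y in zip(X, Y)]))

print(x_wins_by_4, y_wins_by_6)
```

Here X exceeds Y slightly on a set of probability 0.6 (so Y wins by criterion (6)) but is far smaller on the remaining set (so X's tail probabilities are everywhere no larger, and X wins by criterion (4)).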
THEOREM 7.2.5 (1) ⇒ (3) and (3) ⇏ (1).

Proof.

(1) ⇒ (3). Since g is nondecreasing, |X − θ| ≤ |Y − θ| ⇒ g(|X − θ|) ≤ g(|Y − θ|). Thus, 1 = P(|X − θ| ≤ |Y − θ|) ≤ P[g(|X − θ|) ≤ g(|Y − θ|)]. Therefore, Eg(|X − θ|) ≤ Eg(|Y − θ|) for every continuous and nondecreasing function g.

(3) ⇏ (1). Consider X and Y, defined in Figure 7.2. We have shown that X is preferred to Y by criterion (4). Therefore, X is preferred to Y by criterion (3) because of Theorem 7.2.3. But P(|X − θ| ≤ |Y − θ|) = P(X ≤ Y) < 1. □
THEOREM 7.2.6 (1) ⇒ (6) and (6) ⇏ (1).

Proof.

(1) ⇒ (6). The right-hand side of (6) is zero if (1) holds. Then (6) must hold.

(6) ⇏ (1). Consider X and Y, defined in Figure 7.2. Clearly Y is preferred to X by criterion (6), but P(|Y − θ| ≤ |X − θ|) = P(Y ≤ X) < 1. □
THEOREM 7.2.7 (1) ⇎ (2).

Proof. Consider estimators S and T in Example 7.2.1 when p = 3/4. Then T is preferred to S by criterion (1). Define a function g₀ in such a way that g₀(−3/4) = g₀(−1/4) = 1 and g₀(1/4) = 1/3. Then T is not preferred to S by criterion (2), because Eg₀(S − p) < Eg₀(T − p). This shows that (1) does not imply (2). Next, consider X and Y, defined in Figure 7.2. Since X − θ ≥ 0 and Y − θ ≥ 0 in this example, criteria (2) and (3) are equivalent. But, as we noted in the proof of Theorem 7.2.5, X is preferred to Y by criterion (3). Therefore X is preferred to Y by criterion (2). But clearly X is not preferred to Y by criterion (1). This shows that (2) does not imply (1). □
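Example 7.2.1 is cited here without its definitions; the check below assumes the usual setup in which X₁ and X₂ are independent Bernoulli(p) observations, T = (X₁ + X₂)/2, and S = X₁ alone. Under that assumption, exact arithmetic confirms both halves of the first claim:

```python
from fractions import Fraction
from itertools import product

p = Fraction(3, 4)

# Assumed form of Example 7.2.1 (not shown in this section): X1, X2 are
# i.i.d. Bernoulli(p); T = (X1 + X2)/2 and S = X1 estimate p.
outcomes = []   # (probability, T - p, S - p)
for x1, x2 in product([0, 1], repeat=2):
    prob = (p if x1 else 1 - p) * (p if x2 else 1 - p)
    outcomes.append((prob, Fraction(x1 + x2, 2) - p, Fraction(x1) - p))

# Criterion (1): P(|T - p| <= |S - p|) = 1, so T is preferred to S.
assert sum(pr for pr, t, s in outcomes if abs(t) <= abs(s)) == 1

# g0 as in the proof: g0(-3/4) = g0(-1/4) = 1 and g0(1/4) = 1/3; any
# continuous g with these values, nonincreasing for x < 0 and
# nondecreasing for x > 0, will do.
g0 = {Fraction(-3, 4): Fraction(1), Fraction(-1, 4): Fraction(1),
      Fraction(1, 4): Fraction(1, 3)}

Eg0_T = sum(pr * g0[t] for pr, t, s in outcomes)
Eg0_S = sum(pr * g0[s] for pr, t, s in outcomes)
print(Eg0_S, Eg0_T)   # 1/2 < 5/8, so T is not preferred to S by criterion (2)
```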
FIGURE 7.3 Illustration for Theorem 7.2.9
THEOREM 7.2.8 (2) ⇎ (6).

Proof. Consider any pair of random variables X and Y such that X − θ ≥ 0 and Y − θ ≥ 0. Then, as already noted, (2) and (3) are equivalent. But (3) and (4) are equivalent by Theorem 7.2.3, and (4) ⇎ (6) by Theorem 7.2.4. □
THEOREM 7.2.9 (5) ⇎ (6).

Proof. In Figure 7.3, X (solid line) and Y (dashed line) are defined over the same sample space as in Figure 7.2, and, as before, we assume that θ = 0. Then X is strictly preferred to Y by criterion (6). But E(X − θ)² = 4 + 3/4 and E(Y − θ)² = 4; therefore Y is strictly preferred to X by criterion (5). □
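Since Figure 7.3 is not reproduced here, the following hypothetical pair (an assumption for illustration, not the book's construction) exhibits the same conflict between criteria (5) and (6):

```python
# theta = 0. X is 0 with probability 0.9 and 10 with probability 0.1,
# while Y is the constant 2. (Hypothetical values, not those of Figure 7.3.)
px = {0.0: 0.9, 10.0: 0.1}
y = 2.0

mse_X = sum(pr * x ** 2 for x, pr in px.items())   # E(X - theta)^2 = 10
mse_Y = y ** 2                                     # E(Y - theta)^2 = 4

closer_X = sum(pr for x, pr in px.items() if abs(x) < abs(y))   # P(|X| < |Y|) = 0.9
closer_Y = sum(pr for x, pr in px.items() if abs(x) > abs(y))   # P(|X| > |Y|) = 0.1

# X is strictly preferred by criterion (6); Y is strictly preferred by (5).
print(closer_X > closer_Y, mse_Y < mse_X)
```

X is closer to θ ninety percent of the time, but its rare large error dominates the mean squared error comparison.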
The results obtained above are summarized in Figure 7.4. In the figure, an arrow indicates the direction of an implication, and a dashed line between a pair of criteria means that one does not imply the other.
FIGURE 7.4 Relationships among the six criteria
Although all the criteria defined in Section 7.2.2 are reasonable (except possibly criterion (6), because it is not transitive), and there is no a priori reason to prefer one over the others in every situation, statisticians have most frequently used criterion (5), known as the mean squared error. We shall follow this practice and define the term better in terms of this criterion throughout this book, unless otherwise noted.
If θ̂ is an estimator of θ, we call E(θ̂ − θ)² the mean squared error of the estimator. By adopting the mean squared error criterion, we have eliminated (though somewhat arbitrarily) the ambiguity of the first kind (see the end of Section 7.2.1). Now we can rank estimators according to this criterion, though there may still be ties, for each value of the parameter. We can easily calculate the mean squared errors of the three estimators in Example 7.2.1: E(T − 3/4)² = 3/32, E(S − 3/4)² = 3/16, and E(W − 3/4)² = 1/16. Therefore, for this value of the parameter p, W is the best estimator.
The ambiguity of the second kind remains, however, as we shall illustrate by referring again to Example 7.2.1. The mean squared errors of the three estimators as functions of p are obtained as
(7.2.1) E(T − p)² = (1/2) p(1 − p),
(7.2.2) E(S − p)² = p(1 − p),
(7.2.3) E(W − p)² = (1/2 − p)².
They are drawn as three solid curves in Figure 7.5. (Ignore the dashed curve for the moment.) It is evident from the figure that T clearly dominates S but that T and W cannot be unequivocally ranked, because T is better for some values of p and W is better for other values of p. When T dominates S as in this example, we say that T is better than S. This should be distinguished from the statement that T is better than S at a specific value of p. More formally, we state
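The values of p at which T and W trade places can be located from the mean squared error formulas; the sketch below takes E(T − p)² = p(1 − p)/2 and, assuming W is the constant estimator 1/2, E(W − p)² = (1/2 − p)²:

```python
import math

def mse_T(p):
    """E(T - p)^2 = p(1 - p)/2, as in equation (7.2.1)."""
    return p * (1 - p) / 2

def mse_W(p):
    """E(W - p)^2 = (1/2 - p)^2, assuming W is the constant estimator 1/2."""
    return (0.5 - p) ** 2

# Setting the two equal gives 6p^2 - 6p + 1 = 0, with roots (3 +- sqrt(3))/6.
lo = (3 - math.sqrt(3)) / 6      # roughly 0.211
hi = (3 + math.sqrt(3)) / 6      # roughly 0.789

# W has the smaller mean squared error strictly between the two roots,
# and T outside them, so neither estimator dominates the other.
assert mse_W(0.5) < mse_T(0.5)
assert mse_T(0.1) < mse_W(0.1) and mse_T(0.9) < mse_W(0.9)
print(round(lo, 3), round(hi, 3))
```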
DEFINITION 7.2.1 Let X and Y be two estimators of θ. We say X is better (or more efficient) than Y if E(X − θ)² ≤ E(Y − θ)² for all θ ∈ Θ and E(X − θ)² < E(Y − θ)² for at least one value of θ in Θ. (Here Θ denotes the parameter space, the set of all possible values the parameter can take. In Example 7.2.1, it is the closed interval [0, 1].)
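Definition 7.2.1 translates directly into code. The sketch below checks it on a finite grid over the parameter space Θ = [0, 1], using mean squared error functions for Example 7.2.1 whose forms are assumptions consistent with the discussion above:

```python
def is_better(mse_x, mse_y, grid):
    """Definition 7.2.1 on a finite grid: X is better than Y if its mean
    squared error is never larger and is strictly smaller somewhere."""
    never_worse = all(mse_x(p) <= mse_y(p) for p in grid)
    somewhere_better = any(mse_x(p) < mse_y(p) for p in grid)
    return never_worse and somewhere_better

# Assumed mean squared error functions for the estimators of Example 7.2.1.
mse_T = lambda p: p * (1 - p) / 2
mse_S = lambda p: p * (1 - p)
mse_W = lambda p: (0.5 - p) ** 2

grid = [k / 100 for k in range(101)]     # grid over the parameter space [0, 1]
print(is_better(mse_T, mse_S, grid))     # True: T dominates S, so S is inadmissible
print(is_better(mse_T, mse_W, grid))     # False: T and W cannot be ranked
print(is_better(mse_W, mse_T, grid))     # False
```

On a finite grid this is of course only an approximate check; the definition quantifies over all of Θ.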
FIGURE 7.5 Mean squared errors of estimators in Example 7.2.1
When an estimator is dominated by another estimator, as in the case of S by T in the above example, we say that the estimator is inadmissible.
DEFINITION 7.2.2 Let θ̂ be an estimator of θ. We say that θ̂ is inadmissible if there is another estimator which is better in the sense of Definition 7.2.1. An estimator is admissible if it is not inadmissible.
Thus, in Example 7.2.1, S is inadmissible and T and W are admissible. We can ignore all the inadmissible estimators and pay attention only to the class of admissible estimators.