Applications to Statistical Inference under Normality

5.6.1. Estimation

Statistical inference is concerned with parameter estimation and parameter inference. The former is discussed in this subsection.

In a broad sense, an estimator of a parameter is a function of the data that serves as an approximation of the parameter involved. For example, if $X_1, X_2, \ldots, X_n$ is a random sample from the $N(\mu, \sigma^2)$ distribution, then the sample mean $\bar{X} = (1/n)\sum_{j=1}^{n} X_j$ may serve as an estimator of the unknown parameter $\mu$ (the population mean). More formally, given a data set $\{X_1, X_2, \ldots, X_n\}$ for which the joint distribution function depends on an unknown parameter (vector) $\theta$, an estimator of $\theta$ is a Borel-measurable function $\hat{\theta} = g_n(X_1, \ldots, X_n)$ of the data that serves as an approximation of $\theta$. Of course, the function $g_n$ should not itself depend on unknown parameters.

In principle, we can construct many functions of the data that may serve as an approximation of an unknown parameter. For example, one may consider using $X_1$ alone as an estimator of $\mu$. How does one decide which function of the data should be used? To be able to select among the many candidates for an estimator, we need to formulate some desirable properties of estimators. The first one is “unbiasedness”:

Definition 5.3: An estimator $\hat{\theta}$ of a parameter (vector) $\theta$ is unbiased if $E[\hat{\theta}] = \theta$.

The unbiasedness property is not specific to a particular value of the parameter involved but should hold for all possible values of this parameter, in the sense that if we draw a new data set from the same type of distribution but with a different parameter value, the estimator should stay unbiased. In other words, if the joint distribution function of the data is $F_n(x_1, \ldots, x_n|\theta)$, where $\theta \in \Theta$ is an unknown parameter (vector) in a parameter space $\Theta$ (i.e., the space of all possible values of $\theta$), and $\hat{\theta} = g_n(X_1, \ldots, X_n)$ is an unbiased estimator of $\theta$, then $\int g_n(x_1, \ldots, x_n)\,dF_n(x_1, \ldots, x_n|\theta) = \theta$ for all $\theta \in \Theta$.

Note that in the preceding example both $\bar{X}$ and $X_1$ are unbiased estimators of $\mu$. Thus, we need a further criterion in order to select an estimator. This criterion is efficiency:
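The claim that both estimators are unbiased can be illustrated numerically. The following Monte Carlo sketch (assuming NumPy is available; the parameter values, sample size, and number of replications are illustrative choices, not from the text) approximates $E[\bar{X}]$ and $E[X_1]$ by averaging over many simulated samples:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 25, 200_000

# Draw `reps` independent random samples of size n from N(mu, sigma^2).
samples = rng.normal(mu, sigma, size=(reps, n))

# Monte Carlo estimates of E[X-bar] and E[X_1]; both should be close to mu,
# reflecting that both estimators are unbiased.
mean_xbar = samples.mean(axis=1).mean()
mean_x1 = samples[:, 0].mean()

print(mean_xbar, mean_x1)  # both close to mu = 2.0
```

Both averages land near $\mu = 2$, so unbiasedness alone cannot distinguish the two estimators; this is exactly why a second criterion is needed.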

Definition 5.4: An unbiased estimator $\hat{\theta}$ of an unknown scalar parameter $\theta$ is efficient if, for all other unbiased estimators $\tilde{\theta}$, $\mathrm{var}(\hat{\theta}) \le \mathrm{var}(\tilde{\theta})$. In the case in which $\theta$ is a parameter vector, the latter reads: $\mathrm{Var}(\tilde{\theta}) - \mathrm{Var}(\hat{\theta})$ is a positive semidefinite matrix.

In our example, $X_1$ is not an efficient estimator of $\mu$ because $\mathrm{var}(X_1) = \sigma^2$ and $\mathrm{var}(\bar{X}) = \sigma^2/n$. But is $\bar{X}$ efficient? To answer this question, we need to derive the minimum variance of an unbiased estimator as follows. For notational convenience, stack the data in a vector $X$. Thus, in the univariate case $X = (X_1, X_2, \ldots, X_n)^T$, and in the multivariate case $X = (X_1^T, \ldots, X_n^T)^T$. Assume that the joint distribution of $X$ is absolutely continuous with density $f_n(x|\theta)$, which for each $x$ is twice continuously differentiable in $\theta$. Moreover, let $\hat{\theta} = g_n(X)$ be an unbiased estimator of $\theta$. Then

$$\int g_n(x) f_n(x|\theta)\,dx = \theta. \qquad (5.8)$$

Furthermore, assume for the time being that $\theta$ is a scalar, and let

$$\frac{d}{d\theta}\int g_n(x) f_n(x|\theta)\,dx = \int g_n(x)\,\frac{\partial}{\partial\theta} f_n(x|\theta)\,dx. \qquad (5.9)$$

Conditions for (5.9) can be derived from the mean-value theorem and the dominated convergence theorem. In particular, (5.9) is true for all $\theta$ in an open set $\Theta$ if

$$\int g_n(x) \sup_{\theta \in \Theta} \left|\partial^2 f_n(x|\theta)/(\partial\theta)^2\right| dx < \infty.$$

Then, because $\partial f_n(x|\theta)/\partial\theta = [\partial \ln(f_n(x|\theta))/\partial\theta]\, f_n(x|\theta)$, it follows from (5.8) and (5.9) that

$$\int g_n(x) \left[\frac{\partial \ln(f_n(x|\theta))}{\partial\theta}\right] f_n(x|\theta)\,dx = 1. \qquad (5.10)$$

Similarly, if

$$\frac{d}{d\theta}\int f_n(x|\theta)\,dx = \int \frac{\partial}{\partial\theta} f_n(x|\theta)\,dx, \qquad (5.11)$$

which is true for all $\theta$ in an open set $\Theta$ for which $\int \sup_{\theta \in \Theta} |\partial^2 f_n(x|\theta)/(\partial\theta)^2|\,dx < \infty$, then, because $\int f_n(x|\theta)\,dx = 1$, we have

$$\int \left[\frac{\partial \ln(f_n(x|\theta))}{\partial\theta}\right] f_n(x|\theta)\,dx = 0. \qquad (5.12)$$

If we let $\hat{\beta} = \partial \ln(f_n(X|\theta))/\partial\theta$, it follows now from (5.10) that $E[\hat{\theta} \cdot \hat{\beta}] = 1$ and from (5.12) that $E[\hat{\beta}] = 0$. Therefore, $\mathrm{cov}(\hat{\theta}, \hat{\beta}) = E[\hat{\theta} \cdot \hat{\beta}] - E[\hat{\theta}]E[\hat{\beta}] = 1$. Because by the Cauchy-Schwarz inequality $|\mathrm{cov}(\hat{\theta}, \hat{\beta})| \le \sqrt{\mathrm{var}(\hat{\theta})}\sqrt{\mathrm{var}(\hat{\beta})}$, we now have that $\mathrm{var}(\hat{\theta}) \ge 1/\mathrm{var}(\hat{\beta})$:

$$\mathrm{var}(\hat{\theta}) \ge \frac{1}{E\left[\left(\partial \ln(f_n(X|\theta))/\partial\theta\right)^2\right]}. \qquad (5.13)$$

This result is known as the Cramér-Rao inequality, and the right-hand side of (5.13) is called the Cramér-Rao lower bound. More generally, we have the following:
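The identities used in this derivation can be checked by simulation. A minimal sketch (NumPy; the normal-mean case with known $\sigma^2$, with parameter values chosen purely for illustration): taking $\theta = \mu$, the score is $\hat{\beta} = \sum_{j=1}^{n}(X_j - \mu)/\sigma^2$, and we verify that $E[\hat{\beta}] \approx 0$, $\mathrm{cov}(\hat{\theta}, \hat{\beta}) \approx 1$ for $\hat{\theta} = \bar{X}$, and $\mathrm{var}(\hat{\theta}) \approx 1/\mathrm{var}(\hat{\beta})$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.5, 2.0, 10, 400_000

# Simulate many samples; theta_hat is the sample mean, and score is
# d ln f_n(X|mu)/d mu = sum_j (X_j - mu)/sigma^2 for the normal-mean case.
x = rng.normal(mu, sigma, size=(reps, n))
theta_hat = x.mean(axis=1)
score = (x - mu).sum(axis=1) / sigma**2

score_mean = score.mean()                 # approx 0, as in (5.12)
cov_est = np.cov(theta_hat, score)[0, 1]  # approx 1, as in (5.10)
var_theta = theta_hat.var()               # approx sigma^2/n = 0.4
inv_var_score = 1 / score.var()           # the bound 1/var(score), also 0.4

print(score_mean, cov_est, var_theta, inv_var_score)
```

Here the bound is attained exactly, which anticipates the conclusion below that the sample mean is efficient in the normal case.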

Theorem 5.11: (Cramér-Rao) Let $f_n(x|\theta)$ be the joint density of the data stacked in a vector $X$, where $\theta$ is a parameter vector. Let $\hat{\theta}$ be an unbiased estimator of $\theta$. Then $\mathrm{Var}(\hat{\theta}) = \left(E\left[(\partial \ln f_n(X|\theta)/\partial\theta^T)(\partial \ln f_n(X|\theta)/\partial\theta)\right]\right)^{-1} + D$, where $D$ is a positive semidefinite matrix.

Now let us return to our problem of whether the sample mean $\bar{X}$ of a random sample from the $N(\mu, \sigma^2)$ distribution is an efficient estimator of $\mu$. In this case the joint density of the sample is $f_n(x|\mu, \sigma^2) = \prod_{j=1}^{n} \exp\left(-\tfrac{1}{2}(x_j - \mu)^2/\sigma^2\right)/\sqrt{\sigma^2 2\pi}$; hence, $\partial \ln(f_n(X|\mu, \sigma^2))/\partial\mu = \sum_{j=1}^{n}(X_j - \mu)/\sigma^2$, and thus the Cramér-Rao lower bound is

$$\frac{1}{E\left[\left(\partial \ln(f_n(X|\mu, \sigma^2))/\partial\mu\right)^2\right]} = \sigma^2/n. \qquad (5.14)$$

This is just the variance of the sample mean $\bar{X}$; hence, $\bar{X}$ is an efficient estimator of $\mu$. This result holds for the multivariate case as well:
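The efficiency comparison between $\bar{X}$ and $X_1$ can also be seen directly by simulation. A short sketch (NumPy; the chosen $\mu$, $\sigma$, and $n$ are illustrative): the variance of $\bar{X}$ attains the Cramér-Rao bound $\sigma^2/n$, while the variance of $X_1$ stays at $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 2.0, 20, 300_000

x = rng.normal(mu, sigma, size=(reps, n))

cr_bound = sigma**2 / n          # Cramér-Rao lower bound (5.14) = 0.2
var_xbar = x.mean(axis=1).var()  # variance of the sample mean
var_x1 = x[:, 0].var()           # variance of the single-observation estimator

# var_xbar is close to the bound 0.2; var_x1 is close to sigma^2 = 4.0.
print(var_xbar, cr_bound, var_x1)
```

So among these two unbiased estimators, only $\bar{X}$ reaches the lower bound, matching the argument above.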

Theorem 5.12: Let $X_1, X_2, \ldots, X_n$ be a random sample from the $N_k[\mu, \Sigma]$ distribution. Then the sample mean $\bar{X} = (1/n)\sum_{j=1}^{n} X_j$ is an unbiased and efficient estimator of $\mu$.

The sample variance of a random sample $X_1, X_2, \ldots, X_n$ from a univariate distribution with expectation $\mu$ and variance $\sigma^2$ is defined by

$$S^2 = (1/(n-1))\sum_{j=1}^{n}(X_j - \bar{X})^2, \qquad (5.15)$$

which serves as an estimator of $\sigma^2$. An alternative form of the sample variance is

$$\hat{\sigma}^2 = (1/n)\sum_{j=1}^{n}(X_j - \bar{X})^2 = \frac{n-1}{n} S^2, \qquad (5.16)$$

but as I will show for the case of a random sample from the $N(\mu, \sigma^2)$ distribution, (5.15) is an unbiased estimator and (5.16) is not:

Theorem 5.13: Let $S^2$ be the sample variance of a random sample $X_1, \ldots, X_n$ from the $N(\mu, \sigma^2)$ distribution. Then $(n-1)S^2/\sigma^2$ is distributed $\chi^2_{n-1}$.

The proof of Theorem 5.13 is left as an exercise. Because the expectation of the $\chi^2_{n-1}$ distribution is $n-1$, this result implies that $E(S^2) = \sigma^2$, whereas by (5.16), $E(\hat{\sigma}^2) = \sigma^2(n-1)/n$. Moreover, given that the variance of the $\chi^2_{n-1}$ distribution is $2(n-1)$, it follows from Theorem 5.13 that

$$\mathrm{var}(S^2) = 2\sigma^4/(n-1). \qquad (5.17)$$

The Cramér-Rao lower bound for an unbiased estimator of $\sigma^2$ is $2\sigma^4/n$; thus, $S^2$ is not efficient, but it is close if $n$ is large.
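These moment results can be verified numerically. A simulation sketch (NumPy; $\mu$, $\sigma$, $n$, and the replication count are illustrative): we check that $E(S^2) = \sigma^2$ while $E(\hat{\sigma}^2) = \sigma^2(n-1)/n$, and that $\mathrm{var}(S^2) = 2\sigma^4/(n-1)$ exceeds the Cramér-Rao bound $2\sigma^4/n$:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 1.5, 8, 500_000

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)  # S^2 with divisor n-1, as in (5.15)
s2_biased = x.var(axis=1)   # divisor n, as in (5.16)

mean_s2 = s2.mean()              # approx sigma^2 = 2.25: unbiased
mean_biased = s2_biased.mean()   # approx sigma^2*(n-1)/n = 1.96875: biased
var_s2 = s2.var()                # approx 2*sigma^4/(n-1), as in (5.17)
cr_bound = 2 * sigma**4 / n      # Cramér-Rao bound, strictly smaller

print(mean_s2, mean_biased, var_s2, cr_bound)
```

With $n = 8$ the gap between $2\sigma^4/7$ and $2\sigma^4/8$ is visible, but it shrinks at rate $1/n^2$, consistent with $S^2$ being nearly efficient for large $n$.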

For a random sample $X_1, X_2, \ldots, X_n$ from a multivariate distribution with expectation vector $\mu$ and variance matrix $\Sigma$, the sample variance matrix takes the form

$$\hat{\Sigma} = (1/(n-1))\sum_{j=1}^{n}(X_j - \bar{X})(X_j - \bar{X})^T. \qquad (5.18)$$

This is also an unbiased estimator of $\Sigma = \mathrm{Var}(X_j)$, even if the distribution involved is not normal.
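The distribution-free unbiasedness of (5.18) can be illustrated with deliberately non-normal data. A sketch (NumPy; the choice of independent exponential components, $n$, and the replication count are illustrative assumptions): here the true variance matrix is the identity, and the average of $\hat{\Sigma}$ over many replications recovers it:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, k = 10, 200_000, 2

# Deliberately non-normal data: independent exponential(1) components,
# so the true variance matrix is the k x k identity.
x = rng.exponential(1.0, size=(reps, n, k))

xbar = x.mean(axis=1, keepdims=True)
d = x - xbar
# Sample variance matrix (5.18) for each replication, with divisor n-1.
Sigma_hat = np.einsum('rij,rik->rjk', d, d) / (n - 1)

Sigma_hat_mean = Sigma_hat.mean(axis=0)
print(Sigma_hat_mean)  # approx the identity: unbiased despite non-normality
```

Replacing the divisor $n-1$ by $n$ in the code would reproduce the bias $(n-1)/n$ on the diagonal, mirroring the univariate comparison of (5.15) and (5.16).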