Applications of the Uniform Weak Law of Large Numbers

6.4.2.1. Consistency of M-Estimators

Chapter 5 introduced the concept of a parameter estimator and listed two desirable properties of estimators: unbiasedness and efficiency (see Appendix II). Another obviously desirable property is that the estimator gets closer to the parameter to be estimated as we use more data. This is the consistency property:

Definition 6.5: An estimator $\hat\theta$ of a parameter $\theta$, based on a sample of size $n$, is called consistent if $\operatorname{plim}_{n\to\infty}\hat\theta = \theta$.

Theorem 6.10 is an important tool in proving consistency of parameter estimators. A large class of estimators is obtained by maximizing or minimizing an objective function of the form $(1/n)\sum_{j=1}^{n} g(X_j,\theta)$, where $g$, $X_j$, and $\theta$ are the same as in Theorem 6.10. These estimators are called M-estimators (where the M indicates that the estimator is obtained by Maximizing or Minimizing a Mean of random functions). Suppose that the parameter of interest is $\theta_0 = \operatorname{argmax}_{\theta\in\Theta} E[g(X_1,\theta)]$, where $\Theta$ is a given closed and bounded set. Note that “argmax” is shorthand for the argument at which the function involved is maximal. Then it seems a natural choice to use $\hat\theta = \operatorname{argmax}_{\theta\in\Theta}(1/n)\sum_{j=1}^{n} g(X_j,\theta)$ as an estimator of $\theta_0$. Indeed, under some mild conditions the estimator involved is consistent:
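The definition above can be sketched numerically. The snippet below is only an illustration under stated assumptions: the names `m_estimate` and `neg_sq` are hypothetical, the set $\Theta = [-5, 5]$ is approximated by a finite grid, and grid search is used purely for simplicity (the text does not prescribe any optimization method). With $g(x,\theta) = -(x-\theta)^2$ the M-estimator is the grid point closest to the sample mean.

```python
import random

def m_estimate(sample, g, grid):
    """Return the grid point that maximizes the sample mean of g(x, theta)."""
    def q_hat(theta):
        return sum(g(x, theta) for x in sample) / len(sample)
    return max(grid, key=q_hat)

def neg_sq(x, theta):
    # Hypothetical choice of g: its population maximizer is theta_0 = E[X].
    return -(x - theta) ** 2

rng = random.Random(0)
grid = [i / 100 for i in range(-500, 501)]          # grid approximation of [-5, 5]
sample = [rng.gauss(1.0, 2.0) for _ in range(1000)]  # assumed data: N(1, 4) draws

theta_hat = m_estimate(sample, neg_sq, grid)
sample_mean = sum(sample) / len(sample)
print("grid M-estimate:", theta_hat)
print("sample mean:    ", sample_mean)
```

Replacing `max` by `min` (or `g` by `-g`) gives the corresponding minimization version.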

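Theorem 6.11: (Consistency of M-estimators) Let $\hat\theta = \operatorname{argmax}_{\theta\in\Theta}\hat{Q}(\theta)$, $\theta_0 = \operatorname{argmax}_{\theta\in\Theta}\bar{Q}(\theta)$, where $\hat{Q}(\theta) = (1/n)\sum_{j=1}^{n} g(X_j,\theta)$, and $\bar{Q}(\theta) = E[\hat{Q}(\theta)] = E[g(X_1,\theta)]$, with $g$, $X_j$, and $\theta$ the same as in Theorem 6.10. If $\theta_0$ is unique, in the sense that for arbitrary $\varepsilon > 0$ there exists a $\delta > 0$ such that $\bar{Q}(\theta_0) - \sup_{\|\theta-\theta_0\|>\varepsilon}\bar{Q}(\theta) > \delta$ (see footnote 5), then $\operatorname{plim}_{n\to\infty}\hat\theta = \theta_0$.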

Proof: First, note that $\hat\theta \in \Theta$ and $\theta_0 \in \Theta$ because $g(x,\theta)$ is continuous in $\theta$. See Appendix II. By the definition of $\theta_0$,

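$$
\begin{aligned}
0 \le \bar{Q}(\theta_0) - \bar{Q}(\hat\theta) &= \bar{Q}(\theta_0) - \hat{Q}(\theta_0) + \hat{Q}(\theta_0) - \bar{Q}(\hat\theta) \\
&\le \bar{Q}(\theta_0) - \hat{Q}(\theta_0) + \hat{Q}(\hat\theta) - \bar{Q}(\hat\theta) \le 2\sup_{\theta\in\Theta}\bigl|\hat{Q}(\theta) - \bar{Q}(\theta)\bigr|, \qquad (6.8)
\end{aligned}
$$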

and it follows from Theorem 6.3 that the right-hand side of (6.8) converges in probability to zero. Thus,

$$\operatorname{plim}_{n\to\infty}\bar{Q}(\hat\theta) = \bar{Q}(\theta_0). \qquad (6.9)$$

Moreover, the uniqueness condition implies that for arbitrary $\varepsilon > 0$ there exists a $\delta > 0$ such that $\bar{Q}(\theta_0) - \bar{Q}(\hat\theta) > \delta$ if $\|\hat\theta - \theta_0\| > \varepsilon$; hence,

$$P\bigl(\|\hat\theta - \theta_0\| > \varepsilon\bigr) \le P\bigl(\bar{Q}(\theta_0) - \bar{Q}(\hat\theta) > \delta\bigr). \qquad (6.10)$$

Footnote 5: It follows from Theorem II.6 in Appendix II that this condition is satisfied if $\Theta$ is compact and $\bar{Q}$ is continuous on $\Theta$.

Combining (6.9) and (6.10), we find that the theorem under review follows from Definition 6.1. Q.E.D.
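The final step of the proof can be spelled out as follows. This is a sketch that uses only (6.9), (6.10), and the fact from (6.8) that $\bar{Q}(\theta_0) - \bar{Q}(\hat\theta)$ is nonnegative; it assumes, as the text indicates, that Definition 6.1 is the definition of convergence in probability.

```latex
\begin{align*}
P\bigl(\|\hat\theta-\theta_0\|>\varepsilon\bigr)
  &\le P\bigl(\bar{Q}(\theta_0)-\bar{Q}(\hat\theta)>\delta\bigr)
  && \text{by (6.10)} \\
  &=   P\bigl(\bigl|\bar{Q}(\hat\theta)-\bar{Q}(\theta_0)\bigr|>\delta\bigr)
  && \text{because } \bar{Q}(\theta_0)-\bar{Q}(\hat\theta)\ge 0 \\
  &\to 0 \quad \text{as } n\to\infty
  && \text{by (6.9)}.
\end{align*}
```

Because $\varepsilon > 0$ is arbitrary, this is exactly the statement $\operatorname{plim}_{n\to\infty}\hat\theta = \theta_0$.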

It is easy to verify that Theorem 6.11 carries over to the “argmin” case simply by replacing $g$ by $-g$. As an example, let $X_1,\ldots,X_n$ be a random sample from the noncentral Cauchy distribution with density $h(x\mid\theta_0) = 1/[\pi(1 + (x - \theta_0)^2)]$ and suppose that we know that $\theta_0$ is contained in a given closed and bounded interval $\Theta$. Let $g(x,\theta) = f(x - \theta)$, where $f(x) = \exp(-x^2/2)/\sqrt{2\pi}$ is the density of the standard normal distribution. Then,

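$$E[g(X_1,\theta)] = \int_{-\infty}^{\infty} f(x - \theta)\,h(x\mid\theta_0)\,dx = \int_{-\infty}^{\infty} f(x - \theta + \theta_0)\,h(x\mid 0)\,dx = \gamma(\theta - \theta_0), \qquad (6.11)$$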

say, where $\gamma(y)$ is itself a density, namely the density of $Y = U + Z$, with $U$ and $Z$ independent random drawings from the standard normal and standard Cauchy distributions, respectively. This is called the convolution of the two densities involved. The characteristic function of $Y$ is $\exp(-|t| - t^2/2)$, and thus by the inversion formula for characteristic functions

$$\gamma(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\cos(t\cdot y)\exp(-|t| - t^2/2)\,dt. \qquad (6.12)$$

This function is maximal at $y = 0$, and this maximum is unique because, for fixed $y \neq 0$, the set $\{t\in\mathbb{R} : \cos(t\cdot y) = 1\}$ is countable and therefore has Lebesgue measure zero. In particular, it follows from (6.12) that, for arbitrary $\varepsilon > 0$,

$$\sup_{|y|>\varepsilon}\gamma(y) \le \frac{1}{2\pi}\int_{-\infty}^{\infty}\sup_{|y|>\varepsilon}|\cos(t\cdot y)|\exp(-|t| - t^2/2)\,dt < \gamma(0). \qquad (6.13)$$
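As a numerical sanity check, the density in (6.12) can be evaluated directly. The sketch below is an illustration only: the function name `gamma_density`, the truncation point `upper`, and the midpoint rule are assumptions made here, not part of the text. It uses the fact that the integrand in (6.12) is even in $t$, so the integral over the real line equals twice the integral over $[0,\infty)$.

```python
import math

def gamma_density(y, upper=12.0, steps=12000):
    """Approximate (6.12) by the midpoint rule on [0, upper].

    The integrand is even in t, so gamma(y) = (1/pi) * integral over [0, inf).
    Beyond t = 12 the factor exp(-t - t^2/2) is negligible.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.cos(t * y) * math.exp(-t - 0.5 * t * t)
    return total * h / math.pi

# Evaluate the density at a few points; it should be largest at y = 0.
values = {y: gamma_density(y) for y in (0.0, 0.5, 1.0, 2.0, 4.0)}
for y, v in values.items():
    print(f"gamma({y}) = {v:.6f}")
```

The printed values decrease as $|y|$ grows, in line with the claim that the maximum at $y = 0$ is unique.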

Combining (6.11) and (6.13) yields $\sup_{|\theta-\theta_0|>\varepsilon} E[g(X_1,\theta)] < E[g(X_1,\theta_0)]$. Thus, all the conditions of Theorem 6.11 are satisfied; hence, $\operatorname{plim}_{n\to\infty}\hat\theta = \theta_0$.
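The Cauchy example can be illustrated by simulation. The following is a minimal sketch under stated assumptions: $\theta_0 = 1$, $\Theta = [-3, 3]$ approximated by a grid with step 0.05, and the helper names `cauchy_sample` and `m_estimator` are all hypothetical choices made for this illustration. Grid search stands in for exact maximization.

```python
import math
import random

def cauchy_sample(n, theta0, rng):
    # Inverse-CDF draws from the Cauchy distribution centered at theta0.
    return [theta0 + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def m_estimator(sample, grid):
    # Maximize (1/n) * sum_j f(X_j - theta), with f the standard normal density.
    # The constant 1/sqrt(2*pi) does not affect the argmax and is dropped.
    def q_hat(theta):
        return sum(math.exp(-0.5 * (x - theta) ** 2) for x in sample) / len(sample)
    return max(grid, key=q_hat)

rng = random.Random(12345)
theta0 = 1.0                                   # assumed true location
grid = [-3.0 + 0.05 * i for i in range(121)]   # grid approximation of [-3, 3]

errors = {}
for n in (50, 500, 5000):
    theta_hat = m_estimator(cauchy_sample(n, theta0, rng), grid)
    errors[n] = abs(theta_hat - theta0)
    print(f"n = {n:5d}  theta_hat = {theta_hat:.2f}")
```

Note that the sample mean is not a consistent estimator of $\theta_0$ here, since the Cauchy distribution has no expectation, whereas the bounded objective $f(x - \theta)$ keeps the M-estimator well behaved.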

Another example is the nonlinear least-squares estimator. Consider a random sample $Z_j = (Y_j, X_j^{\mathrm{T}})^{\mathrm{T}}$, $j = 1, 2, \ldots, n$, with $Y_j \in \mathbb{R}$, $X_j \in \mathbb{R}^k$, and assume that

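Assumption 6.1: For a given function $f(x,\theta)$ on $\mathbb{R}^k \times \Theta$, with $\Theta$ a given compact subset of $\mathbb{R}^m$, there exists a $\theta_0 \in \Theta$ such that $P[E[Y_j\mid X_j] = f(X_j,\theta_0)] = 1$. Moreover, for each $x \in \mathbb{R}^k$, $f(x,\theta)$ is a continuous function on $\Theta$, and for each $\theta \in \Theta$, $f(x,\theta)$ is a Borel-measurable function on $\mathbb{R}^k$. Furthermore, let $E[Y_1^2] < \infty$, $E[\sup_{\theta\in\Theta} f(X_1,\theta)^2] < \infty$, and

$$\inf_{\|\theta-\theta_0\|>\delta} E\bigl[(f(X_1,\theta) - f(X_1,\theta_0))^2\bigr] > 0 \quad \text{for } \delta > 0.$$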

Letting $U_j = Y_j - E[Y_j\mid X_j]$, we can write

$$Y_j = f(X_j,\theta_0) + U_j, \quad \text{where } P(E[U_j\mid X_j] = 0) = 1. \qquad (6.14)$$

This is the general form of a nonlinear regression model. I will show now that, under Assumption 6.1, the nonlinear least-squares estimator

$$\hat\theta = \operatorname{argmin}_{\theta\in\Theta}\,(1/n)\sum_{j=1}^{n}\bigl(Y_j - f(X_j,\theta)\bigr)^2 \qquad (6.15)$$

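is a consistent estimator of $\theta_0$. Let $g(Z_j,\theta) = (Y_j - f(X_j,\theta))^2$. Then it follows from Assumption 6.1 and Theorem 6.10 that

$$\operatorname{plim}_{n\to\infty}\sup_{\theta\in\Theta}\Bigl|(1/n)\sum_{j=1}^{n} g(Z_j,\theta) - E[g(Z_1,\theta)]\Bigr| = 0.$$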

Moreover,

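$$
\begin{aligned}
E[g(Z_1,\theta)] &= E\bigl[(U_1 + f(X_1,\theta_0) - f(X_1,\theta))^2\bigr] \\
&= E[U_1^2] + 2E\bigl[E(U_1\mid X_1)\,(f(X_1,\theta_0) - f(X_1,\theta))\bigr] + E\bigl[(f(X_1,\theta_0) - f(X_1,\theta))^2\bigr] \\
&= E[U_1^2] + E\bigl[(f(X_1,\theta_0) - f(X_1,\theta))^2\bigr];
\end{aligned}
$$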

hence, because $E[g(Z_1,\theta_0)] = E[U_1^2]$, it follows from Assumption 6.1 that $\inf_{\|\theta-\theta_0\|>\delta} E[g(Z_1,\theta)] - E[g(Z_1,\theta_0)] > 0$ for $\delta > 0$. Therefore, the conditions of Theorem 6.11 for the argmin case are satisfied, and, consequently, the nonlinear least-squares estimator (6.15) is consistent.
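A small simulation can illustrate this consistency result. Everything specific in the sketch below is an assumption made for illustration and does not come from the text: the regression function $f(x,\theta) = \exp(\theta x)$, the true value $\theta_0 = 0.5$, regressors uniform on $[0,2]$, standard normal errors, and $\Theta = [-1, 2]$ approximated by a grid. Grid search replaces exact minimization of (6.15).

```python
import math
import random

def f(x, theta):
    # Hypothetical regression function, chosen only for illustration.
    return math.exp(theta * x)

def nls(data, grid):
    # Minimize the mean squared residual in (6.15) over the grid.
    def q_hat(theta):
        return sum((y - f(x, theta)) ** 2 for x, y in data) / len(data)
    return min(grid, key=q_hat)

rng = random.Random(2024)
theta0 = 0.5                                   # assumed true parameter
grid = [-1.0 + 0.01 * i for i in range(301)]   # grid approximation of [-1, 2]

nls_errors = {}
for n in (100, 2000):
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    # Y_j = f(X_j, theta0) + U_j with E[U_j | X_j] = 0, as in (6.14).
    data = [(x, f(x, theta0) + rng.gauss(0.0, 1.0)) for x in xs]
    theta_hat = nls(data, grid)
    nls_errors[n] = abs(theta_hat - theta0)
    print(f"n = {n:5d}  theta_hat = {theta_hat:.2f}")
```

In this design $E[(f(X_1,\theta) - f(X_1,\theta_0))^2] > 0$ whenever $\theta \neq \theta_0$, so the identification part of Assumption 6.1 holds for the assumed model.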