# Various Modes of Convergence

In this section, we shall define four modes of convergence for a sequence of random variables and shall state relationships among them in the form of several theorems.

Definition 3.2.1 (convergence in probability). A sequence of random variables $\{X_n\}$ is said to converge to a random variable $X$ in probability if $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$ for any $\epsilon > 0$. We write $X_n \xrightarrow{P} X$ or $\operatorname{plim} X_n = X$.

Definition 3.2.2 (convergence in mean square). A sequence $\{X_n\}$ is said to converge to $X$ in mean square if $\lim_{n\to\infty} E(X_n - X)^2 = 0$. We write $X_n \xrightarrow{M} X$.

Definition 3.2.3 (convergence in distribution). A sequence $\{X_n\}$ is said to converge to $X$ in distribution if the distribution function $F_n$ of $X_n$ converges to the distribution function $F$ of $X$ at every continuity point of $F$. We write $X_n \xrightarrow{d} X$, and we call $F$ the limit distribution of $\{X_n\}$. If $\{X_n\}$ and $\{Y_n\}$ have the same limit distribution, we write $X_n \overset{LD}{=} Y_n$.

The reason for adding the phrase "at every continuity point of $F$" can be understood by considering the following example. Consider the sequence $F_n(\cdot)$ such that

$$F_n(x) = 0, \quad x < a + \frac{1}{n}; \qquad = 1, \quad a + \frac{1}{n} \leq x. \tag{3.2.1}$$

Then $\lim F_n$ equals 0 at $x = a$, so it fails to be continuous at $a$ in the manner required of a distribution function and therefore is not a distribution function. However, we would like to say that the random variable with the distribution (3.2.1) converges in distribution to a degenerate random variable which takes the value $a$ with probability one. The phrase "at every continuity point of $F$" enables us to do so.
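The example can be checked numerically. The following is a minimal sketch (the function names and the choice $a = 0$ are ours, not from the text): it tabulates $F_n$ at the discontinuity point $a$ and at a nearby continuity point.

```python
# A numerical sketch of the example above, with the arbitrary choice a = 0.
# X_n is degenerate at a + 1/n, so F_n jumps from 0 to 1 at a + 1/n.

def F_n(x, n, a=0.0):
    """Distribution function of the random variable degenerate at a + 1/n."""
    return 1.0 if x >= a + 1.0 / n else 0.0

def F(x, a=0.0):
    """Distribution function of the random variable degenerate at a."""
    return 1.0 if x >= a else 0.0

for n in (10, 1_000, 100_000):
    print(n, F_n(0.0, n), F_n(0.01, n), F(0.0))
# F_n(a) = 0 for every n, so lim F_n disagrees with F at the discontinuity
# point a; at any continuity point x != a, F_n(x) -> F(x).
```

The output shows why the continuity-point qualification is needed: the disagreement occurs only at the single discontinuity point of $F$.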

Definition 3.2.4 (almost sure convergence). A sequence $\{X_n\}$ is said to converge to $X$ almost surely if

$$P\{\omega \mid \lim_{n\to\infty} X_n(\omega) = X(\omega)\} = 1.$$

We write $X_n \xrightarrow{a.s.} X$.

The next four theorems establish the logical relationships among the four modes of convergence, depicted in Figure 3.2.

Theorem 3.2.1 (Chebyshev). $EX_n^2 \to 0 \Rightarrow X_n \xrightarrow{P} 0$.

Proof. We have

$$EX_n^2 = \int_{-\infty}^{\infty} x^2\, dF_n(x) \geq \int_S x^2\, dF_n(x) \geq \epsilon^2 \int_S dF_n(x), \tag{3.2.2}$$

where $S = \{x \mid x^2 \geq \epsilon^2\}$. But we have

$$\int_S dF_n(x) = P(X_n^2 \geq \epsilon^2) = P(|X_n| \geq \epsilon), \tag{3.2.3}$$

so that (3.2.2) and (3.2.3) together give

$$P(|X_n| \geq \epsilon) \leq \frac{EX_n^2}{\epsilon^2}. \tag{3.2.4}$$

The theorem immediately follows from (3.2.4).

[Figure 3.2 Logical relationships among four modes of convergence: $M \Rightarrow P$, $a.s. \Rightarrow P$, and $P \Rightarrow d$.]

The inequality (3.2.4) is called Chebyshev's inequality. By slightly modifying the proof, we can establish the following generalized form of Chebyshev's inequality:

$$P[g(X_n) \geq \epsilon] \leq \frac{E\, g(X_n)}{\epsilon}, \tag{3.2.5}$$

where $g(\cdot)$ is any nonnegative continuous function.
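Chebyshev's inequality (3.2.4) is easy to verify by simulation. The sketch below is ours, not from the text; the standard normal distribution and $\epsilon = 1.5$ are arbitrary choices.

```python
import random

# Monte Carlo check of Chebyshev's inequality (3.2.4):
#   P(|X| >= eps) <= E X^2 / eps^2.
random.seed(0)
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]

eps = 1.5
lhs = sum(abs(x) >= eps for x in draws) / len(draws)   # estimate of P(|X| >= eps)
rhs = sum(x * x for x in draws) / len(draws) / eps**2  # estimate of E X^2 / eps^2
print(lhs <= rhs)  # the bound holds, with plenty of slack in this case
```

For the standard normal the bound is loose (roughly 0.13 against 0.44 here), which is typical: Chebyshev's inequality trades sharpness for generality.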

Note that the statement $X_n \xrightarrow{M} X \Rightarrow X_n \xrightarrow{P} X$, where $X$ may be either a constant or a random variable, follows from Theorem 3.2.1 if we regard $X_n - X$ as the $X_n$ of the theorem.

We shall state the next two theorems without proof. The proof of Theorem 3.2.2 can be found in Mann and Wald (1943) or Rao (1973, p. 122). The proof of Theorem 3.2.3 is left as an exercise.

Theorem 3.2.2. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{d} X$.

Theorem 3.2.3. $X_n \xrightarrow{a.s.} X \Rightarrow X_n \xrightarrow{P} X$.

The converse of Theorem 3.2.2 is not generally true, but it holds in the special case where $X$ is equal to a constant $\alpha$. We shall state it as a theorem, the proof of which is simple and left as an exercise.

Theorem 3.2.4. $X_n \xrightarrow{d} \alpha \Rightarrow X_n \xrightarrow{P} \alpha$.

The converse of Theorem 3.2.3 does not hold either, as we shall show by a well-known example. Define a probability space $(\Omega, A, P)$ as follows: $\Omega = [0, 1]$, $A$ = the Lebesgue-measurable sets in $[0, 1]$, and $P$ = Lebesgue measure, as in Example 3.1.2. Define a sequence of random variables $X_n(\omega)$ as

$$\begin{aligned}
X_1(\omega) &= 1 \quad \text{for } 0 \leq \omega \leq 1 \\
X_2(\omega) &= 1 \quad \text{for } 0 \leq \omega \leq \tfrac{1}{2}, & &= 0 \quad \text{elsewhere} \\
X_3(\omega) &= 1 \quad \text{for } \tfrac{1}{2} \leq \omega \leq \tfrac{1}{2} + \tfrac{1}{3}, & &= 0 \quad \text{elsewhere} \\
X_4(\omega) &= 1 \quad \text{for } \tfrac{5}{6} \leq \omega \leq 1 \text{ and } 0 \leq \omega \leq \tfrac{1}{12}, & &= 0 \quad \text{elsewhere} \\
X_5(\omega) &= 1 \quad \text{for } \tfrac{1}{12} \leq \omega \leq \tfrac{1}{12} + \tfrac{1}{5}, & &= 0 \quad \text{elsewhere}
\end{aligned}$$
In other words, the subset of $\Omega$ over which $X_n$ assumes unity has total length $1/n$ and keeps moving to the right until it reaches the right end point of $[0, 1]$, at which point it moves back to 0 and starts again. For any $1 > \epsilon > 0$ we clearly have

$$P(X_n > \epsilon) = \frac{1}{n}$$

and therefore $X_n \xrightarrow{P} 0$. However, because $\sum_{n=1}^{\infty} n^{-1} = \infty$, the moving interval sweeps over every point of $[0, 1]$ infinitely often, so there is no element in $\Omega$ for which $\lim_{n\to\infty} X_n(\omega) = 0$. Therefore $P\{\omega \mid \lim_{n\to\infty} X_n(\omega) = 0\} = 0$, implying that $X_n$ does not converge to 0 almost surely.
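The construction is easy to simulate. In the sketch below (ours, not from the text; the sample point $w = 0.3$ is arbitrary), `X(n, w)` evaluates $X_n(\omega)$ by locating the $n$-th moving interval.

```python
# A numerical sketch of the moving-interval example: the n-th interval has
# length 1/n and its left end point is 1 + 1/2 + ... + 1/(n-1) reduced mod 1.

def X(n, w):
    """Indicator X_n(w) of the n-th moving interval, for w in [0, 1]."""
    start = sum(1.0 / k for k in range(1, n)) % 1.0
    end = start + 1.0 / n
    # when end > 1 the interval wraps around from 1 back to 0
    return 1 if (start <= w <= end) or (w <= end - 1.0) else 0

w = 0.3                                   # an arbitrary sample point
hits = [n for n in range(1, 3000) if X(n, w)]
print(len(hits))   # w keeps getting covered, so lim X_n(w) = 0 fails
print(1 / 3000)    # yet P(X_n = 1) = 1/n -> 0: convergence in probability
```

Because the harmonic series diverges, `hits` keeps acquiring new entries no matter how far out we look, which is exactly why almost sure convergence fails.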

The next three convergence theorems are extremely useful in obtaining the asymptotic properties of estimators.

Theorem 3.2.5 (Mann and Wald). Let $X_n$ and $X$ be $K$-vectors of random variables and let $g(\cdot)$ be a function from $R^K$ to $R$ such that the set $E$ of discontinuity points of $g(\cdot)$ is closed and $P(X \in E) = 0$. If $X_n \xrightarrow{d} X$, then $g(X_n) \xrightarrow{d} g(X)$.

A slightly more general theorem, in which a continuous function is replaced by a Borel measurable function, was proved by Mann and Wald (1943). The convergence in distribution of the individual elements of the vector $X_n$ to the corresponding elements of the vector $X$ is not sufficient for obtaining the above results. However, if the elements of $X_n$ are independent for every $n$, the separate convergence is sufficient.

Theorem 3.2.6. Let $X_n$ be a vector of random variables with a fixed finite number of elements. Let $g$ be a real-valued function continuous at a constant vector point $\alpha$. Then $X_n \xrightarrow{P} \alpha \Rightarrow g(X_n) \xrightarrow{P} g(\alpha)$.

Proof. Continuity at $\alpha$ means that for any $\epsilon > 0$ we can find $\delta$ such that $\|X_n - \alpha\| < \delta$ implies $|g(X_n) - g(\alpha)| < \epsilon$. Therefore

$$P[\|X_n - \alpha\| < \delta] \leq P[|g(X_n) - g(\alpha)| < \epsilon]. \tag{3.2.6}$$

The theorem follows because the left-hand side of (3.2.6) converges to 1 by the assumption of the theorem.
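A simulation illustrates Theorem 3.2.6 in the scalar case. This sketch is ours; the Uniform(0, 1) population (so that $\operatorname{plim}$ of the sample mean is $\alpha = 0.5$) and the continuous function $g(x) = x^2$ are arbitrary choices.

```python
import random

# Sketch of Theorem 3.2.6: the sample mean of Uniform(0, 1) draws converges
# in probability to alpha = 0.5, so g(mean) converges in probability to
# g(0.5) = 0.25 for any g continuous at 0.5; here g(x) = x**2.
random.seed(1)

def g(x):
    return x * x

for n in (10, 1_000, 100_000):
    mean = sum(random.random() for _ in range(n)) / n
    print(n, abs(g(mean) - g(0.5)))   # the deviation tends to shrink with n
```
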

Theorem 3.2.7 (Slutsky). If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} \alpha$, then

(i) $X_n + Y_n \xrightarrow{d} X + \alpha$,

(ii) $X_n Y_n \xrightarrow{d} \alpha X$,

(iii) $(X_n / Y_n) \xrightarrow{d} X / \alpha$, provided $\alpha \neq 0$.

The proof has been given by Rao (1973, p. 122). By repeated applications of Theorem 3.2.7, we can prove the more general theorem that if $g$ is a rational function and $\operatorname{plim} Y_{in} = \alpha_i$, $i = 1, 2, \ldots, J$, and $X_{in} \xrightarrow{d} X_i$ jointly in all $i = 1, 2, \ldots, K$, then the limit distribution of $g(X_{1n}, X_{2n}, \ldots, X_{Kn}, Y_{1n}, Y_{2n}, \ldots, Y_{Jn})$ is the same as the distribution of $g(X_1, X_2, \ldots, X_K, \alpha_1, \alpha_2, \ldots, \alpha_J)$. This is perhaps the single most useful theorem in large sample theory.
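Slutsky's theorem can be seen at work in a Monte Carlo sketch (ours, not from the text; the Uniform(0, 1) population and the sample sizes are arbitrary choices): the standardized sample mean $Z_n$ converges in distribution to $N(0, 1)$ while the normalized sample variance $V_n$ converges in probability to 1.

```python
import random
import statistics

# Sketch of Theorem 3.2.7 (Slutsky): Z_n -> N(0, 1) in distribution and
# V_n -> 1 in probability, so Z_n + V_n behaves like N(0, 1) + 1 and
# Z_n * V_n like N(0, 1).
random.seed(2)
mu, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform(0, 1)
n, reps = 500, 2_000

sums, prods = [], []
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    z = (statistics.fmean(xs) - mu) / (var / n) ** 0.5   # -> N(0, 1) in dist.
    v = statistics.pvariance(xs) / var                   # -> 1 in probability
    sums.append(z + v)
    prods.append(z * v)

print(statistics.fmean(sums))    # close to 1, the mean of N(0, 1) + 1
print(statistics.pstdev(prods))  # close to 1, the s.d. of N(0, 1)
```

This is essentially how the asymptotic normality of the $t$-statistic is established: the unknown variance in the denominator may be replaced by any consistent estimate without changing the limit distribution.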

The following definition concerning the stochastic order relationship is useful (see Mann and Wald, 1943, for more details).

Definition 3.2.5. Let $\{X_n\}$ be a sequence of random variables and let $\{a_n\}$ be a sequence of positive constants. Then we can write $X_n = o(a_n)$ if $\operatorname{plim}_{n\to\infty} a_n^{-1} X_n = 0$, and $X_n = O(a_n)$ if for any $\epsilon > 0$ there exists an $M_\epsilon$ such that

$$P[|a_n^{-1} X_n| \leq M_\epsilon] \geq 1 - \epsilon \quad \text{for all values of } n.$$

Sometimes these order relationships are denoted $o_p$ and $O_p$ respectively to distinguish them from the cases where $\{X_n\}$ are nonstochastic. However, we use the same symbols for both stochastic and nonstochastic cases because Definition 3.2.5 applies to the nonstochastic case as well in a trivial way.
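Definition 3.2.5 can be illustrated with the centered sample mean. In this sketch (ours; the Uniform(0, 1) population and the sample sizes are arbitrary choices), $\bar{X}_n - 0.5$ is $O(n^{-1/2})$, so the $\sqrt{n}$-scaled deviations stay bounded, while the more slowly growing $n^{1/4}$ scaling drives the deviations to zero, i.e., $\bar{X}_n - 0.5 = o(n^{-1/4})$.

```python
import random
import statistics

# Sketch of Definition 3.2.5: for i.i.d. Uniform(0, 1) draws, the centered
# sample mean is O(n^{-1/2}) but o(n^{-1/4}) in the stochastic sense.
random.seed(3)

def centered_mean(n):
    return sum(random.random() for _ in range(n)) / n - 0.5

for n in (100, 2_500, 40_000):
    devs = [abs(centered_mean(n)) for _ in range(60)]
    big_o = statistics.fmean(d * n**0.5 for d in devs)     # stays bounded
    small_o = statistics.fmean(d * n**0.25 for d in devs)  # tends to 0
    print(n, big_o, small_o)
```
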
