Convergence in Probability and the Weak Law of Large Numbers
Let $X_n$ be a sequence of random variables (or vectors) and let $X$ be a random or constant variable (or conformable vector).
Definition 6.1: We say that $X_n$ converges in probability to $X$, also denoted as $\operatorname{plim}_{n\to\infty} X_n = X$ or $X_n \to_p X$, if for arbitrary $\varepsilon > 0$ we have $\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$, or equivalently, $\lim_{n\to\infty} P(|X_n - X| \le \varepsilon) = 1$.
In this definition, $X$ may be a random variable or a constant. The latter case, where $P(X = c) = 1$ for some constant $c$, is the most common case in econometric applications. Also, this definition carries over to random vectors provided that the absolute value function $|x|$ is replaced by the Euclidean norm $\|x\| = \sqrt{x^{\mathrm{T}}x}$.
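As a quick numerical illustration of Definition 6.1 (a Python sketch of our own; the function name `prob_exceed` and the Bernoulli example are not from the text), we can estimate $P(|X_n - X| > \varepsilon)$ by Monte Carlo for $X_n$ the sample mean of $n$ Bernoulli$(1/2)$ draws, which converges in probability to $1/2$:

```python
# Illustrative sketch: Monte Carlo estimate of P(|Xbar_n - mu| > eps) for
# Xbar_n = mean of n Bernoulli(1/2) draws. Names here are ours, not the text's.
import random

def prob_exceed(n, eps, mu=0.5, reps=2000, seed=0):
    """Estimate P(|Xbar_n - mu| > eps) over `reps` simulated samples."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        xbar = sum(rng.random() < mu for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            count += 1
    return count / reps

for n in (10, 100, 1000):
    print(n, prob_exceed(n, eps=0.1))
```

The estimated exceedance probability shrinks toward zero as $n$ grows, which is exactly the defining property of convergence in probability.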
The right panels of Figures 6.1-6.3 demonstrate the law of large numbers. One of the versions of this law is the weak law of large numbers (WLLN), which also applies to uncorrelated random variables.
Theorem 6.1: (WLLN for uncorrelated random variables). Let $X_1, \ldots, X_n$ be a sequence of uncorrelated random variables with $E(X_j) = \mu$ and $\operatorname{var}(X_j) = \sigma^2 < \infty$, and let $\bar{X} = (1/n)\sum_{j=1}^n X_j$. Then $\operatorname{plim}_{n\to\infty} \bar{X} = \mu$.
Proof: Because $E(\bar{X}) = \mu$ and $\operatorname{var}(\bar{X}) = \sigma^2/n$, it follows from Chebyshev's inequality that $P(|\bar{X} - \mu| > \varepsilon) \le \sigma^2/(n\varepsilon^2) \to 0$ as $n \to \infty$. Q.E.D.
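The Chebyshev bound in this proof can be checked numerically. The following sketch (ours, not the text's; the example uses independent Uniform$(0,1)$ draws, which are in particular uncorrelated with $\sigma^2 = 1/12$) compares the empirical exceedance probability with the bound $\sigma^2/(n\varepsilon^2)$:

```python
# Illustrative check that P(|Xbar - mu| > eps) <= sigma^2/(n eps^2) for the
# sample mean of n independent Uniform(0,1) draws (mu = 1/2, sigma^2 = 1/12).
import random

def chebyshev_check(n, eps, reps=5000, seed=1):
    rng = random.Random(seed)
    mu, sigma2 = 0.5, 1.0 / 12.0
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            hits += 1
    empirical = hits / reps
    bound = sigma2 / (n * eps * eps)
    return empirical, bound

emp, bound = chebyshev_check(n=50, eps=0.1)
print(emp, bound)  # empirical probability vs. Chebyshev bound
```

The empirical probability is well below the bound, as it must be; Chebyshev's inequality is crude but sufficient for the proof.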
The condition of a finite variance can be traded in for the i.i.d. condition:
Theorem 6.2: (The WLLN for i.i.d. random variables). Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed random variables with $E[|X_j|] < \infty$ and $E(X_j) = \mu$, and let $\bar{X} = (1/n)\sum_{j=1}^n X_j$. Then $\operatorname{plim}_{n\to\infty} \bar{X} = \mu$.
Proof: Let $Y_j = X_j \cdot I(|X_j| \le j)$ and $Z_j = X_j \cdot I(|X_j| > j)$, and thus $X_j = Y_j + Z_j$. Then

$$E\left[\left|(1/n)\sum_{j=1}^n (Z_j - E(Z_j))\right|\right] \le 2(1/n)\sum_{j=1}^n E[|Z_j|] = 2(1/n)\sum_{j=1}^n E\left[|X_1|\, I(|X_1| > j)\right] \to 0 \quad (6.1)$$

and

$$
\begin{aligned}
\operatorname{var}\!\left((1/n)\sum_{j=1}^n Y_j\right) &= (1/n^2)\sum_{j=1}^n \operatorname{var}(Y_j) \le (1/n^2)\sum_{j=1}^n E\left[Y_j^2\right] = (1/n^2)\sum_{j=1}^n E\left[X_1^2\, I(|X_1| \le j)\right] \\
&= (1/n^2)\sum_{j=1}^n \sum_{k=1}^{j} E\left[X_1^2\, I(k-1 < |X_1| \le k)\right] \\
&\le (1/n^2)\sum_{j=1}^n \sum_{k=1}^{j} k \cdot E\left[|X_1|\, I(k-1 < |X_1| \le k)\right] \\
&= (1/n^2)\sum_{j=1}^n \sum_{k=1}^{j} \sum_{i=k}^{j} E\left[|X_1|\, I(i-1 < |X_1| \le i)\right] \\
&\le (1/n^2)\sum_{j=1}^n \sum_{k=1}^{j} E\left[|X_1|\, I(|X_1| > k-1)\right] \\
&\le (1/n)\sum_{k=1}^{n} E\left[|X_1|\, I(|X_1| > k-1)\right] \to 0 \quad (6.2)
\end{aligned}
$$

as $n \to \infty$, where the last equality in (6.2) follows from the easy equality $\sum_{k=1}^{m} k\, a_k = \sum_{k=1}^{m}\sum_{i=k}^{m} a_i$, and the convergence results in (6.1) and (6.2) follow from the fact that $E[|X_1|\, I(|X_1| > j)] \to 0$ as $j \to \infty$ because $E[|X_1|] < \infty$. Because $\mu = E(Y_j) + E(Z_j)$, and hence $\bar{X} - \mu = (1/n)\sum_{j=1}^n (Y_j - E(Y_j)) + (1/n)\sum_{j=1}^n (Z_j - E(Z_j))$, we find, using Chebyshev's inequality together with (6.1) and (6.2), that for arbitrary $\varepsilon > 0$,

$$
\begin{aligned}
P\left(|\bar{X} - \mu| > \varepsilon\right) &\le P\left(\left|(1/n)\sum_{j=1}^n (Y_j - E(Y_j))\right| + \left|(1/n)\sum_{j=1}^n (Z_j - E(Z_j))\right| > \varepsilon\right) \\
&\le P\left(\left|(1/n)\sum_{j=1}^n (Y_j - E(Y_j))\right| > \varepsilon/2\right) + P\left(\left|(1/n)\sum_{j=1}^n (Z_j - E(Z_j))\right| > \varepsilon/2\right) \\
&\le \frac{4\operatorname{var}\!\left((1/n)\sum_{j=1}^n Y_j\right)}{\varepsilon^2} + \frac{2\, E\left[\left|(1/n)\sum_{j=1}^n (Z_j - E(Z_j))\right|\right]}{\varepsilon} \to 0 \quad (6.3)
\end{aligned}
$$

as $n \to \infty$. Note that the second inequality in (6.3) follows from the fact that, for nonnegative random variables $X$ and $Y$, $P[X + Y > \varepsilon] \le P[X > \varepsilon/2] + P[Y > \varepsilon/2]$. The theorem under review follows now from (6.3), Definition 6.1, and the fact that $\varepsilon$ is arbitrary. Q.E.D.
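The point of Theorem 6.2 is that only a finite first moment is needed. As an illustrative sketch (ours, not the text's; the Pareto example and the function name are our own choices), consider i.i.d. Pareto draws with tail index $\alpha = 1.5$: the variance is infinite, so Theorem 6.1 does not apply, yet the sample mean still settles near $E(X_j) = \alpha/(\alpha - 1) = 3$:

```python
# Illustrative sketch: the i.i.d. WLLN with infinite variance but finite mean.
# X = U^(-1/alpha), U ~ Uniform(0,1), is Pareto with scale 1 and index alpha;
# for alpha = 1.5 it has E(X) = 3 and var(X) = infinity.
import random

def pareto_mean(n, alpha=1.5, seed=42):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.random() ** (-1.0 / alpha)  # inverse-transform sampling
    return total / n

print(pareto_mean(100_000))  # settles near 3 despite infinite variance
```

Convergence is noticeably slower than in the finite-variance case (fluctuations of order $n^{-1/3}$ rather than $n^{-1/2}$), but the sample mean still converges in probability to 3.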
Note that Theorems 6.1 and 6.2 carry over to finite-dimensional random vectors $X_j$ by replacing the absolute values $|\cdot|$ with Euclidean norms $\|x\| = \sqrt{x^{\mathrm{T}}x}$ and the variance by the variance matrix. The reformulation of Theorems 6.1 and 6.2 for random vectors is left as an easy exercise.
Convergence in probability carries over after taking continuous transformations. This result is often referred to as Slutsky's theorem:
Theorem 6.3: (Slutsky's theorem). Let $X_n$ be a sequence of random vectors in $\mathbb{R}^k$ satisfying $X_n \to_p c$, where $c$ is nonrandom. Let $\Phi(x)$ be an $\mathbb{R}^m$-valued function on $\mathbb{R}^k$ that is continuous in $c$. Then $\Phi(X_n) \to_p \Phi(c)$.
Proof: Consider the case $m = k = 1$. It follows from the continuity of $\Phi$ that for an arbitrary $\varepsilon > 0$ there exists a $\delta > 0$ such that $|x - c| \le \delta$ implies $|\Phi(x) - \Phi(c)| \le \varepsilon$; hence,

$$P(|X_n - c| \le \delta) \le P(|\Phi(X_n) - \Phi(c)| \le \varepsilon).$$

Because $\lim_{n\to\infty} P(|X_n - c| \le \delta) = 1$, the theorem follows for the case under review. The more general case with $m > 1$, $k > 1$, or both can be proved along the same lines. Q.E.D.
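Theorem 6.3 can also be illustrated by simulation. In the sketch below (ours; the names and the choice $\Phi = \exp$ are not from the text), $X_n$ is the mean of $n$ Uniform$(0,1)$ draws, so $X_n \to_p c = 0.5$, and the continuous map $\Phi(x) = e^x$ yields $\Phi(X_n) \to_p e^{0.5}$:

```python
# Illustrative sketch of Slutsky's theorem: X_n ->_p c implies Phi(X_n) ->_p
# Phi(c) for Phi continuous at c. Here Phi = exp and c = 0.5.
import math
import random

def phi_exceed(n, eps, reps=2000, seed=3):
    """Estimate P(|exp(X_n) - exp(0.5)| > eps) for X_n = mean of n uniforms."""
    rng = random.Random(seed)
    c = 0.5
    hits = 0
    for _ in range(reps):
        xn = sum(rng.random() for _ in range(n)) / n
        if abs(math.exp(xn) - math.exp(c)) > eps:
            hits += 1
    return hits / reps

print(phi_exceed(10, 0.1), phi_exceed(1000, 0.1))
```

The exceedance probability for the transformed sequence vanishes as $n$ grows, just as the theorem predicts.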
The condition that c be constant is not essential. Theorem 6.3 carries over to the case in which c is a random variable or vector, as we will see in Theorem 6.7 below.
Convergence in probability does not automatically imply convergence of expectations. A counterexample is $X_n = X + 1/n$, where $X$ has a Cauchy distribution (see Chapter 4). Then $E[X_n]$ and $E(X)$ are not defined, but $X_n \to_p X$. However,
Theorem 6.4: (Bounded convergence theorem) If $X_n$ is bounded, that is, $P(|X_n| \le M) = 1$ for some $M < \infty$ and all $n$, then $X_n \to_p X$ implies $\lim_{n\to\infty} E(X_n) = E(X)$.
Proof: First, $X$ has to be bounded too, with the same bound $M$; otherwise, $X_n \to_p X$ is not possible. Without loss of generality we may now assume that $P(X = 0) = 1$ and that $X_n$ is a nonnegative random variable by replacing $X_n$ with $|X_n - X|$, because $E[|X_n - X|] \to 0$ implies $\lim_{n\to\infty} E(X_n) = E(X)$. Next, let $F_n(x)$ be the distribution function of $X_n$ and let $\varepsilon > 0$ be arbitrary. Then

$$0 \le E(X_n) = \int_0^\infty x\, dF_n(x) = \int_0^\varepsilon x\, dF_n(x) + \int_\varepsilon^\infty x\, dF_n(x) \le \varepsilon + M \cdot P(X_n > \varepsilon). \quad (6.4)$$
Because the latter probability converges to zero (by the definition of convergence in probability and the assumption that $X_n$ is nonnegative with zero probability limit), we have $0 \le \limsup_{n\to\infty} E(X_n) \le \varepsilon$ for all $\varepsilon > 0$; hence, $\lim_{n\to\infty} E(X_n) = 0$. Q.E.D.
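A toy contrast (ours, not from the text) shows why the boundedness condition matters. Both sequences below converge in probability to 0, but only the bounded one, $X_n = M \cdot \mathrm{Bernoulli}(1/n)$, has $E(X_n) \to 0$; the unbounded $X_n = n \cdot \mathrm{Bernoulli}(1/n)$ keeps $E(X_n) = 1$ for every $n$:

```python
# Both sequences satisfy P(X_n != 0) = 1/n -> 0, so X_n ->_p 0.
# The expectations are computed exactly, no simulation needed.

def e_bounded(n, M=10.0):
    return M * (1.0 / n)   # E[M * Bernoulli(1/n)] = M/n -> 0

def e_unbounded(n):
    return n * (1.0 / n)   # E[n * Bernoulli(1/n)] = 1 for every n

print([e_bounded(n) for n in (10, 100, 1000)])
print([e_unbounded(n) for n in (10, 100, 1000)])
```

The bounded sequence's expectation shrinks to zero, in line with Theorem 6.4; the unbounded one escapes the theorem because its mass, though vanishing in probability, sits at ever larger values.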
The condition that Xn in Theorem 6.4 is bounded can be relaxed using the concept of uniform integrability:
Definition 6.2: A sequence $X_n$ of random variables is said to be uniformly integrable if $\lim_{M\to\infty} \sup_{n\ge 1} E[|X_n| \cdot I(|X_n| > M)] = 0$.
Note that Definition 6.2 carries over to random vectors by replacing the absolute value $|\cdot|$ with the Euclidean norm $\|\cdot\|$. Moreover, it is easy to verify that if $|X_n| \le Y$ with probability 1 for all $n \ge 1$, where $E(Y) < \infty$, then $X_n$ is uniformly integrable.
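Definition 6.2 can be checked by hand for simple sequences. In the sketch below (ours; names and example are not from the text), for $X_n = n \cdot \mathrm{Bernoulli}(1/n)$ the tail expectation $E[|X_n| \cdot I(|X_n| > M)]$ equals 1 whenever $n > M$, so the supremum over $n$ never vanishes as $M$ grows: the sequence is not uniformly integrable, which is why its expectation fails to converge to $E(0) = 0$:

```python
# Exact computation of E[|X_n| I(|X_n| > M)] for X_n = n * Bernoulli(1/n):
# X_n equals n with probability 1/n, so the tail expectation is
# n * (1/n) = 1 when n > M and 0 otherwise.

def tail_expectation(n, M):
    return 1.0 if n > M else 0.0

def sup_tail(M, n_max=10_000):
    """Approximate sup over n of the tail expectation (exact for n <= n_max)."""
    return max(tail_expectation(n, M) for n in range(1, n_max + 1))

print([sup_tail(M) for M in (10, 100, 1000)])  # stays at 1.0
```

By contrast, any sequence dominated by an integrable $Y$ would have these suprema shrink to zero as $M \to \infty$.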
Theorem 6.5: (Dominated convergence theorem) Let $X_n$ be uniformly integrable. Then $X_n \to_p X$ implies $\lim_{n\to\infty} E(X_n) = E(X)$.
Proof: Again, without loss of generality we may assume that $P(X = 0) = 1$ and that $X_n$ is a nonnegative random variable. Let $0 < \varepsilon < M$ be arbitrary. Then, as in (6.4),

$$E(X_n) = \int_0^\infty x\, dF_n(x) = \int_0^\varepsilon x\, dF_n(x) + \int_\varepsilon^M x\, dF_n(x) + \int_M^\infty x\, dF_n(x) \le \varepsilon + M \cdot P(X_n > \varepsilon) + \sup_{n\ge 1} \int_M^\infty x\, dF_n(x). \quad (6.5)$$
For fixed $M$ the second term on the right-hand side of (6.5) converges to zero. Moreover, by uniform integrability we can choose $M$ so large that the third term is smaller than $\varepsilon$. Hence, $0 \le \limsup_{n\to\infty} E(X_n) \le 2\varepsilon$ for all $\varepsilon > 0$, and thus $\lim_{n\to\infty} E(X_n) = 0$. Q.E.D.
Theorems 6.4 and 6.5 also carry over to random vectors by replacing the absolute value function $|x|$ with the Euclidean norm $\|x\| = \sqrt{x^{\mathrm{T}}x}$.
In most (but not all!) cases in which convergence in probability and the weak law of large numbers apply, we actually have a much stronger result:
Definition 6.3: We say that $X_n$ converges almost surely (or with probability 1) to $X$, also denoted by $X_n \to X$ a.s. (or w.p. 1), if for all $\varepsilon > 0$,

$$\lim_{n\to\infty} P(|X_m - X| \le \varepsilon \text{ for all } m \ge n) = 1, \quad (6.6)$$

or equivalently,

$$P\left(\lim_{n\to\infty} X_n = X\right) = 1. \quad (6.7)$$
The equivalence of conditions (6.6) and (6.7) will be proved in Appendix 6.B (Theorem 6.B.1).
It follows straightforwardly from (6.6) that almost-sure convergence implies convergence in probability. The converse, however, is not true. It is possible that a sequence $X_n$ converges in probability but not almost surely. For example, let $X_n = U_n/n$, where the $U_n$'s are i.i.d. nonnegative random variables with distribution function $G(u) = \exp(-1/u)$ for $u > 0$, $G(u) = 0$ for $u \le 0$. Then, for arbitrary $\varepsilon > 0$,

$$P(|X_n| \le \varepsilon) = P(U_n \le n\varepsilon) = G(n\varepsilon) = \exp(-1/(n\varepsilon)) \to 1 \text{ as } n \to \infty;$$
hence, $X_n \to_p 0$. On the other hand,
$$P(|X_m| \le \varepsilon \text{ for all } m \ge n) = P(U_m \le m\varepsilon \text{ for all } m \ge n) = \prod_{m=n}^{\infty} G(m\varepsilon) = \exp\left(-(1/\varepsilon)\sum_{m=n}^{\infty} m^{-1}\right) = 0,$$

where the second equality follows from the independence of the $U_n$'s and the last equality follows from the fact that $\sum_{m=n}^{\infty} m^{-1} = \infty$. Consequently, $X_n$ does not converge to 0 almost surely.
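This counterexample is easy to simulate. In the sketch below (ours; the sampling scheme and names are not from the text), $U_n$ is drawn from $G(u) = \exp(-1/u)$ by inverse transform ($U = -1/\ln V$ for $V \sim$ Uniform$(0,1)$), and we count the indices $n$ with $X_n = U_n/n > \varepsilon$ in successive blocks:

```python
# X_n = U_n / n with U_n ~ G(u) = exp(-1/u): converges in probability to 0,
# yet P(U_n > eps*n) = 1 - exp(-1/(eps*n)) ~ 1/(eps*n), whose sum diverges,
# so exceedances keep occurring along every path (no a.s. convergence).
import math
import random

def exceedances(n_from, n_to, eps=0.5, seed=7):
    """Count indices n in [n_from, n_to) with X_n = U_n / n > eps."""
    rng = random.Random(seed)
    count = 0
    for n in range(n_from, n_to):
        u = -1.0 / math.log(rng.random())  # inverse transform for G(u) = exp(-1/u)
        if u / n > eps:
            count += 1
    return count

print(exceedances(1, 1000), exceedances(1000, 100_000))
```

Even far out in the sequence, exceedances keep appearing; the expected count over a block $[n_1, n_2)$ is roughly $(1/\varepsilon)\ln(n_2/n_1)$, which grows without bound as the blocks extend, mirroring the divergent harmonic series in the proof.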
Theorems 6.2-6.5 carry over to the almost-sure convergence case without additional conditions:
Theorem 6.6: (Kolmogorov's strong law of large numbers). Under the conditions of Theorem 6.2, $\bar{X} \to \mu$ a.s.
Proof: See Appendix 6.B.
The result of Theorem 6.6 is actually what you see happening in the right-hand panels of Figures 6.1-6.3.
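A single simulated path illustrates what Theorem 6.6 asserts (a sketch of ours, not from the text): along one realization of an i.i.d. Uniform$(0,1)$ sequence, the running mean settles down to $\mu = 1/2$, so the worst deviation over the tail of the path shrinks:

```python
# Illustrative sketch of a.s. convergence: track the running mean of ONE
# i.i.d. Uniform(0,1) path and record the worst deviation from mu = 0.5
# over all indices k >= n_tail.
import random

def max_tail_deviation(n_total, n_tail, seed=11):
    rng = random.Random(seed)
    total, worst = 0.0, 0.0
    for k in range(1, n_total + 1):
        total += rng.random()
        if k >= n_tail:
            worst = max(worst, abs(total / k - 0.5))
    return worst

print(max_tail_deviation(10_000, 1_000), max_tail_deviation(10_000, 5_000))
```

Because the same seed reproduces the same path, the tail starting at 5,000 is a subset of the tail starting at 1,000, and its worst deviation is necessarily no larger; pathwise, the deviations die out, which is the almost-sure statement (6.6).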
Theorem 6.7: (Slutsky's theorem). Let $X_n$ be a sequence of random vectors in $\mathbb{R}^k$ converging a.s. to a (random or constant) vector $X$. Let $\Phi(x)$ be an $\mathbb{R}^m$-valued function on $\mathbb{R}^k$ that is continuous on an open subset $B$ of $\mathbb{R}^k$ for which $P(X \in B) = 1$. Then $\Phi(X_n) \to \Phi(X)$ a.s.
Recall that open subsets of a Euclidean space are Borel sets.
Proof: See Appendix 6.B.
Because a.s. convergence implies convergence in probability, it is trivial that
Theorem 6.8: If $X_n \to X$ a.s., then the result of Theorem 6.4 carries over.

Theorem 6.9: If $X_n \to X$ a.s., then the result of Theorem 6.5 carries over.