Convergence in Distribution

Let Xn be a sequence of random variables (or vectors) with distribution functions Fn (x), and let X be a random variable (or conformable random vector) with distribution function F(x).

Definition 6.6: We say that $X_n$ converges to $X$ in distribution (denoted by $X_n \to_d X$) if $\lim_{n\to\infty} F_n(x) = F(x)$ pointwise in $x$, possibly except at the discontinuity points of $F(x)$.

Alternative notation: If $X$ has a particular distribution, for example $N(0, 1)$, then $X_n \to_d X$ is also denoted by $X_n \to_d N(0, 1)$.

The reason for excluding the discontinuity points of $F(x)$ in the definition of convergence in distribution is that $\lim_{n\to\infty} F_n(x)$ may not be right-continuous at these discontinuity points. For example, let $X_n = X + 1/n$. Then $F_n(x) = F(x - 1/n)$. Now if $F(x)$ is discontinuous at $x_0$, then $\lim_{n\to\infty} F(x_0 - 1/n) < F(x_0)$; hence $\lim_{n\to\infty} F_n(x_0) < F(x_0)$. Thus, without the exclusion of discontinuity points, $X + 1/n$ would not converge in distribution to the distribution of $X$, which would be counterintuitive.

If each of the components of a sequence of random vectors converges in distribution, then the random vectors themselves may not converge in distribution. As a counterexample, let $X_n = (X_{1n}, X_{2n})^{\mathrm{T}}$, where $X_{1n} = Z \sim N(0, 1)$ and $X_{2n} = (-1)^n Z$.

Then $X_{1n} \to_d N(0, 1)$ and $X_{2n} \to_d N(0, 1)$, but $X_n$ does not converge in distribution.
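A short NumPy simulation can illustrate a counterexample of this type. The construction below is my own (not from the text): both components of the vector are $N(0, 1)$ for every $n$, yet the joint distribution alternates with $n$, so the vector sequence has no limit in distribution.

```python
import numpy as np

# My own construction: take Z ~ N(0, 1) and X_n = (Z, (-1)^n * Z)'.
# Both components are N(0, 1) for every n, but the joint distribution
# alternates with n, so X_n does not converge in distribution.
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)

x2_even = z          # second component for even n
x2_odd = -z          # second component for odd n

# The marginal distributions coincide (std of -Z equals std of Z exactly) ...
marginal_gap = abs(x2_even.std() - x2_odd.std())

# ... but the joint behaviour flips between perfect positive and perfect
# negative correlation with the first component Z.
corr_even = np.corrcoef(z, x2_even)[0, 1]
corr_odd = np.corrcoef(z, x2_odd)[0, 1]
print(marginal_gap, corr_even, corr_odd)
```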

Moreover, in general $X_n \to_d X$ does not imply that $X_n \to_p X$. For example, if we replace $X$ by an independent random drawing $Z$ from the distribution of $X$, then $X_n \to_d X$ and $X_n \to_d Z$ are equivalent statements because they only say that the distribution function of $X_n$ converges to the distribution function of $X$ (or $Z$) pointwise in the continuity points of the latter distribution function. If $X_n \to_d X$ implied $X_n \to_p X$, then $X_n \to_p Z$ would follow as well, which would imply that $X = Z$; this is not possible because $X$ and $Z$ are independent. The only exception is the case in which the distribution of $X$ is degenerate: $P(X = c) = 1$ for some constant $c$:
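A minimal sketch makes this concrete (the example is my own, not from the text): $X_n = -Z$ has the same $N(0, 1)$ distribution as $Z$ for every $n$, so $X_n \to_d Z$, but $X_n$ stays far from $Z$ with high probability, so $X_n$ does not converge in probability to $Z$.

```python
import numpy as np

# My own example: let Z ~ N(0, 1) and X_n = -Z for every n.  Then X_n ->_d Z,
# because -Z and Z are both N(0, 1), but X_n does not converge in probability
# to Z, since |X_n - Z| = 2|Z| does not shrink as n grows.
rng = np.random.default_rng(1)
z = rng.standard_normal(200_000)
x_n = -z

# The empirical distribution functions agree at a few evaluation points ...
pts = np.array([-1.0, 0.0, 1.0])
cdf_gap = np.abs((x_n[:, None] <= pts).mean(0) - (z[:, None] <= pts).mean(0)).max()

# ... yet |X_n - Z| exceeds 1 with probability P(|Z| > 0.5), about 0.62, for all n.
p_far = np.mean(np.abs(x_n - z) > 1.0)
print(cdf_gap, p_far)
```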

Theorem 6.16: If $X_n$ converges in distribution to $X$, and $P(X = c) = 1$, where $c$ is a constant, then $X_n$ converges in probability to $c$.

Proof: Exercise.
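Although the proof is left as an exercise, Theorem 6.16 is easy to check numerically. In the sketch below (my own choice of sequence), $X_n = c + Z/n$ converges in distribution to the constant $c$, and the Monte Carlo estimate of $P(|X_n - c| > \varepsilon)$ shrinks to zero.

```python
import numpy as np

# My own sequence: X_n = c + Z/n with Z ~ N(0, 1), so X_n ->_d c.
# Theorem 6.16 then predicts X_n ->_p c, i.e. P(|X_n - c| > eps) -> 0.
rng = np.random.default_rng(2)
c, eps, reps = 3.0, 0.1, 100_000

def p_outside(n):
    # Monte Carlo estimate of P(|X_n - c| > eps)
    x_n = c + rng.standard_normal(reps) / n
    return np.mean(np.abs(x_n - c) > eps)

probs = [p_outside(n) for n in (1, 10, 100)]
print(probs)   # decreasing toward 0
```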

Note that this result is demonstrated in the left-hand panels of Figures 6.1–6.3. On the other hand,

Theorem 6.17: $X_n \to_p X$ implies $X_n \to_d X$.

Proof: Theorem 6.17 follows straightforwardly from Theorems 6.3 and 6.4 and from Theorem 6.18 below. Q. E.D.

There is a one-to-one correspondence between convergence in distribution and convergence of expectations of bounded continuous functions of random variables:

Theorem 6.18: Let $X_n$ and $X$ be random vectors in $\mathbb{R}^k$. Then $X_n \to_d X$ if and only if for all bounded continuous functions $\varphi$ on $\mathbb{R}^k$, $\lim_{n\to\infty} E[\varphi(X_n)] = E[\varphi(X)]$.

Proof: I will only prove this theorem for the case in which $X_n$ and $X$ are random variables. Throughout the proof the distribution function of $X_n$ is denoted by $F_n(x)$ and the distribution function of $X$ by $F(x)$.

Proof of the “only if” case: Let $X_n \to_d X$. Without loss of generality we may assume that $\varphi(x) \in [0, 1]$ for all $x$. For any $\varepsilon > 0$ we can choose continuity points $a$ and $b$ of $F(x)$ such that $F(b) - F(a) > 1 - \varepsilon$. Moreover, we can choose continuity points $a = c_1 < c_2 < \cdots < c_m = b$ of $F(x)$ such that, for $j = 1, \ldots, m - 1$,

$$\sup_{x \in (c_j, c_{j+1}]} \varphi(x) - \inf_{x \in (c_j, c_{j+1}]} \varphi(x) < \varepsilon. \qquad (6.17)$$

Now define

$$\psi(x) = \inf_{x^* \in (c_j, c_{j+1}]} \varphi(x^*) \quad \text{for } x \in (c_j, c_{j+1}], \ j = 1, \ldots, m - 1.$$

Moreover,

$$E[\varphi(X)] = \int \varphi(x)\,dF(x) = F(a) + \int_a^b \frac{b - x}{b - a}\,dF(x) \le F(b). \qquad (6.25)$$

Combining (6.24) and (6.25) yields $F(b) \ge \limsup_{n\to\infty} F_n(a)$; hence, because $b\,(> a)$ was arbitrary, letting $b \downarrow a$ it follows that

$$F(a) \ge \limsup_{n\to\infty} F_n(a). \qquad (6.26)$$

Similarly, for $c < a$ we have $F(c) \le \liminf_{n\to\infty} F_n(a)$; hence, if we let $c \uparrow a$, it follows that

$$F(a) \le \liminf_{n\to\infty} F_n(a). \qquad (6.27)$$

If we combine (6.26) and (6.27), the “if” part follows, that is, $F(a) = \lim_{n\to\infty} F_n(a)$. Q. E.D.
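Theorem 6.18 can also be checked numerically. In the sketch below (the construction and names are mine, not from the text), $X_n$ is the standardized mean of $n$ Uniform$(0, 1)$ draws, so $X_n \to_d X \sim N(0, 1)$ by the central limit theorem, and $\varphi(x) = \cos x$ is a bounded continuous function with $E[\cos X] = e^{-1/2}$ for $X \sim N(0, 1)$.

```python
import numpy as np

# My own construction: X_n is the standardized mean of n Uniform(0, 1) draws,
# so X_n ->_d X ~ N(0, 1) by the central limit theorem.  phi(x) = cos(x) is
# bounded and continuous with E[cos(X)] = exp(-1/2), so Theorem 6.18 predicts
# E[phi(X_n)] -> exp(-1/2) as n grows.
rng = np.random.default_rng(3)
reps = 200_000

def e_phi(n):
    u = rng.random((reps, n))
    x_n = (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)  # standardized sample mean
    return np.cos(x_n).mean()                       # Monte Carlo E[phi(X_n)]

target = np.exp(-0.5)
gaps = [abs(e_phi(n) - target) for n in (1, 5, 50)]
print(gaps)   # shrinking toward 0
```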

Note that the “only if” part of Theorem 6.18 implies another version of the bounded convergence theorem:

Theorem 6.19: (Bounded convergence theorem) If $X_n$ is bounded, $P(|X_n| \le M) = 1$ for some $M < \infty$ and all $n$, then $X_n \to_d X$ implies $\lim_{n\to\infty} E(X_n) = E(X)$.

Proof: Easy exercise.

On the basis of Theorem 6.18, it is not hard to verify that the following result holds.

Theorem 6.20: (Continuous mapping theorem) Let $X_n$ and $X$ be random vectors in $\mathbb{R}^k$ such that $X_n \to_d X$, and let $\Phi(x)$ be a continuous mapping from $\mathbb{R}^k$ into $\mathbb{R}^m$. Then $\Phi(X_n) \to_d \Phi(X)$.

Proof: Exercise.

The following are examples of applications of Theorem 6.20:

(1) Let $X_n \to_d X$, where $X$ is $N(0, 1)$ distributed. Then $X_n^2 \to_d \chi_1^2$.

(2) Let $X_n \to_d X$, where $X$ is $N_k(0, I)$ distributed. Then $X_n^{\mathrm{T}} X_n \to_d \chi_k^2$.
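Example (1) can be verified by simulation. The sketch below (my own construction) uses a standardized mean of uniforms for $X_n$, together with the identity $P(\chi_1^2 \le q) = P(|X| \le \sqrt{q}) = \operatorname{erf}(\sqrt{q/2})$ for $X \sim N(0, 1)$.

```python
import numpy as np
from math import erf, sqrt

# My own construction: X_n, a standardized mean of uniforms, satisfies
# X_n ->_d N(0, 1) by the CLT, so the continuous mapping theorem gives
# X_n^2 ->_d chi-squared with 1 degree of freedom.
rng = np.random.default_rng(4)
n, reps = 200, 50_000

u = rng.random((reps, n))
x_n = (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)  # approximately N(0, 1)
q = 1.5
empirical = np.mean(x_n ** 2 <= q)              # empirical CDF of X_n^2 at q
chi2_cdf = erf(sqrt(q / 2))                     # exact chi^2_1 CDF at q
print(empirical, chi2_cdf)
```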

If $X_n \to_d X$, $Y_n \to_d Y$, and $\Phi(x, y)$ is a continuous function, then in general it does not follow that $\Phi(X_n, Y_n) \to_d \Phi(X, Y)$, except if either $X$ or $Y$ has a degenerate distribution:

Theorem 6.21: Let $X$ and $X_n$ be random vectors in $\mathbb{R}^k$ such that $X_n \to_d X$, and let $Y_n$ be a random vector in $\mathbb{R}^m$ such that $\mathrm{plim}_{n\to\infty} Y_n = c$, where $c \in \mathbb{R}^m$ is a nonrandom vector. Moreover, let $\Phi(x, y)$ be a continuous function on the set $\mathbb{R}^k \times \{y \in \mathbb{R}^m : \|y - c\| < \delta\}$ for some $\delta > 0$.$^6$ Then $\Phi(X_n, Y_n) \to_d \Phi(X, c)$.

Proof: Again, we prove the theorem for the case $k = m = 1$ only. Let $F_n(x)$ and $F(x)$ be the distribution functions of $X_n$ and $X$, respectively, and let $\Phi(x, y)$ be a bounded continuous function on $\mathbb{R} \times (c - \delta, c + \delta)$ for some $\delta > 0$. Without loss of generality we may assume that $|\Phi(x, y)| \le 1$. Next, let $\varepsilon > 0$ be arbitrary, and choose continuity points $a < b$ of $F(x)$ such that $F(b) - F(a) > 1 - \varepsilon$. Then for any $\gamma > 0$,

$$
\begin{aligned}
|E[\Phi(X_n, Y_n)] - E[\Phi(X_n, c)]|
&\le E[|\Phi(X_n, Y_n) - \Phi(X_n, c)|\, I(|Y_n - c| \le \gamma)] \\
&\quad + E[|\Phi(X_n, Y_n) - \Phi(X_n, c)|\, I(|Y_n - c| > \gamma)] \\
&\le E[|\Phi(X_n, Y_n) - \Phi(X_n, c)|\, I(|Y_n - c| \le \gamma)\, I(X_n \in [a, b])] \\
&\quad + 2P(X_n \notin [a, b]) + 2P(|Y_n - c| > \gamma) \\
&\le \sup_{x \in [a, b],\, |y - c| \le \gamma} |\Phi(x, y) - \Phi(x, c)| + 2(1 - F_n(b) + F_n(a)) \\
&\quad + 2P(|Y_n - c| > \gamma). \qquad (6.28)
\end{aligned}
$$

Because a continuous function on a closed and bounded subset of a Euclidean space is uniformly continuous on that subset (see Appendix II), we can choose $\gamma$ so small that

$$\sup_{x \in [a, b],\, |y - c| \le \gamma} |\Phi(x, y) - \Phi(x, c)| < \varepsilon. \qquad (6.29)$$

Moreover, $1 - F_n(b) + F_n(a) \to 1 - F(b) + F(a) < \varepsilon$, and $P(|Y_n - c| > \gamma) \to 0$. Therefore, it follows from (6.28) that

$$\limsup_{n\to\infty} |E[\Phi(X_n, Y_n)] - E[\Phi(X_n, c)]| \le 3\varepsilon. \qquad (6.30)$$

The rest of the proof is left as an exercise. Q. E.D.
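A typical application of Theorem 6.21 is a Slutsky-type argument. The sketch below is my own setup, not from the text: $X_n = \sqrt{n}$ times a sample mean converges in distribution to $X \sim N(0, \sigma^2)$, $Y_n$ is the sample variance with $\mathrm{plim}\, Y_n = \sigma^2 = c$, and $\Phi(x, y) = x/\sqrt{y}$ is continuous for $y$ near $c > 0$, so $\Phi(X_n, Y_n) \to_d \Phi(X, c) = X/\sigma \sim N(0, 1)$.

```python
import numpy as np

# My own setup: X_n = sqrt(n) * (sample mean) ->_d X ~ N(0, sigma^2) by the
# CLT, Y_n = sample variance ->_p sigma^2 = c, and Phi(x, y) = x / sqrt(y) is
# continuous for y near c > 0.  Theorem 6.21 gives Phi(X_n, Y_n) ->_d N(0, 1).
rng = np.random.default_rng(5)
n, reps = 500, 20_000
sigma = 2.0

data = sigma * rng.standard_normal((reps, n))   # reps samples of size n, mean 0
x_n = np.sqrt(n) * data.mean(axis=1)            # ->_d N(0, sigma^2)
y_n = data.var(axis=1, ddof=1)                  # ->_p sigma^2
t_stat = x_n / np.sqrt(y_n)                     # Phi(X_n, Y_n)

print(t_stat.mean(), t_stat.std())              # both close to N(0, 1) values
```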

Corollary 6.1: Let $Z_n$ be $t$-distributed with $n$ degrees of freedom. Then $Z_n \to_d N(0, 1)$.

Proof: By the definition of the t-distribution with n degrees of freedom we can write

$$Z_n = \frac{U_0}{\sqrt{(1/n)\sum_{j=1}^n U_j^2}}, \qquad (6.31)$$

where $U_0, U_1, \ldots, U_n$ are i.i.d. $N(0, 1)$. Let $X_n = U_0$ and $X = U_0$ so that trivially $X_n \to_d X$. Let $Y_n = (1/n)\sum_{j=1}^n U_j^2$. Then by the weak law of large numbers (Theorem 6.2) we have $\mathrm{plim}_{n\to\infty} Y_n = E(U_j^2) = 1$. Let $\Phi(x, y) = x/\sqrt{y}$. Note that $\Phi(x, y)$ is continuous on $\mathbb{R} \times (1 - \varepsilon, 1 + \varepsilon)$ for $0 < \varepsilon < 1$. Thus, by Theorem 6.21, $Z_n = \Phi(X_n, Y_n) \to_d \Phi(X, 1) = U_0 \sim N(0, 1)$. Q. E.D.

$^6$ Thus, $\Phi(x, y)$ is continuous in $y$ on a small neighborhood of $c$.

Corollary 6.2: Let $U_1, \ldots, U_n$ be a random sample from $N_k(\mu, \Sigma)$, where $\Sigma$ is nonsingular. Denote $\bar{U} = (1/n)\sum_{j=1}^n U_j$, $\hat{\Sigma} = (1/(n-1))\sum_{j=1}^n (U_j - \bar{U})(U_j - \bar{U})^{\mathrm{T}}$, and let $Z_n = n(\bar{U} - \mu)^{\mathrm{T}} \hat{\Sigma}^{-1} (\bar{U} - \mu)$. Then $Z_n \to_d \chi_k^2$.

Proof: For a $k \times k$ matrix $A = (a_1, \ldots, a_k)$, let $\mathrm{vec}(A)$ be the $k^2 \times 1$ vector of stacked columns $a_j$, $j = 1, \ldots, k$, of $A$: $\mathrm{vec}(A) = (a_1^{\mathrm{T}}, \ldots, a_k^{\mathrm{T}})^{\mathrm{T}} = b$, for instance, with inverse $\mathrm{vec}^{-1}(b) = A$. Let $c = \mathrm{vec}(\Sigma)$, $Y_n = \mathrm{vec}(\hat{\Sigma})$, $X_n = \sqrt{n}(\bar{U} - \mu)$, $X \sim N_k(0, \Sigma)$, and $\Phi(x, y) = x^{\mathrm{T}}(\mathrm{vec}^{-1}(y))^{-1} x$. Because $\Sigma$ is nonsingular, there exists a neighborhood $C(\delta) = \{y \in \mathbb{R}^{k^2} : \|y - c\| < \delta\}$ of $c$ such that for all $y$ in $C(\delta)$, $\mathrm{vec}^{-1}(y)$ is nonsingular (Exercise: Why?), and consequently, $\Phi(x, y)$ is continuous on $\mathbb{R}^k \times C(\delta)$ (Exercise: Why?). The corollary now follows from Theorem 6.21 (Exercise: Why?). Q. E.D.
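Corollary 6.2 can be illustrated by Monte Carlo. The sketch below uses my own parameter values (they are not from the text): many samples of size $n$ are drawn from $N_k(\mu, \Sigma)$, the statistic $Z_n = n(\bar{U} - \mu)^{\mathrm{T}} \hat{\Sigma}^{-1} (\bar{U} - \mu)$ is computed for each, and its average is compared with $E[\chi_k^2] = k$.

```python
import numpy as np

# My own parameter choices: draw samples of size n from N_k(mu, Sigma), form
# Z_n = n * (Ubar - mu)' Sigmahat^{-1} (Ubar - mu), and compare the average of
# Z_n over many replications with E[chi^2_k] = k.
rng = np.random.default_rng(6)
k, n, reps = 3, 400, 5_000

mu = np.array([1.0, -2.0, 0.5])
a = rng.standard_normal((k, k))
sigma = a @ a.T + k * np.eye(k)        # a nonsingular (positive definite) Sigma
chol = np.linalg.cholesky(sigma)

z_n = np.empty(reps)
for r in range(reps):
    u = mu + rng.standard_normal((n, k)) @ chol.T  # rows are N_k(mu, Sigma)
    d = u.mean(axis=0) - mu                        # Ubar - mu
    sigmahat = np.cov(u, rowvar=False)             # (n-1)-denominator estimator
    z_n[r] = n * d @ np.linalg.solve(sigmahat, d)  # n * d' Sigmahat^{-1} d

print(z_n.mean())   # close to k, the chi^2_k mean
```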