# Convergence in distribution

Let θn be an estimator for a real-valued parameter θ and assume θn →p θ. If Gn denotes the cumulative distribution function (CDF) of θn, i.e., Gn(z) = P(θn ≤ z), then as n → ∞

Gn(z) → 0 for z < θ and Gn(z) → 1 for z > θ. (10.4)

To see this observe that P(θn ≤ z) = P(θn − θ ≤ z − θ) ≤ P(|θn − θ| ≥ θ − z) for z < θ, and P(θn ≤ z) = 1 − P(θn > z) = 1 − P(θn − θ > z − θ) ≥ 1 − P(|θn − θ| > z − θ) for z > θ. The result in (10.4) shows that the distribution of θn "collapses" into the degenerate distribution at θ, i.e., into

G(z) = 0 for z < θ, and G(z) = 1 for z ≥ θ.

Consequently, knowing that θn →p θ does not provide information about the shape of Gn. As a point of observation note that Gn(z) → G(z) for z ≠ θ, but Gn(z) may not converge to G(z) = 1 at z = θ. For example, if θn is distributed symmetrically around θ, then Gn(θ) = 1/2 for all n and hence does not converge to G(θ) = 1.

This raises the question of how we can obtain information about Gn based on some limiting process. Consider, for example, the case where θn is the sample mean of iid random variables with mean θ and variance σ² > 0. Then θn →p θ in light of Corollary 1, since Eθn = θ and var(θn) = σ²/n → 0. Consequently, as discussed above, the distribution of θn "collapses" into the degenerate distribution at θ. Observe, however, that the rescaled variable √n(θn − θ) has mean zero and variance σ². This indicates that the distribution of √n(θn − θ) will not collapse to a degenerate distribution. Hence, if √n(θn − θ) "converges," the limiting CDF can be expected to be non-degenerate. To formalize these ideas we need to define an appropriate notion of convergence of CDFs.
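The contrast between the collapsing distribution of θn and the stable distribution of √n(θn − θ) is easy to check by simulation. The sketch below is an illustration under assumed inputs, not part of the text's argument: θn is the sample mean of n iid Exponential(1) draws, so θ = 1 and σ² = 1.

```python
import numpy as np

# Illustrative simulation (assumed setup, not from the text): theta_n is the
# sample mean of n iid Exponential(1) draws, so theta = 1 and sigma^2 = 1.
# The spread of theta_n shrinks like 1/sqrt(n), while the spread of the
# rescaled variable sqrt(n)*(theta_n - theta) stays close to sigma = 1.
rng = np.random.default_rng(0)

def spread(n, reps=10_000):
    samples = rng.exponential(scale=1.0, size=(reps, n))
    theta_n = samples.mean(axis=1)   # one estimate of theta per replication
    return theta_n.std(), (np.sqrt(n) * (theta_n - 1.0)).std()

for n in (100, 1600):
    s_raw, s_scaled = spread(n)
    print(f"n={n:5d}  sd(theta_n)={s_raw:.4f}  sd(sqrt(n)*(theta_n-theta))={s_scaled:.4f}")
```

The unscaled standard deviation falls by the expected factor of 4 between n = 100 and n = 1600, while the rescaled one stays near 1.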

Definition 4. (Convergence in distribution) Let F1, F2,…, and F denote CDFs on R. Then Fn converges weakly to F if

lim n→∞ Fn(z) = F(z)

for all z ∈ R that are continuity points of F.

Let Z1, Z2,…, and Z denote random variables with corresponding CDFs F1, F2,…, and F, respectively. We then say that Zn converges in distribution (or in law) to Z if Fn converges weakly to F. We write Zn →d Z or Zn →L Z.

Consider again the sample mean θn of iid random variables with mean θ and variance σ² > 0. As demonstrated above, θn →p θ only implies weak convergence of the CDF of θn to a degenerate distribution, which is not informative about the shape of the distribution function of θn. In contrast the limiting distribution of √n(θn − θ) is found to be non-degenerate. In fact, using Theorem 24 below, it can be shown that √n(θn − θ) converges in distribution to a N(0, σ²) distributed random variable. As a result we can take N(0, σ²) as an approximation for the finite sample distribution of √n(θn − θ), and consequently take N(0, σ²/n) as an approximation for the finite sample distribution of θn.
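A quick Monte Carlo check of this normal approximation, under the assumed choice of iid Uniform(0, 1) draws (so θ = 1/2 and σ² = 1/12): the empirical CDF of √n(θn − θ) should track the N(0, σ²) CDF closely.

```python
import numpy as np
from math import erf, sqrt

# Assumed illustration: compare the empirical CDF of sqrt(n)*(theta_n - theta)
# with the N(0, sigma^2) CDF, for sample means of Uniform(0,1) draws.
rng = np.random.default_rng(1)
n, reps = 400, 50_000
theta, sigma = 0.5, sqrt(1.0 / 12.0)

z = np.sqrt(n) * (rng.random((reps, n)).mean(axis=1) - theta)

def normal_cdf(x, s=sigma):
    # CDF of N(0, s^2) via the error function
    return 0.5 * (1.0 + erf(x / (s * sqrt(2.0))))

# Kolmogorov-Smirnov-style distance between empirical CDF and N(0, sigma^2)
grid = np.linspace(-3 * sigma, 3 * sigma, 61)
ecdf = np.array([(z <= g).mean() for g in grid])
ks = np.abs(ecdf - np.array([normal_cdf(g) for g in grid])).max()
print(f"max CDF discrepancy: {ks:.4f}")   # small: the approximation is good
```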

Remark 2

(a) The reason for requiring in the above definition that Fn(z) → F(z) only at the continuity points of F is to accommodate situations as, e.g., in (10.4). Of course, if F is continuous, then Fn converges weakly to F if and only if Fn(z) → F(z) for all z ∈ R.

(b) As is evident from the definition, the concept of convergence in distribution is defined completely in terms of the convergence of distribution functions. In fact, the concept of convergence in distribution remains well defined even for sequences of random variables that are not defined on a common probability space.

(c) To further illustrate what convergence in distribution does not mean consider the following example: Let Y be a random variable that takes the values +1 and −1, each with probability 1/2. Define Zn = Y for n ≥ 1 and Z = −Y. Then clearly Zn →d Z, since Zn and Z have the same distribution, but |Zn − Z| = 2 for all n ≥ 1. That is, convergence in distribution does not necessarily mean that the difference between the random variables vanishes in the limit. More generally, if Zn →d Z and one replaces the sequence Zn by a sequence Zn* that has the same marginal distributions, then also Zn* →d Z.
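The example in (c) can be replayed numerically; this sketch simply restates it with draws of Y:

```python
import numpy as np

# Remark 2(c) in code: Zn = Y and Z = -Y have identical distributions
# (so Zn ->d Z trivially), yet the pathwise difference |Zn - Z| equals 2
# on every sample path, for every n.
rng = np.random.default_rng(2)
y = rng.choice([-1.0, 1.0], size=100_000)   # Y = +/-1 with probability 1/2 each

z_n = y     # Zn = Y for every n
z = -y      # Z  = -Y

# Same marginal distribution ...
print("P(Zn = 1) =", (z_n == 1).mean(), "  P(Z = 1) =", (z == 1).mean())
# ... but the difference never shrinks:
print("|Zn - Z| takes only the value(s)", np.unique(np.abs(z_n - z)))
```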

The next theorem provides several equivalent characterizations of weak convergence.

Theorem 8. Consider the cumulative distribution functions F, F1, F2,…. Let Q, Q1, Q2,… denote the corresponding probability measures on R, and let φ, φ1, φ2,… denote the corresponding characteristic functions. Then the following statements are equivalent:

(a) Fn converges weakly to F.

(b) lim n→∞ Qn(A) = Q(A) for all Borel sets A ⊆ R that are Q-continuous, i.e., for all Borel sets A whose boundary ∂A satisfies Q(∂A) = 0.

(c) lim n→∞ ∫ f dFn = ∫ f dF for all bounded and continuous real-valued functions f on R.

(d) lim n→∞ φn(t) = φ(t) for all t ∈ R.

If, furthermore, the cumulative distribution functions F, F1, F2,… have moment generating functions M, M1, M2,… in some common interval [−t*, t*], t* > 0, then (a), (b), (c) or (d) are, respectively, equivalent to

(e) lim n→∞ Mn(t) = M(t) for all t ∈ [−t*, t*].

Remark 3. The equivalence of (a) and (b) of Theorem 8 can be reformulated as Zn →d Z ⇔ P(Zn ∈ A) → P(Z ∈ A) for all Borel sets A with P(Z ∈ ∂A) = 0. The equivalence of (a) and (c) can be expressed equivalently as Zn →d Z ⇔ Ef(Zn) → Ef(Z) for all bounded and continuous real-valued functions f on R.
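The Ef(Zn) → Ef(Z) characterization can be probed numerically. The sketch below uses the assumed example Zn = Z + 1/n with Z ~ N(0, 1), so Zn →d Z, and the bounded continuous test function f = arctan; the gap |Ef(Zn) − Ef(Z)| should shrink with n.

```python
import numpy as np

# Illustrating Zn ->d Z  <=>  Ef(Zn) -> Ef(Z) for bounded continuous f,
# with the assumed sequence Zn = Z + 1/n, Z ~ N(0,1), and f = arctan.
# Common random numbers make the Monte Carlo comparison stable.
rng = np.random.default_rng(3)
z = rng.standard_normal(200_000)

target = np.arctan(z).mean()   # Monte Carlo estimate of Ef(Z)
gaps = [abs(np.arctan(z + 1.0 / n).mean() - target) for n in (1, 10, 100)]
print([f"{g:.4f}" for g in gaps])   # the gap shrinks as n grows
```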

The following theorem relates convergence in probability to convergence in distribution.

Theorem 9. Zn →p Z implies Zn →d Z.

(The theorem remains valid if Zn →p Z is replaced by Zn →a.s. Z or by convergence in rth mean, since the latter imply the former.)

Proof. Let f(z) be any bounded and continuous real-valued function, and let C denote the bound. Then Zn →p Z implies f(Zn) →p f(Z) by the results on convergence in probability of transformed sequences given in Theorem 14 in Section 2.3. Since |f(Zn(ω))| ≤ C for all n and ω ∈ Ω, it then follows from Theorems 6 and 4 that Ef(Zn) → Ef(Z), and hence Zn →d Z by Theorem 8. ■

The converse of the above theorem does not hold in general, i.e., Zn →d Z does not imply Zn →p Z. To see this consider the following example: let Z ~ N(0, 1) and put Zn = (−1)^n Z. Then Zn converges neither almost surely nor in probability. But since each Zn ~ N(0, 1), evidently Zn →d Z.
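The counterexample is easy to replay in a simulation sketch:

```python
import numpy as np

# The counterexample Zn = (-1)^n * Z with Z ~ N(0, 1): every Zn is N(0, 1),
# so Zn ->d Z, yet Zn oscillates and cannot converge in probability to Z.
rng = np.random.default_rng(4)
z = rng.standard_normal(100_000)

for n in (1, 2, 3):
    z_n = (-1) ** n * z
    # each Zn has the same (standard normal) distribution ...
    print(f"n={n}: mean={z_n.mean():+.4f}, var={z_n.var():.4f}")

# ... but for odd n, |Zn - Z| = 2|Z|, which does not shrink with n
print("P(|Z1 - Z| > 1) =", (np.abs(-z - z) > 1.0).mean())
```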

Convergence in distribution to a constant is, however, equivalent to convergence in probability to that constant.

Theorem 10. Let c ∈ R; then Zn →d c is equivalent to Zn →p c.

Proof. Because of Theorem 9 we only have to show that Zn →d c implies Zn →p c. Observe that for any ε > 0

P(|Zn − c| > ε) = P(Zn − c < −ε) + P(Zn − c > ε)

≤ P(Zn ≤ c − ε) + 1 − P(Zn ≤ c + ε) = Fn(c − ε) + 1 − Fn(c + ε),

where Fn is the CDF of Zn. The CDF of Z = c is

F(z) = 0 for z < c, and F(z) = 1 for z ≥ c.

Hence, c − ε and c + ε are continuity points of F. Since Zn →d Z it follows that Fn(c − ε) → F(c − ε) = 0 and Fn(c + ε) → F(c + ε) = 1. Consequently,

0 ≤ P(|Zn − c| > ε) ≤ Fn(c − ε) + 1 − Fn(c + ε) → 0 + 1 − 1 = 0.

This shows Zn →p c. ■
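A simulation sketch of Theorem 10, under the assumed illustrative choice Zn = c + Z/√n with Z ~ N(0, 1): Fn(c − ε) and 1 − Fn(c + ε) both vanish, and with them P(|Zn − c| > ε).

```python
import numpy as np

# Assumed example: Zn = c + Z/sqrt(n) converges in distribution to the
# constant c, and the same tail computation used in the proof of Theorem 10
# shows P(|Zn - c| > eps) -> 0, i.e., convergence in probability.
rng = np.random.default_rng(7)
z = rng.standard_normal(100_000)
c, eps = 2.0, 0.1

for n in (10, 1000, 100_000):
    z_n = c + z / np.sqrt(n)
    print(f"n={n:6d}: Fn(c-eps)={(z_n <= c - eps).mean():.4f}, "
          f"1-Fn(c+eps)={(z_n > c + eps).mean():.4f}, "
          f"P(|Zn-c|>eps)={(np.abs(z_n - c) > eps).mean():.4f}")
```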

In general convergence in distribution does not imply convergence of moments; in fact the moments may not even exist. However, we have the following result.

Theorem 11. Suppose Zn →d Z and suppose that supn E|Zn|^r < ∞ for some 0 < r < ∞. Then for all 0 < s < r we have E|Z|^s < ∞ and lim n→∞ E|Zn|^s = E|Z|^s. If, furthermore, Z^s and Zn^s are well-defined for all n, then also lim n→∞ EZn^s = EZ^s.
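The uniform moment bound in Theorem 11 cannot be dropped. A standard-style counterexample (an assumption here, not taken from the text) is Zn = 0 with probability 1 − 1/n and Zn = n with probability 1/n; the sketch below computes the relevant quantities exactly.

```python
# Why Theorem 11 needs sup_n E|Zn|^r < infinity for some r > s: take
# Zn = 0 with probability 1 - 1/n and Zn = n with probability 1/n.
# Then P(Zn != 0) -> 0, so Zn ->d 0, but E Zn = 1 for every n, so the first
# moments do not converge to 0.  Indeed E|Zn|^r = n^(r-1) is unbounded for
# every r > 1, so the theorem does not apply with s = 1.
def first_moment(n):
    return 0.0 * (1.0 - 1.0 / n) + n * (1.0 / n)   # exact, no simulation

def prob_nonzero(n):
    return 1.0 / n                                  # P(Zn != 0) -> 0

for n in (10, 100, 1000):
    print(f"n={n}: P(Zn != 0) = {prob_nonzero(n):.4f},  E Zn = {first_moment(n):.4f}")
```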

Remark 4. Since Zn →a.s. Z and Zn →p Z imply Zn →d Z, Theorem 11 provides sufficient conditions under which Zn →a.s. Z and Zn →p Z imply convergence of moments. These conditions are an alternative to those of Theorems 6 and 4.

The concept of convergence in distribution can be generalized to sequences of random vectors Zn taking their values in Rᵏ. Contrary to the approach taken in generalizing the notions of convergence in probability, almost surely, and in rth mean to the vector case, the appropriate generalization is here not obtained by simply requiring that the component sequences Zn^(i) converge in distribution for i = 1,…, k. Such an attempt at generalizing the notion of convergence in distribution would yield a nonsensical convergence concept, as is illustrated by Example 4 below. The proper generalization is given in the following definition.

Definition 5. Let F1, F2,…, and F denote CDFs on Rᵏ. Then Fn converges weakly to F if

lim n→∞ Fn(z) = F(z)

for all z ∈ Rᵏ that are continuity points of F.

Let Z1, Z2,…, and Z denote random vectors taking their values in Rᵏ with corresponding CDFs F1, F2,…, and F, respectively. We then say that Zn converges in distribution (or in law) to Z if Fn converges weakly to F. We write Zn →d Z or Zn →L Z.

All the results presented in this subsection so far also hold for the multivariate case (with Rᵏ replacing R). Convergence in distribution of a sequence of random matrices Wn is defined as convergence in distribution of vec(Wn).

The next theorem states that weak convergence of the joint distributions implies weak convergence of the marginal distributions.

Theorem 12. Weak convergence of Fn to F implies weak convergence of Fn^(i) to F^(i), and Zn →d Z implies Zn^(i) →d Z^(i), where Fn^(i) and F^(i) denote the ith marginal distributions of Fn and F, and Zn^(i) and Z^(i) denote the ith components of Zn and Z, respectively.

Proof. The result follows from Theorem 14 below, since projections are continuous. ■

However, as alluded to in the above discussion, the converse of Theorem 12 is not true. That is, weak convergence of the marginal distributions is not equivalent to weak convergence of the joint distribution, as is illustrated by the following counterexample.

Example 4. Let Z ~ N(0, 1) and let Zn = (Z, (−1)^n Z)′, i.e., Zn = (Z, Z)′ for n even and Zn = (Z, −Z)′ for n odd.

Clearly, the marginal distributions of each component of Zn converge weakly to N(0, 1). However, for n even the distribution of Zn is concentrated on the line {(z, z) : z ∈ R}, whereas for n odd the distribution of Zn is concentrated on the line {(z, −z) : z ∈ R}. Consequently, the random vectors Zn do not converge in distribution, i.e., the distributions of Zn do not converge weakly.
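Example 4 is easy to visualize in a simulation sketch: both marginals are exactly N(0, 1) for every n, but the correlation between the components flips sign with n, so the joint law oscillates.

```python
import numpy as np

# Example 4 in simulation: Zn = (Z, (-1)^n Z)'.  Each coordinate is exactly
# N(0, 1) for every n, but the joint distribution alternates between
# concentration on {(z, z)} and on {(z, -z)}, so Zn has no limit in
# distribution even though both marginals trivially converge.
rng = np.random.default_rng(5)
z = rng.standard_normal(100_000)

for n in (1, 2):
    z_n = np.stack([z, (-1) ** n * z])
    corr = np.corrcoef(z_n)[0, 1]
    print(f"n={n}: marginal variances = {z_n.var(axis=1).round(3)}, corr = {corr:+.3f}")
# corr flips between -1 (n odd) and +1 (n even): the joint law oscillates
```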

The following result is frequently useful in reducing questions about convergence in distribution of random vectors to corresponding questions about convergence in distribution of random variables.

Theorem 13. (Cramér-Wold device) Let Z1, Z2,…, and Z denote random vectors taking their values in Rᵏ. Then the following statements are equivalent:

(a) Zn →d Z.

(b) α′Zn →d α′Z for all α ∈ Rᵏ.

(c) α′Zn →d α′Z for all α ∈ Rᵏ with ||α|| = 1.

Proof. The equivalence of (b) and (c) is obvious. We now prove the equivalence of (a) with (c). Let φn(t) and φ(t) denote, respectively, the characteristic functions of Zn and Z. According to the multivariate version of Theorem 8 we have Zn →d Z if and only if φn(t) → φ(t) for all t = (t1,…, tk)′ ∈ Rᵏ. Let φn^α(s) and φ^α(s) denote the characteristic functions of α′Zn and α′Z, respectively. Again, α′Zn →d α′Z if and only if φn^α(s) → φ^α(s) for all s ∈ R. Observe that for t ≠ 0 we have

φn(t) = E(exp(it′Zn)) = E(exp(isα′Zn)) = φn^α(s)

with α = t/||t|| and s = ||t||. Note that ||α|| = 1. Similarly, φ(t) = φ^α(s). Consequently, φn(t) → φ(t) for all t ≠ 0 if and only if φn^α(s) → φ^α(s) for all s ≠ 0 and all α with ||α|| = 1. Since φn(0) = φ(0) = 1 and φn^α(0) = φ^α(0) = 1, the proof is complete observing that t = 0 if and only if s = 0. ■
