Central Limit Theorems
Let $Z_t$, $t \in \mathbb{N}$, be a sequence of iid random variables with $EZ_t = \mu$ and $\operatorname{var}(Z_t) = \sigma^2$, $0 < \sigma^2 < \infty$. Let $\bar{Z}_n = n^{-1}\sum_{t=1}^n Z_t$ denote the sample mean. By Kolmogorov's strong LLN for iid random variables (Theorem 18) it then follows that $\bar{Z}_n - E\bar{Z}_n$ converges to zero a.s. and hence i.p. This implies that the limiting distribution of $\bar{Z}_n - E\bar{Z}_n$ is degenerate at zero, and thus no insight is gained from this limiting distribution regarding the shape of the distribution of the sample mean for finite $n$; compare the discussion at the beginning of Section 2.2. Suppose we consider the rescaled quantity
$$\sqrt{n}\,(\bar{Z}_n - E\bar{Z}_n) = n^{-1/2}\sum_{t=1}^n (Z_t - \mu). \qquad (10.10)$$
Then the variance of the rescaled expression is $\sigma^2 > 0$ for all $n$, indicating that its limiting distribution will not be degenerate. Theorems that provide results concerning the limiting distribution of expressions like (10.10) are called central limit theorems (CLTs). Rather than centering the respective random variables, as is done in (10.10), we assume in the following, without loss of generality, that the respective random variables have mean zero.
Some classical CLTs
In this subsection we will present several classical CLTs, starting with the Lindeberg-Lévy CLT.
Theorem 24 (Lindeberg-Lévy CLT). Let $Z_t$ be a sequence of iid random variables with $EZ_t = 0$ and $\operatorname{var}(Z_t) = \sigma^2 < \infty$. Then $n^{-1/2}\sum_{t=1}^n Z_t \xrightarrow{d} N(0, \sigma^2)$. (In case $\sigma^2 = 0$ the limit $N(0, 0)$ should be interpreted as the degenerate distribution having all its probability mass concentrated at zero.)
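To illustrate the theorem numerically, the following Monte Carlo sketch simulates the rescaled sum $n^{-1/2}\sum_{t=1}^n Z_t$ for centered iid exponential(1) variables, for which $\sigma^2 = 1$. (The exponential distribution, the sample size, and the replication count are illustrative assumptions, not part of the text.)

```python
import math
import random
import statistics

random.seed(0)

def scaled_sum(n):
    """n^{-1/2} * sum of n iid centered exponential(1) draws (mean 0, variance 1)."""
    return sum(random.expovariate(1.0) - 1.0 for _ in range(n)) / math.sqrt(n)

n, reps = 500, 3000
draws = [scaled_sum(n) for _ in range(reps)]

# By the Lindeberg-Levy CLT the draws should be approximately N(0, 1):
# sample variance close to 1 and about 95% of the mass inside +/- 1.96.
var_hat = statistics.pvariance(draws)
frac_within = sum(abs(d) <= 1.96 for d in draws) / reps
print(round(var_hat, 1), round(frac_within, 1))
```

Despite the pronounced skewness of the exponential distribution, the rescaled sums are already close to normal at moderate $n$.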
Of course, if $\sigma^2 > 0$ the conclusion of the theorem can be written equivalently as $n^{-1/2}\sum_{t=1}^n Z_t/\sigma \xrightarrow{d} N(0, 1)$. Extensions of Theorem 24 and of any of the following central limit theorems to the vector case are readily obtained using the Cramér-Wold device (Theorem 13). To illustrate this we extend Theorem 24 to the vector case as an example.
Example 8. Let $Z_t$ be a sequence of iid $k$-dimensional random vectors with zero mean and finite variance covariance matrix $\Sigma$. Let $\xi_n = n^{-1/2}\sum_{t=1}^n Z_t$, let $\xi \sim N(0, \Sigma)$ (where $N(0, \Sigma)$ denotes a singular normal distribution if $\Sigma$ is singular), and let $\alpha$ be some element of $\mathbb{R}^k$. Now consider the scalar random variables $\alpha'\xi_n = n^{-1/2}\sum_{t=1}^n \alpha' Z_t$. Clearly the summands $\alpha' Z_t$ are iid with mean zero and variance $\alpha'\Sigma\alpha$. It hence follows from Theorem 24 that $\alpha'\xi_n$ converges in distribution to $N(0, \alpha'\Sigma\alpha)$. Of course $\alpha'\xi \sim N(0, \alpha'\Sigma\alpha)$, and hence $\alpha'\xi_n \xrightarrow{d} \alpha'\xi$. Since $\alpha$ was arbitrary it follows from Theorem 13 that $\xi_n \xrightarrow{d} \xi$, which shows that the random vector $n^{-1/2}\sum_{t=1}^n Z_t$ converges in distribution to $N(0, \Sigma)$.
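A small simulation in the spirit of Example 8 can be sketched as follows. The bivariate distribution and the choice $\alpha = (1, -1)'$ are illustrative assumptions: with $Z_t = (e_t, e_t + f_t)'$ for $e_t, f_t$ iid uniform on $(-1, 1)$, one has $\alpha'\Sigma\alpha = 1/3$, and the sample variance of $\alpha'\xi_n$ should be close to that value.

```python
import math
import random

random.seed(6)

def alpha_xi(n, alpha):
    """alpha' xi_n = n^{-1/2} sum_t alpha'Z_t for Z_t = (e_t, e_t + f_t)'."""
    total = 0.0
    for _ in range(n):
        e, f = random.uniform(-1, 1), random.uniform(-1, 1)
        total += alpha[0] * e + alpha[1] * (e + f)
    return total / math.sqrt(n)

alpha = (1.0, -1.0)   # for this construction alpha' Sigma alpha = var(f_t) = 1/3
draws = [alpha_xi(300, alpha) for _ in range(2000)]
var_hat = sum(d * d for d in draws) / len(draws)
print(round(var_hat, 2))
```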
Theorem 24 postulates that the random variables $Z_t$ are iid. The following theorems relax this assumption to independence. It proves helpful to define
$$\sigma_{(n)}^2 = \sum_{t=1}^n \sigma_t^2, \qquad (10.11)$$
where $\sigma_t^2 = \operatorname{var}(Z_t)$. For independent $Z_t$s clearly $\sigma_{(n)}^2 = n^2 \operatorname{var}(\bar{Z}_n)$, and in case the $Z_t$s are iid with variance $\sigma^2$ we have $\sigma_{(n)}^2 = n\sigma^2$. To connect Theorem 24 with the subsequent CLTs observe that within the context of Theorem 24 we have $n^{-1/2}\sum_{t=1}^n Z_t/\sigma = \sum_{t=1}^n Z_t/\sigma_{(n)}$ (given $\sigma^2 > 0$).
Theorem 25 (Lindeberg-Feller CLT). Let $Z_t$ be a sequence of independent random variables with $EZ_t = 0$ and $\operatorname{var}(Z_t) = \sigma_t^2 < \infty$. Suppose that $\sigma_{(n)}^2 > 0$, except for finitely many $n$. If for every $\varepsilon > 0$

$$\lim_{n\to\infty} \frac{1}{\sigma_{(n)}^2}\sum_{t=1}^n E\big[\,|Z_t|^2\,\mathbf{1}(|Z_t| > \varepsilon\,\sigma_{(n)})\big] = 0, \qquad (L)$$

then $\sum_{t=1}^n Z_t/\sigma_{(n)} \xrightarrow{d} N(0, 1)$.
Condition (L) is called the Lindeberg condition. The next theorem employs in place of the Lindeberg condition a condition that is stronger but easier to verify.
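As a numerical illustration of the Lindeberg-Feller CLT, consider independent $Z_t$ uniform on $[-a_t, a_t]$ with bounded, non-identical $a_t$ (an illustrative choice, not from the text). Since the $Z_t$ are uniformly bounded while $\sigma_{(n)}^2 \to \infty$, condition (L) holds, and $\sum_{t=1}^n Z_t/\sigma_{(n)}$ should be approximately $N(0, 1)$:

```python
import math
import random

random.seed(1)

def normalized_sum(n):
    """sum_{t<=n} Z_t / sigma_(n) for independent Z_t ~ uniform[-a_t, a_t]."""
    total, s2 = 0.0, 0.0
    for t in range(1, n + 1):
        a = 1.0 + (t % 5)          # bounded, non-identical scales a_t
        total += random.uniform(-a, a)
        s2 += a * a / 3.0          # var(Z_t) accumulated into sigma^2_(n)
    return total / math.sqrt(s2)

draws = [normalized_sum(400) for _ in range(3000)]
frac_within = sum(abs(d) <= 1.96 for d in draws) / len(draws)
print(round(frac_within, 2))
```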
Theorem 26 (Lyapounov CLT). Let $Z_t$ be a sequence of independent random variables with $EZ_t = 0$ and $\operatorname{var}(Z_t) = \sigma_t^2 < \infty$. Suppose that $\sigma_{(n)}^2 > 0$, except for finitely many $n$. If for some $\delta > 0$

$$\lim_{n\to\infty} \sum_{t=1}^n E\,|Z_t/\sigma_{(n)}|^{2+\delta} = 0, \qquad (P)$$

then $\sum_{t=1}^n Z_t/\sigma_{(n)} \xrightarrow{d} N(0, 1)$.
Condition (P) is called the Lyapounov condition. Condition (P) implies condition (L). It is readily seen that a sufficient condition for (P) is that $\sup_t E|Z_t|^{2+\delta} < \infty$ and $\liminf_{n\to\infty} n^{-1}\sigma_{(n)}^2 > 0$.

We note that the conclusions of Theorems 25 and 26 can be stated equivalently as $n^{-1/2}\sum_{t=1}^n Z_t \xrightarrow{d} N(0, \bar{\sigma}^2)$, whenever

$$\lim_{n\to\infty} n^{-1}\sigma_{(n)}^2 = \bar{\sigma}^2 > 0 \qquad (10.12)$$

holds. In this context we also make the trivial observation that for a sequence of independent random variables $Z_t$ with zero mean and finite variances $\sigma_t^2 > 0$ the condition $n^{-1}\sigma_{(n)}^2 \to \bar{\sigma}^2 = 0$ implies $n^{-1/2}\sum_{t=1}^n Z_t \xrightarrow{p} 0$ (Corollary 1), which can also be rewritten as $n^{-1/2}\sum_{t=1}^n Z_t \xrightarrow{d} N(0, \bar{\sigma}^2)$, $\bar{\sigma}^2 = 0$.
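The rate at which the Lyapounov sum vanishes can be computed in closed form for a concrete array (the choice of $N(0, t)$ variables and $\delta = 1$ is an illustrative assumption): for independent $Z_t \sim N(0, t)$ one has $E|Z_t|^3 = t^{3/2}\cdot 2\sqrt{2/\pi}$ and $\sigma_{(n)}^2 = n(n+1)/2$, so the sum in (P) is of order $n^{-1/2}$:

```python
import math

C3 = 2.0 * math.sqrt(2.0 / math.pi)   # E|e|^3 for e ~ N(0, 1)

def lyapounov_sum(n):
    """sum_{t<=n} E|Z_t / sigma_(n)|^3 for independent Z_t ~ N(0, t)."""
    s2 = n * (n + 1) / 2.0                                # sigma^2_(n) = sum_t t
    third = sum(C3 * t ** 1.5 for t in range(1, n + 1))   # sum_t E|Z_t|^3
    return third / s2 ** 1.5

sums = [lyapounov_sum(n) for n in (10, 100, 1000)]
print(sums[0] > sums[1] > sums[2])
```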
The above CLTs were given for sequences of random variables $(Z_t, t \geq 1)$. They can be readily generalized to cover triangular arrays of random variables $(Z_{tn}, 1 \leq t \leq n, n \geq 1)$. In fact Theorems 25 and 26 hold with $Z_t$ replaced by $Z_{tn}$ and $\sigma_t^2$ replaced by $\sigma_{tn}^2$; see, e.g., Billingsley (1979, pp. 310-12).
The need for CLTs for triangular arrays arises frequently in econometrics. One example is the derivation of the limiting distribution of the least squares estimator when different regressors grow at different rates. In this case one can still obtain a limiting normal distribution for the least squares estimator if the usual $\sqrt{n}$-norming is replaced with a normalization by an appropriate diagonal matrix. In essence, this entails renormalizing the $i$th regressor by the square root of $\sum_{t=1}^n x_{ti}^2$, whose obvious dependence on $n$ leads to the consideration of a CLT for quantities of the form $\sum_{t=1}^n c_{tn} u_t$ with $u_t$ iid; see Theorem 28 below.
CLTs for regression analysis
In this subsection we present some CLTs that are geared towards regression analysis. As discussed above, within this context we will often need CLTs for a sequence of iid random variables multiplied by some time-varying scale factors that may also depend on the sample size. We first give a general CLT that covers such situations as a corollary to the Lindeberg-Feller CLT.
Theorem 27. Let $Z_t$ be a sequence of iid random variables with $EZ_t = 0$ and $\operatorname{var}(Z_t) = 1$. Furthermore, let $(\sigma_{tn}, 1 \leq t \leq n, n \geq 1)$ be a triangular array of real numbers, and define the triangular array $Z_{tn}$ by $Z_{tn} = \sigma_{tn} Z_t$. Suppose that $\sigma_{(n)}^2 = \sum_{t=1}^n \sigma_{tn}^2 > 0$, except for finitely many $n$. If

$$\lim_{n\to\infty} \max_{1 \leq t \leq n} \frac{\sigma_{tn}^2}{\sigma_{(n)}^2} = 0, \qquad (M)$$

then $\sum_{t=1}^n Z_{tn}/\sigma_{(n)} \xrightarrow{d} N(0, 1)$.
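Condition (M) is easy to check for concrete weights. For instance (an illustrative choice, not from the text), for $\sigma_{tn} = t$ the ratio in (M) equals $6n/((n+1)(2n+1)) \approx 3/n$ and vanishes as $n$ grows:

```python
def m_ratio(n):
    """max_t sigma_tn^2 / sum_t sigma_tn^2 for the weights sigma_tn = t."""
    weights_sq = [t * t for t in range(1, n + 1)]
    return max(weights_sq) / sum(weights_sq)

ratios = [m_ratio(n) for n in (10, 100, 1000)]
print(ratios[0] > ratios[1] > ratios[2])
```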
All of the subsequent CLTs in this section are based on Theorem 27. Explicit proofs are given in a longer mimeographed version of this article, which is available from the authors upon request.
Theorem 28. Let $u_t$, $t \geq 1$, be a sequence of iid random variables with $Eu_t = 0$ and $Eu_t^2 = \sigma^2 < \infty$. Let $X_n$, $n \geq 1$, with $X_n = (x_{ti})$ be a sequence of real nonstochastic $n \times k$ matrices with

$$\lim_{n\to\infty} \frac{\max_{1 \leq t \leq n} x_{ti}^2}{\sum_{t=1}^n x_{ti}^2} = 0 \quad \text{for } i = 1, \ldots, k, \qquad (10.14)$$

where it is assumed that $\sum_{t=1}^n x_{ti}^2 > 0$ for all but finitely many $n$. Define $W_n = X_n S_n^{-1}$ where $S_n$ is a $k \times k$ diagonal matrix with the $i$th diagonal element equal to $\left(\sum_{t=1}^n x_{ti}^2\right)^{1/2}$, and assume that $\lim_{n\to\infty} W_n'W_n = \Phi$ is finite. Let $u_n = [u_1, \ldots, u_n]'$, then $W_n'u_n \xrightarrow{d} N(0, \sigma^2\Phi)$.
The above theorem is given in Amemiya (1985, p. 97), for the case of nonsingular $\sigma^2\Phi$. The theorem allows for trending (nonstochastic) regressors. For example, (10.14) holds for $x_{ti} = t^p$, $p > 0$. We note that in case of a single regressor $W_n'W_n = \Phi = 1$.
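The following sketch illustrates Theorem 28 for a single trending regressor; the choices $x_t = t$, uniform errors, and the simulation sizes are illustrative assumptions. Condition (10.14) holds since $\max_t t^2/\sum_t t^2 \approx 3/n \to 0$, and $W_n'u_n = \sum_t x_t u_t/(\sum_t x_t^2)^{1/2}$ should be approximately $N(0, \sigma^2)$ with $\sigma^2 = 1/3$:

```python
import math
import random

random.seed(2)

def w_statistic(n):
    """W_n'u_n for k = 1, x_t = t, and iid u_t ~ uniform(-1, 1)."""
    xs = range(1, n + 1)
    s_n = math.sqrt(sum(x * x for x in xs))          # S_n = (sum_t x_t^2)^{1/2}
    return sum(x * random.uniform(-1, 1) for x in xs) / s_n

sigma = math.sqrt(1.0 / 3.0)                          # sd of uniform(-1, 1)
draws = [w_statistic(300) for _ in range(3000)]
frac_within = sum(abs(d) <= 1.96 * sigma for d in draws) / len(draws)
print(round(frac_within, 2))
```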
Theorem 29. Let $u_t$, $t \geq 1$, be a sequence of iid random variables with $Eu_t = 0$ and $Eu_t^2 = \sigma^2 < \infty$. Let $X_n$, $n \geq 1$, with $X_n = (x_{ti})$ be a sequence of real nonstochastic $n \times k$ matrices with $\lim_{n\to\infty} n^{-1}X_n'X_n = Q$ finite. Let $u_n = [u_1, \ldots, u_n]'$, then $n^{-1/2}X_n'u_n \xrightarrow{d} N(0, \sigma^2 Q)$.
The theorem is, for example, given in Theil (1971, p. 380), for the case of nonsingular $\sigma^2 Q$. The theorem does not require that the elements of $X_n$ are bounded in absolute value, as is often assumed in the literature.
We now use Theorems 28 and 29 to give two exemplary asymptotic normality results for the least squares estimator.
Example 9. (Asymptotic normality of the least squares estimator) Consider the linear regression model

$$y_t = \sum_{i=1}^k x_{ti}\,\beta_i + u_t, \qquad t \geq 1.$$

Suppose $u_t$ and $X_n = (x_{ti})$ satisfy the assumptions of Theorem 28. Assume furthermore that the matrix $\Phi$ in Theorem 28 is nonsingular. Then $\operatorname{rank}(X_n) = k$ for large $n$ and the least squares estimator for $\beta = (\beta_1, \ldots, \beta_k)'$ is then given by $\hat{\beta}_n = (X_n'X_n)^{-1}X_n'y_n$ with $y_n = (y_1, \ldots, y_n)'$. Since $\hat{\beta}_n - \beta = (X_n'X_n)^{-1}X_n'u_n$ we have

$$S_n(\hat{\beta}_n - \beta) = S_n(X_n'X_n)^{-1}S_n S_n^{-1}X_n'u_n = (W_n'W_n)^{-1}W_n'u_n,$$

where $S_n$ is defined in Theorem 28. Since $\lim_{n\to\infty} W_n'W_n = \Phi$ and $\Phi$ is assumed to be nonsingular, we obtain

$$S_n(\hat{\beta}_n - \beta) \xrightarrow{d} N(0, \sigma^2\Phi^{-1})$$

as a consequence of Theorem 28. Note that this asymptotic normality result allows for trending regressors.
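This first result can be sketched in simulation for $k = 1$ (the trend regressor $x_t = t$, Gaussian errors, and the constants are illustrative assumptions): here $\Phi = 1$, so $S_n(\hat{\beta}_n - \beta)$ should be approximately $N(0, \sigma^2)$ with $\sigma^2 = 1$:

```python
import math
import random

random.seed(3)

BETA, SIGMA = 2.0, 1.0   # true coefficient and error standard deviation

def rescaled_ls_error(n):
    """S_n (beta_hat - beta) for y_t = beta * t + u_t with u_t ~ N(0, SIGMA^2)."""
    xs = range(1, n + 1)
    sxx = sum(x * x for x in xs)
    ys = [BETA * x + random.gauss(0.0, SIGMA) for x in xs]
    beta_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    return math.sqrt(sxx) * (beta_hat - BETA)

draws = [rescaled_ls_error(200) for _ in range(2000)]
var_hat = sum(d * d for d in draws) / len(draws)
print(round(var_hat, 1))
```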
Now suppose that $u_t$ and $X_n = (x_{ti})$ satisfy the assumptions of Theorem 29 and that furthermore $Q$ is nonsingular. Then we obtain by similar argumentation

$$\sqrt{n}\,(\hat{\beta}_n - \beta) = (n^{-1}X_n'X_n)^{-1}\left(n^{-1/2}X_n'u_n\right) \xrightarrow{d} N(0, \sigma^2 Q^{-1}).$$
We note that Theorem 29 does not hold in general if the regressors are allowed to be triangular arrays, i.e. the elements are allowed to depend on $n$. For example, suppose $k = 1$ and $X_n = [x_{11,n}, \ldots, x_{n1,n}]'$ where

$$x_{t1,n} = \begin{cases} 0 & t < n \\ \sqrt{n} & t = n, \end{cases}$$

then $n^{-1}X_n'X_n = 1$ and $n^{-1/2}X_n'u_n = u_n$. The limiting distribution of this expression is just the distribution of the $u_t$s, and hence not necessarily normal, violating the conclusion of Theorem 29.
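The failure can be seen in simulation (uniform errors are an illustrative assumption): with the array above, $n^{-1/2}X_n'u_n = u_n$ exactly, so for $u_t$ uniform on $(-1, 1)$ the statistic never leaves $[-1, 1]$, whereas a $N(0, 1/3)$ limit would exceed 1 in absolute value roughly 8 percent of the time:

```python
import math
import random

random.seed(4)

def statistic(n):
    """n^{-1/2} X_n'u_n for x_{t1,n} = 0 (t < n) and sqrt(n) (t = n)."""
    us = [random.uniform(-1, 1) for _ in range(n)]
    xs = [0.0] * (n - 1) + [math.sqrt(n)]
    return sum(x * u for x, u in zip(xs, us)) / math.sqrt(n)

draws = [statistic(100) for _ in range(2000)]
# The statistic equals u_n, which is bounded by 1 in absolute value.
frac_beyond_one = sum(abs(d) > 1.0 for d in draws) / len(draws)
print(frac_beyond_one)
```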
We now give a CLT where the elements of $X_n$ are allowed to be triangular arrays, but where we assume additionally that the elements of the $X_n$ matrices are bounded in absolute value.
Theorem 30. Let $u_t$, $t \geq 1$, be a sequence of iid random variables with $Eu_t = 0$ and $Eu_t^2 = \sigma^2 < \infty$. Let $(x_{ti,n}, 1 \leq t \leq n, n \geq 1)$, $i = 1, \ldots, k$, be triangular arrays of real numbers that are bounded in absolute value, i.e., $\sup_n \sup_{1 \leq t \leq n,\, 1 \leq i \leq k} |x_{ti,n}| < \infty$. Let $X_n = (x_{ti,n})$ denote the corresponding sequence of $n \times k$ real matrices and let $\lim_{n\to\infty} n^{-1}X_n'X_n = Q$ be finite. Furthermore, let $u_n = [u_1, \ldots, u_n]'$, then $n^{-1/2}X_n'u_n \xrightarrow{d} N(0, \sigma^2 Q)$.
Inspection of the proof of Theorem 30 shows that the uniform boundedness condition is stronger than is necessary and that it can be replaced by the condition $\max_{1 \leq t \leq n} |x_{ti,n}| = o(n^{1/2})$ for $i = 1, \ldots, k$.
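Theorem 30 can be checked numerically for a bounded triangular array (the array $x_{t,n} = \cos(t/n)$ and Gaussian errors are illustrative assumptions): here $k = 1$ and $n^{-1}\sum_t x_{t,n}^2 \to Q = 1/2 + \sin(2)/4$, so the sample variance of $n^{-1/2}X_n'u_n$ should be close to $\sigma^2 Q$ with $\sigma^2 = 1$:

```python
import math
import random

random.seed(5)

def statistic(n):
    """n^{-1/2} sum_t x_{t,n} u_t with x_{t,n} = cos(t/n) and u_t ~ N(0, 1)."""
    return sum(math.cos(t / n) * random.gauss(0.0, 1.0)
               for t in range(1, n + 1)) / math.sqrt(n)

Q = 0.5 + math.sin(2.0) / 4.0        # limit of n^{-1} sum_t cos(t/n)^2
draws = [statistic(200) for _ in range(2000)]
var_hat = sum(d * d for d in draws) / len(draws)
print(round(var_hat / Q, 1))
```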