# Consistency and Asymptotic Normality of Least Squares Estimator

The main purpose of this section is to prove the consistency and the asymptotic normality of the least squares estimators of $\beta$ and $\sigma^2$ in Model 1 (classical linear regression model) of Chapter 1. The large sample results of the preceding sections will be extensively utilized. At the end of the section we shall give additional examples of the application of the large sample theorems.

Theorem 3.5.1. In Model 1 the least squares estimator is a consistent estimator of $\beta$ if $\lambda_s(X'X) \to \infty$, where $\lambda_s(X'X)$ denotes the smallest characteristic root of $X'X$.¹³

Proof. The following four statements are equivalent:

(i) $\lambda_s(X'X) \to \infty$.

(ii) $\lambda_l[(X'X)^{-1}] \to 0$, where $\lambda_l$ refers to the largest characteristic root.

(iii) $\operatorname{tr}\,(X'X)^{-1} \to 0$.

(iv) Every diagonal element of $(X'X)^{-1}$ converges to 0.

Statement (iv) implies the consistency of $\hat\beta$ by Theorem 3.2.1.

The reader should verify that it is not sufficient for consistency to assume that all the diagonal elements of X’X go to infinity.
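One way to see why diverging diagonal elements are not enough is to look at a design with two nearly collinear columns. The sketch below is only an illustration under assumed regressors $x_{t1} = 1$ and $x_{t2} = 1 + 2^{-t}$ (a hypothetical choice, not taken from the text); the helper names are likewise made up for this example. Both diagonal elements of $X'X$ grow like $T$, yet the smallest characteristic root settles near $1/6$.

```python
import math

def smallest_root_2x2(a, b, d):
    """Smallest characteristic root of the symmetric matrix [[a, b], [b, d]]."""
    tr, det = a + d, a * d - b * b
    # Numerically stable form of (tr - sqrt(tr^2 - 4 det)) / 2.
    return 2.0 * det / (tr + math.sqrt(tr * tr - 4.0 * det))

def xtx_summary(T):
    """Return (first diagonal, second diagonal, smallest root) of X'X."""
    a = b = d = 0.0
    for t in range(1, T + 1):
        x1, x2 = 1.0, 1.0 + 0.5 ** t   # assumed, nearly collinear columns
        a += x1 * x1
        b += x1 * x2
        d += x2 * x2
    return a, d, smallest_root_2x2(a, b, d)

for T in (10, 100, 10_000):
    print(T, xtx_summary(T))
```

In this design the determinant of $X'X$ grows only linearly in $T$ while the trace grows like $2T$, which is why the smallest root stays bounded.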

Theorem 3.5.2. If we assume that $\{u_t\}$ are i.i.d. in Model 1, the least squares estimator $\hat\sigma^2$ defined in (1.2.5) is a consistent estimator of $\sigma^2$.

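Proof. We can write

$$\hat\sigma^2 = T^{-1}u'u - T^{-1}u'Pu, \tag{3.5.1}$$

where $P = X(X'X)^{-1}X'$. By Theorem 3.3.2 (Kolmogorov LLN 2),

$$T^{-1}u'u \xrightarrow{P} \sigma^2. \tag{3.5.2}$$

Because $u'Pu$ is nonnegative, we can use the generalized Chebyshev inequality (3.2.5) to obtain

$$P(T^{-1}u'Pu > \epsilon) \le \epsilon^{-1} E\,T^{-1}u'Pu = \sigma^2 \epsilon^{-1} T^{-1} K, \tag{3.5.3}$$

where the equality uses $E\,u'Pu = \sigma^2 \operatorname{tr} P = \sigma^2 K$. Therefore

$$T^{-1}u'Pu \xrightarrow{P} 0. \tag{3.5.4}$$

Therefore the theorem follows from (3.5.1), (3.5.2), and (3.5.4) by using Theorem 3.2.6.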
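The consistency of the variance estimator can be checked numerically. The following is a minimal sketch, assuming a single regressor that cycles through 1, 2, 3, a true coefficient of 2, and errors drawn uniformly on $(-\sqrt{3}, \sqrt{3})$ so that the error variance equals 1; all of these choices and the function name are illustrative assumptions, not part of the text.

```python
import math
import random

def sigma2_hat(T, beta=2.0, seed=0):
    """Least squares variance estimator T^{-1} * (sum of squared residuals)."""
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)               # uniform errors with variance 1
    x = [1.0 + (t % 3) for t in range(T)]     # assumed regressor values 1, 2, 3, ...
    u = [rng.uniform(-half_width, half_width) for _ in range(T)]
    y = [beta * xt + ut for xt, ut in zip(x, u)]
    beta_hat = sum(xt * yt for xt, yt in zip(x, y)) / sum(xt * xt for xt in x)
    return sum((yt - beta_hat * xt) ** 2 for xt, yt in zip(x, y)) / T

for T in (50, 500, 50_000):
    print(T, sigma2_hat(T))
```

As $T$ grows the printed values should move toward the assumed error variance of 1, in line with (3.5.2) and (3.5.4).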

We shall prove the asymptotic normality of the least squares estimator in two steps: first, for the case of one regressor, and, second, for the case of many regressors.

Theorem 3.5.3. Assume that $\{u_t\}$ are i.i.d. in Model 1. Assume that $K = 1$, and because $X$ is a vector in this case, write it as $x$. If

$$\lim_{T\to\infty} (x'x)^{-1} \max_{1 \le t \le T} x_t^2 = 0, \tag{3.5.5}$$

then $\sigma^{-1}(x'x)^{1/2}(\hat\beta - \beta) \xrightarrow{d} N(0, 1)$.

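Proof. We prove the theorem using the Lindeberg-Feller CLT (Theorem 3.3.6). Take $x_t u_t$ as the $X_t$ of Theorem 3.3.6. Then $\mu_t = 0$ and $\sigma_t^2 = \sigma^2 x_t^2$. Let $F_X$ and $F_Z$ be the distribution functions of $X$ and $Z$, respectively, where $X^2 = Z$. Then

$$\int_{x^2 > c} x^2 \, dF_X(x) = \int_{z > c} z \, dF_Z(z)$$

for any $c$. (This can be verified from the definition of the Riemann-Stieltjes integral given in Section 3.1.2.) Therefore we need to prove

$$\lim_{T\to\infty} \frac{1}{\sigma^2 x'x} \sum_{t=1}^{T} \int_{a > \epsilon^2 \sigma^2 x'x} a \, dF_t(a) = 0 \tag{3.5.6}$$

for any $\epsilon$, where $F_t$ is the distribution function of $(x_t u_t)^2$. Let $G$ be the distribution function of $u_t^2$. Then, because $F_t(a) = P(x_t^2 u_t^2 < a) = P(u_t^2 < a/x_t^2) = G(a/x_t^2)$, we have

$$\int_{a > \epsilon^2 \sigma^2 x'x} a \, dF_t(a) = \int_{a > \epsilon^2 \sigma^2 x'x} a \, dG(a/x_t^2) = \int_{\lambda > \epsilon^2 \sigma^2 x_t^{-2} x'x} x_t^2 \lambda \, dG(\lambda).$$

Therefore (3.5.6) follows from the inequality

$$\int_{\lambda > \epsilon^2 \sigma^2 x_t^{-2} x'x} \lambda \, dG(\lambda) \le \int_{\lambda > \epsilon^2 \sigma^2 (\max_t x_t^2)^{-1} x'x} \lambda \, dG(\lambda) \tag{3.5.7}$$

and the assumption (3.5.5), because the right-hand side of (3.5.7) does not depend on $t$ and converges to 0 as its lower limit of integration goes to infinity.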

Theorem 3.5.4. Assume that $\{u_t\}$ are i.i.d. in Model 1. Assume also

$$\lim_{T\to\infty} (x_i'x_i)^{-1} \max_{1 \le t \le T} x_{ti}^2 = 0 \tag{3.5.8}$$

for every $i = 1, 2, \ldots, K$. Define $Z = XS^{-1}$, where $S$ is the $K \times K$ diagonal matrix the $i$th diagonal element of which is $(x_i'x_i)^{1/2}$, and assume that $\lim_{T\to\infty} Z'Z \equiv R$ exists and is nonsingular. Then

$$S(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 R^{-1}).$$

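Proof. We have $S(\hat\beta - \beta) = (Z'Z)^{-1}Z'u$. The limit distribution of $c'(Z'Z)^{-1}Z'u$ for any constant vector $c$ is the same as that of $\gamma'Z'u$, where $\gamma' = c'R^{-1}$. But, because of Theorem 3.5.3, the asymptotic normality of $\gamma'Z'u$ holds if

$$\lim_{T\to\infty} (\gamma'Z'Z\gamma)^{-1} \max_{1 \le t \le T} \Bigl(\sum_{k=1}^{K} \gamma_k z_{tk}\Bigr)^2 = 0, \tag{3.5.9}$$

where $\gamma_k$ is the $k$th element of $\gamma$ and $z_{tk}$ is the $t,k$th element of $Z$. But, because $\gamma'Z'Z\gamma \ge \lambda_s(Z'Z)\gamma'\gamma$ by Theorem 10 of Appendix 1, we have

$$(\gamma'Z'Z\gamma)^{-1} \max_{1 \le t \le T} \Bigl(\sum_{k=1}^{K} \gamma_k z_{tk}\Bigr)^2 \le (\gamma'Z'Z\gamma)^{-1} \gamma'\gamma \max_{1 \le t \le T} \sum_{k=1}^{K} z_{tk}^2 \le \frac{1}{\lambda_s(Z'Z)} \sum_{k=1}^{K} \max_{1 \le t \le T} z_{tk}^2, \tag{3.5.10}$$

where the first inequality above follows from the Cauchy-Schwarz inequality. Therefore (3.5.9) follows from (3.5.10) because the last term of (3.5.10) converges to 0 by our assumptions. Therefore, by Theorem 3.5.3,

$$\frac{\gamma'Z'u}{\sigma(\gamma'Z'Z\gamma)^{1/2}} \xrightarrow{d} N(0, 1)$$

for any constant vector $\gamma \ne 0$. Since $\gamma'Z'Z\gamma \to c'R^{-1}c$, we have $\gamma'Z'u \xrightarrow{d} N(0, \sigma^2 c'R^{-1}c)$.

Thus the theorem follows from Theorem 3.3.8.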

At this point it seems appropriate to consider the significance of assumption (3.5.5). Note that (3.5.5) implies $x'x \to \infty$. It would be instructive to try to construct a sequence for which $x'x \to \infty$ and yet (3.5.5) does not hold. The following theorem shows, among other things, that (3.5.5) is less restrictive than the commonly used assumption that $\lim T^{-1}x'x$ exists and is a nonzero constant. It follows that if $\lim T^{-1}X'X$ exists and is nonsingular, the condition of Theorem 3.5.4 is satisfied.
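To make the role of (3.5.5) concrete, the sketch below evaluates the ratio $(x'x)^{-1}\max_t x_t^2$ for one candidate sequence. The choice $x_t = 2^t$ and the helper name `lindeberg_ratio` are assumptions made for illustration only. For this geometric sequence $x'x$ diverges, but the last observation always carries about three quarters of the sum of squares, so the ratio does not go to 0.

```python
def lindeberg_ratio(xs):
    """Return (x'x)^{-1} * max_t x_t^2 for a finite sequence xs."""
    squares = [x * x for x in xs]
    return max(squares) / sum(squares)

for T in (5, 10, 20, 40):
    xs = [2.0 ** t for t in range(1, T + 1)]   # assumed sequence x_t = 2^t
    print(T, sum(x * x for x in xs), lindeberg_ratio(xs))
```

The printed ratio approaches $3/4$ because $\sum_{t=1}^{T} 4^t = (4^{T+1} - 4)/3$.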

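Theorem 3.5.5. Given a sequence of constants $\{x_t\}$, consider the statements:

(i) $\lim_{T\to\infty} T^{-1}c_T = a$, where $a \ne 0$, $a < \infty$, and $c_T = \sum_{t=1}^{T} x_t^2$.

(ii) $\lim_{T\to\infty} c_T = \infty$.

(iii) $\lim_{T\to\infty} c_T^{-1} x_T^2 = 0$.

(iv) $\lim_{T\to\infty} c_T^{-1} \max_{1 \le t \le T} x_t^2 = 0$.

Then, (i) $\Rightarrow$ [(ii) and (iii)] $\Rightarrow$ (iv).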

Proof. (i) $\Rightarrow$ (ii) is obvious. We have

$$\frac{T x_T^2 - c_T}{T(T-1)} = \frac{c_T}{T} - \frac{c_{T-1}}{T-1}.$$

Therefore $\lim_{T\to\infty} (T-1)^{-1} x_T^2 = 0$, which implies (i) $\Rightarrow$ (iii). We shall now prove [(ii) and (iii)] $\Rightarrow$ (iv). Given $\epsilon > 0$, there exists $T_1$ such that for any $T \ge T_1$

$$c_T^{-1} x_T^2 < \epsilon \tag{3.5.12}$$

because of (iii). Given this $T_1$, there exists $T_2 > T_1$ such that for any $T > T_2$

$$c_T^{-1} \max_{1 \le t < T_1} x_t^2 < \epsilon \tag{3.5.13}$$

because of (ii). But (3.5.12) implies that for any $T > T_2$

$$c_T^{-1} x_t^2 < \epsilon \quad \text{for } t = T_1, T_1 + 1, \ldots, T. \tag{3.5.14}$$

Finally, (3.5.13) and (3.5.14) imply (iv).

It should be easy to construct a sequence for which (3.5.5) is satisfied but (i) is not.
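One candidate for such a sequence is $x_t = \sqrt{t}$, so that $x_t^2 = t$. This choice is an assumption made here for illustration, as is the helper name below. The sketch computes both $T^{-1}c_T$ and $c_T^{-1}\max_t x_t^2$ so the two conditions can be compared side by side.

```python
def summarize(T):
    """Return (T^{-1} c_T, c_T^{-1} max_t x_t^2) for the assumed sequence x_t^2 = t."""
    squares = [float(t) for t in range(1, T + 1)]
    c_T = sum(squares)
    return c_T / T, max(squares) / c_T

for T in (10, 100, 1000, 10_000):
    print(T, summarize(T))
```

Here $T^{-1}c_T = (T+1)/2$ diverges, so statement (i) fails, while the ratio equals $2/(T+1)$ and converges to 0, so (3.5.5) holds.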

Next we shall prove the asymptotic normality of the least squares estimator of the variance $\sigma^2$.

Theorem 3.5.6. Assume that $\{u_t\}$ are i.i.d. with a finite fourth moment $Eu_t^4 \equiv m_4$ in Model 1. Then $\sqrt{T}(\hat\sigma^2 - \sigma^2) \xrightarrow{d} N(0, m_4 - \sigma^4)$.

Proof. We can write

$$\sqrt{T}(\hat\sigma^2 - \sigma^2) = \frac{u'u - T\sigma^2}{\sqrt{T}} - \frac{1}{\sqrt{T}}\,u'Pu. \tag{3.5.15}$$

The second term of the right-hand side of (3.5.15) converges to 0 in probability by the same reasoning as in the proof of Theorem 3.5.2, and the first term can be dealt with by application of the Lindeberg-Levy CLT (Theorem 3.3.4). Therefore the theorem follows by Theorem 3.2.7(i).
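The limiting variance $m_4 - \sigma^4$ can be compared with a small Monte Carlo experiment. The sketch below assumes a single constant regressor $x_t = 1$ (so the residuals are deviations from the sample mean) and errors uniform on $(-\sqrt{3}, \sqrt{3})$, for which $\sigma^2 = 1$ and $m_4 = 9/5$; the design, the sample sizes, and the function name are all illustrative assumptions.

```python
import math
import random
import statistics

def normalized_draws(T=200, reps=2000, seed=0):
    """Draws of sqrt(T) * (sigma2_hat - sigma^2) with sigma^2 = 1."""
    rng = random.Random(seed)
    a = math.sqrt(3.0)                       # uniform(-a, a) has variance 1
    draws = []
    for _ in range(reps):
        u = [rng.uniform(-a, a) for _ in range(T)]
        u_bar = sum(u) / T                   # with x_t = 1 the residual is u_t - u_bar
        sigma2_hat = sum((ut - u_bar) ** 2 for ut in u) / T
        draws.append(math.sqrt(T) * (sigma2_hat - 1.0))
    return draws

m4, sigma4 = 9.0 / 5.0, 1.0
print(statistics.variance(normalized_draws()), m4 - sigma4)
```

The sample variance of the draws should be close to $m_4 - \sigma^4 = 0.8$, up to simulation noise and a finite-sample discrepancy of order $1/T$.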

Let us look at a few more examples of the application of convergence theorems.

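Example 3.5.1. Consider Model 1 where $K = 1$:

$$y = \beta x + u, \tag{3.5.16}$$

where we assume $\lim T^{-1}x'x = c \ne 0$ and $\{u_t\}$ are i.i.d. Obtain the probability limit of $\hat\beta_R = y'y/x'y$. (Note that this estimator is obtained by minimizing the sum of squares in the direction of the $x$-axis.)

We can write

$$\hat\beta_R = \frac{\beta^2 + 2\beta\,\dfrac{x'u}{x'x} + \dfrac{u'u}{x'x}}{\beta + \dfrac{x'u}{x'x}}. \tag{3.5.17}$$

We have $E(x'u/x'x)^2 = \sigma^2(x'x)^{-1} \to 0$ as $T \to \infty$. Therefore, by Theorem 3.2.1,

$$\operatorname{plim} \frac{x'u}{x'x} = 0. \tag{3.5.18}$$

Also we have

$$\operatorname{plim} \frac{u'u}{x'x} = \operatorname{plim} \frac{u'u}{T} \cdot \frac{T}{x'x} = \frac{\sigma^2}{c} \tag{3.5.19}$$

because of Theorem 3.2.6 and Theorem 3.3.2 (Kolmogorov LLN 2). Therefore, from (3.5.17), (3.5.18), and (3.5.19) and by using Theorem 3.2.6 again, we obtain

$$\operatorname{plim} \hat\beta_R = \beta + \frac{\sigma^2}{\beta c}. \tag{3.5.20}$$

Note that $c$ may be allowed to be $\infty$, in which case $\hat\beta_R$ becomes a consistent estimator of $\beta$.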
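The probability limit in (3.5.20) can be illustrated with a single large simulated sample. This is a sketch under assumed values: $\beta = 2$, $\sigma = 1$, normal errors, and a regressor alternating between 1 and 2 so that $T^{-1}x'x \to c = 2.5$. With these numbers the limit of $\hat\beta_R$ is $2 + 1/(2 \times 2.5) = 2.2$, while least squares converges to 2. The function name is hypothetical.

```python
import random

def reverse_estimator(T, beta=2.0, sigma=1.0, seed=0):
    """Return (beta_R, beta_hat) = (y'y / x'y, x'y / x'x) for one simulated sample."""
    rng = random.Random(seed)
    x = [1.0 if t % 2 == 0 else 2.0 for t in range(T)]   # T^{-1} x'x -> c = 2.5
    y = [beta * xt + rng.gauss(0.0, sigma) for xt in x]
    yy = sum(yt * yt for yt in y)
    xy = sum(xt * yt for xt, yt in zip(x, y))
    xx = sum(xt * xt for xt in x)
    return yy / xy, xy / xx

beta, sigma, c = 2.0, 1.0, 2.5
print(reverse_estimator(100_000), beta + sigma ** 2 / (beta * c))
```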

Example 3.5.2. Consider the same model as in Example 3.5.1 except that we now assume $\lim T^{-2}x'x = \infty$. Also assume $\lim_{T\to\infty} (x'x)^{-1} \max_{1 \le t \le T} x_t^2 = 0$ so that $\hat\beta$ is asymptotically normal. (Give an example of a sequence satisfying these two conditions.) Show that $\hat\beta = x'y/x'x$ and $\hat\beta_R = y'y/x'y$ have the same asymptotic distribution.

Clearly, $\operatorname{plim} \hat\beta = \operatorname{plim} \hat\beta_R = \beta$. Therefore, by Theorem 3.2.2, both estimators have the same degenerate limit distribution. But the question concerns the asymptotic distribution; therefore we must obtain the limit distribution of each estimator after a suitable normalization. We can write

$$(x'x)^{1/2}(\hat\beta_R - \beta) = \frac{\beta\,\dfrac{x'u}{(x'x)^{1/2}} + \dfrac{u'u}{(x'x)^{1/2}}}{\beta + \dfrac{x'u}{x'x}}.$$

But by our assumptions $\operatorname{plim} u'u\,(x'x)^{-1/2} = 0$ and $\operatorname{plim}\,(x'u/x'x) = 0$. Therefore $(x'x)^{1/2}(\hat\beta_R - \beta)$ and $(x'x)^{1/2}(\hat\beta - \beta)$ have the same limit distribution by repeated applications of Theorem 3.2.7.

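Example 3.5.3. Consider Model 1 with $K = 2$:

$$y = \beta_1 x_1 + \beta_2 x_2 + u,$$

where we assume that $\{u_t\}$ are i.i.d. Assume also $\lim T^{-1}X'X = A$, where $A$ is a $2 \times 2$ nonsingular matrix. Obtain the asymptotic distribution of $\hat\beta_1/\hat\beta_2$, where $\hat\beta_1$ and $\hat\beta_2$ are the least squares estimators of $\beta_1$ and $\beta_2$, assuming $\beta_2 \ne 0$.

We can write

$$\sqrt{T}\left(\frac{\hat\beta_1}{\hat\beta_2} - \frac{\beta_1}{\beta_2}\right) = \frac{\beta_2\sqrt{T}(\hat\beta_1 - \beta_1) - \beta_1\sqrt{T}(\hat\beta_2 - \beta_2)}{\hat\beta_2\beta_2}. \tag{3.5.23}$$

Because $\operatorname{plim} \hat\beta_2 = \beta_2$, the right-hand side of (3.5.23) has the same limit distribution as

$$\beta_2^{-1}\sqrt{T}(\hat\beta_1 - \beta_1) - \beta_1\beta_2^{-2}\sqrt{T}(\hat\beta_2 - \beta_2).$$

But, because our assumptions imply (3.5.8) by Theorem 3.5.5, $[\sqrt{T}(\hat\beta_1 - \beta_1), \sqrt{T}(\hat\beta_2 - \beta_2)]$ converges to a bivariate normal variable by Theorem 3.5.4. Therefore, by Theorem 3.2.5, we have

$$\sqrt{T}\left(\frac{\hat\beta_1}{\hat\beta_2} - \frac{\beta_1}{\beta_2}\right) \xrightarrow{d} N(0, \sigma^2\gamma'A^{-1}\gamma),$$

where $\gamma' = (\beta_2^{-1}, -\beta_1\beta_2^{-2})$.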
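A rough Monte Carlo check of this limit is sketched below. It assumes a deliberately simple design, chosen only for illustration: $x_1$ is a column of ones and $x_2$ alternates between $+1$ and $-1$ with $T$ even, so that $T^{-1}X'X = I$ and $A = I$; the errors are standard normal, and $\beta_1 = 1$, $\beta_2 = 2$. Under these assumptions the limiting variance is $\gamma'\gamma = 1/4 + 1/16 = 0.3125$. The function name is hypothetical.

```python
import math
import random
import statistics

def ratio_draws(T=200, reps=2000, beta1=1.0, beta2=2.0, seed=0):
    """Draws of sqrt(T) * (b1/b2 - beta1/beta2) under the assumed design."""
    rng = random.Random(seed)
    x2 = [1.0 if t % 2 == 0 else -1.0 for t in range(T)]   # T even, so x1'x2 = 0
    draws = []
    for _ in range(reps):
        y = [beta1 + beta2 * z + rng.gauss(0.0, 1.0) for z in x2]
        # With x1 = 1 and X'X = T * I, least squares reduces to two averages.
        b1 = sum(y) / T
        b2 = sum(z * yt for z, yt in zip(x2, y)) / T
        draws.append(math.sqrt(T) * (b1 / b2 - beta1 / beta2))
    return draws

beta1, beta2 = 1.0, 2.0
gamma = (1.0 / beta2, -beta1 / beta2 ** 2)
limit_variance = gamma[0] ** 2 + gamma[1] ** 2   # sigma^2 * gamma' A^{-1} gamma with A = I
print(statistics.variance(ratio_draws()), limit_variance)
```

The two printed numbers should agree up to simulation noise, which is on the order of 0.01 with 2000 replications.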

## Exercises

1. (Section 3.1.2)

Prove that the distribution function is continuous from the left.

2. (Section 3.2)

Prove Theorem 3.2.3. Hint: Definition 3.1.2 (iii) implies that if $\Omega_n \subset \Omega_m$ for $n < m$ and $\lim_{n\to\infty} \Omega_n = A$, then $\lim_{n\to\infty} P(\Omega_n) = P(A)$.

3. (Section 3.2)

Prove Theorem 3.2.4.

4. (Section 3.3)

Let $\{X_t\}$ be as defined in Theorem 3.3.1. Prove

$$\lim_{n\to\infty} E(\bar X_n - E\bar X_n)^2 = 0.$$

5. (Section 3.3)

Let $\{a_t\}$, $t = 1, 2, \ldots$, be a nonnegative sequence such that $(\sum_{t=1}^{T} a_t)/T < M$ for some $M$ and every $T$. Prove $\lim_{T\to\infty} \sum_{t=1}^{T} (a_t/t^2) < \infty$.

6. (Section 3.3)

Prove that the conditions of Theorem 3.3.6 (Lindeberg-Feller CLT) follow from the conditions of Theorem 3.3.4 (Lindeberg-Levy CLT) or of Theorem 3.3.5 (Liapounov CLT).

7. (Section 3.3)

Let $\{X_t\}$ be i.i.d. with $EX_t = \mu$. Then $\bar X_n \xrightarrow{P} \mu$. This is a corollary of Theorem 3.3.2 (Kolmogorov LLN 2) and is called Khinchine's WLLN (weak law of large numbers). Prove this theorem using characteristic functions.

8. (Section 3.5)

Show that $\lambda_s(X'X) \to \infty$ implies $x_i'x_i \to \infty$ for every $i$, where $x_i$ is the $i$th column vector of $X$. Show also that the converse does not hold.

9. (Section 3.5)

Assume $K = 1$ in Model 1 and write $X$ as $x$. Assume that $\{u_t\}$ are independent. If there exist $L$ and $M$ such that $0 < L < x'x/T < M$ for all $T$, show

10. (Section 3.5)

Suppose $y = y^* + u$ and $x = x^* + v$, where each variable is a vector of $T$ components. Assume $y^*$ and $x^*$ are nonstochastic and $(u_t, v_t)$ is a bivariate i.i.d. random variable with mean 0 and constant variances $\sigma_u^2$, $\sigma_v^2$, respectively, and covariance $\sigma_{uv}$. Assume $y^* = \beta x^*$, but $y^*$ and $x^*$ are not observable so that we must estimate $\beta$ on the basis of $y$ and $x$. Obtain the probability limit of $\hat\beta = x'y/x'x$ on the assumption that $\lim_{T\to\infty} T^{-1}x^{*\prime}x^* = M$.

11. (Section 3.5)

Consider the regression equation $y = X_1\beta_1 + X_2\beta_2 + u$. Assume all the assumptions of Model 1 except that $X \equiv (X_1, X_2)$ may not be full rank. Let $Z$ be the matrix consisting of a maximal linearly independent subset of the columns of $X_2$ and assume that the smallest characteristic root of the matrix $(X_1, Z)'(X_1, Z)$ goes to infinity as $T$ goes to infinity. Derive a consistent estimator of $\beta_1$. Prove that it is consistent.

12. (Section 3.5)

Change the assumptions of Theorem 3.5.3 as follows: $\{u_t\}$ are independent with $Eu_t = 0$, $Vu_t = \sigma^2$, and $E|u_t|^3 = m_3$. Prove $\sigma^{-1}(x'x)^{1/2}(\hat\beta - \beta) \xrightarrow{d} N(0, 1)$ using Theorem 3.3.5 (Liapounov CLT).

13. (Section 3.5)

Construct a sequence $\{x_t\}$ such that $\sum_{t=1}^{T} x_t^2 \to \infty$ but the condition (3.5.5) is not satisfied.

14. (Section 3.5)

Construct a sequence $\{x_t\}$ such that the condition (3.5.5) holds but the condition (i) of Theorem 3.5.5 does not.

15. (Section 3.5)

Let $1$ be the vector of ones. Assuming $\lim T^{-1}1'x^* = N \ne 0$ in the model of Exercise 10, prove the consistency of $\tilde\beta = 1'y/1'x$ and obtain its asymptotic distribution.

16. (Section 3.5)

Assume that $\{u_t\}$ are i.i.d. in Model 1. Assume $K = 1$ and write $X$ as $x$. Obtain the asymptotic distribution of $\tilde\beta = 1'y/1'x$ assuming $\lim_{T\to\infty} T^{-1}(1'x)^2 = \infty$, where $1$ is the vector of ones.

17. (Section 3.5)

Consider the classical regression model $y = \alpha x + \beta z + u$, where $\alpha$ and $\beta$ are scalar unknown parameters, $x$ and $z$ are $T$-component vectors of known constants, and $u$ is a $T$-component vector of unobservable i.i.d. random variables with zero mean and unit variance. Suppose we are given an estimator $\tilde\beta$ that is independent of $u$ and the limit distribution of $T^{1/2}(\tilde\beta - \beta)$ is $N(0, 1)$. Define the estimator $\tilde\alpha$ by

$$\tilde\alpha = \frac{x'(y - \tilde\beta z)}{x'x}.$$

Assuming $\lim T^{-1}x'x = c$ and $\lim T^{-1}x'z = d$, obtain the asymptotic distribution of $\tilde\alpha$. Assume $c \ne 0$ and $d \ne 0$.

18. (Section 3.5)

Consider the regression model $y = \beta(x + \alpha 1) + u$, where $y$, $x$, $1$, and $u$ are $T$-vectors and $\alpha$ and $\beta$ are scalar unknown parameters. Assume that $1$ is a $T$-vector of ones, $\lim_{T\to\infty} x'1 = 0$, and $\lim_{T\to\infty} T^{-1}x'x = c$, where $c$ is a nonzero constant. Also assume that the elements of $u$ are i.i.d. with zero mean and constant variance $\sigma^2$. Supposing we have an estimate of $\alpha$ denoted by $\hat\alpha$ such that it is distributed independently of $u$ and $\sqrt{T}(\hat\alpha - \alpha) \xrightarrow{d} N(0, \lambda^2)$, obtain the asymptotic distribution of $\hat\beta$ defined by

$$\hat\beta = \frac{(x + \hat\alpha 1)'y}{(x + \hat\alpha 1)'(x + \hat\alpha 1)}.$$