The General Linear Model: The Basics

7.1 Invariance of the fitted values and residuals to non-singular transformations of the independent variables.

The regression model in (7.1) can be written as y = XCC⁻¹β + u, where C is a non-singular matrix. Let X* = XC; then y = X*β* + u, where β* = C⁻¹β.

a. PX* = X*(X*'X*)⁻¹X*' = XC[C'X'XC]⁻¹C'X' = XCC⁻¹(X'X)⁻¹C'⁻¹C'X' = PX.

Hence, the regression of y on X* yields

ŷ = X*β̂*_ols = PX*y = PXy = Xβ̂_ols, which gives the same fitted values as those from the regression of y on X. Since the dependent variable y is the same, the residuals from both regressions will be the same.

b. Multiplying each regressor Xk by a constant ck is equivalent to post-multiplying the matrix X by a diagonal matrix C with typical k-th diagonal element ck, for k = 1, 2, …, K. This diagonal matrix C is non-singular provided every ck ≠ 0. Therefore, using the results in part (a), the fitted values and the residuals will remain the same.

c. In this case, X = [x1, x2] is of dimension n×2 and

C = [ 1   1
      1  −1 ]

is non-singular, with C⁻¹ = ½ [1 1; 1 −1] = C/2. The results of part (a) apply, and we get the same fitted values and residuals when we regress y on (x1 + x2) and (x1 − x2) as in the regression of y on x1 and x2.
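This invariance is easy to verify numerically. The following Python sketch (illustrative only, not part of the original solution; numpy and the simulated X, y, and C are assumptions) regresses y on X and on X* = XC, using the C of part (c) extended with an intercept, and checks that fitted values and residuals coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# Non-singular C: keeps the intercept, maps (x1, x2) into (x1 + x2, x1 - x2)
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, -1.0]])
X_star = X @ C

def fitted(Z):
    # P_Z y = Z (Z'Z)^{-1} Z'y
    return Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)

y_hat, y_hat_star = fitted(X), fitted(X_star)
resid, resid_star = y - y_hat, y - y_hat_star
```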

B. H. Baltagi, Solutions Manual for Econometrics, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-54548-1_7, © Springer-Verlag Berlin Heidelberg 2015

7.2 The FWL Theorem.

a. The inverse of a partitioned matrix

A = [ A11  A12
      A21  A22 ]

is given by

A⁻¹ = [ A11⁻¹ + A11⁻¹A12B22A21A11⁻¹   −A11⁻¹A12B22
        −B22A21A11⁻¹                   B22 ]

where B22 = (A22 − A21A11⁻¹A12)⁻¹. From (7.9), we get

B22 = (X2'X2 − X2'X1(X1'X1)⁻¹X1'X2)⁻¹ = (X2'X2 − X2'PX1X2)⁻¹ = (X2'P̄X1X2)⁻¹.

Also, −B22A21A11⁻¹ = −(X2'P̄X1X2)⁻¹X2'X1(X1'X1)⁻¹. Hence, from (7.9), we solve for β̂2,ols to get

β̂2,ols = −B22A21A11⁻¹X1'y + B22X2'y

which yields

β̂2,ols = −(X2'P̄X1X2)⁻¹X2'PX1y + (X2'P̄X1X2)⁻¹X2'y = (X2'P̄X1X2)⁻¹X2'P̄X1y

as required in (7.10).

b. Alternatively, one can write (7.9) as

(X1'X1)β̂1,ols + (X1'X2)β̂2,ols = X1'y

(X2'X1)β̂1,ols + (X2'X2)β̂2,ols = X2'y.

Solving for β̂1,ols in terms of β̂2,ols by multiplying the first equation by (X1'X1)⁻¹, we get

β̂1,ols = (X1'X1)⁻¹X1'y − (X1'X1)⁻¹X1'X2β̂2,ols = (X1'X1)⁻¹X1'(y − X2β̂2,ols).

Substituting β̂1,ols in the second equation, we get

X2'X1(X1'X1)⁻¹X1'y − X2'PX1X2β̂2,ols + (X2'X2)β̂2,ols = X2'y.

Collecting terms, we get (X2'P̄X1X2)β̂2,ols = X2'P̄X1y.

Hence, β̂2,ols = (X2'P̄X1X2)⁻¹X2'P̄X1y as given in (7.10).

c. In this case, X = [ιn, X2] where ιn is a vector of ones of dimension n. Then

PX1 = ιn(ιn'ιn)⁻¹ιn' = ιnιn'/n = Jn/n

where Jn = ιnιn' is a matrix of ones of dimension n. But ιn'y = Σⁿᵢ₌₁ yi and ιn'y/n = ȳ. Hence, P̄X1 = In − PX1 = In − Jn/n, and P̄X1y = (In − Jn/n)y has a typical element (yi − ȳ). From the FWL Theorem, β̂2,ols can be obtained from the regression of (yi − ȳ) on the set of variables in X2 expressed as deviations from their respective means, i.e., on P̄X1X2 = (In − Jn/n)X2.

From part (b),

β̂1,ols = (X1'X1)⁻¹X1'(y − X2β̂2,ols) = (ιn'ιn)⁻¹ιn'(y − X2β̂2,ols) = ȳ − X̄2β̂2,ols

where X̄2 = ιn'X2/n is the vector of sample means of the independent variables in X2.
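The FWL result of part (c) can be checked numerically. This Python sketch (illustrative only; numpy and the simulated data are assumptions) compares full OLS with the regression in deviations from means and the recovered intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X2 = rng.normal(size=(n, 2))
y = 3.0 + X2 @ np.array([1.5, -0.5]) + rng.normal(size=n)

# Full regression of y on [iota_n, X2]
X = np.column_stack([np.ones(n), X2])
beta = np.linalg.solve(X.T @ X, X.T @ y)

# FWL: slopes from data in deviations from means, then recover the intercept
y_d = y - y.mean()
X2_d = X2 - X2.mean(axis=0)
b2 = np.linalg.solve(X2_d.T @ X2_d, X2_d.T @ y_d)
b1 = y.mean() - X2.mean(axis=0) @ b2
```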

7.3 Di = (0, 0, …, 1, 0, …, 0)', where all the elements of this n×1 vector are zero except for the i-th element, which takes the value 1. In this case, PDi = Di(Di'Di)⁻¹Di' = DiDi', which is a matrix of zeros except for the i-th diagonal element, which takes the value 1. Hence, In − PDi is an identity matrix except for the i-th diagonal element, which takes the value zero. Therefore, (In − PDi)y returns the vector y with its i-th element set to zero. Using the FWL Theorem, the OLS regression

y = Xβ + Diγ + u

yields the same estimates as (In − PDi)y = (In − PDi)Xβ + (In − PDi)u, which can be rewritten as ỹ = X̃β + ũ with ỹ = (In − PDi)y and X̃ = (In − PDi)X.

The OLS normal equations yield (X̃'X̃)β̂_ols = X̃'ỹ, and the i-th OLS normal equation can be ignored since it gives 0'β̂_ols = 0. Ignoring the i-th observation yields (X*'X*)β̂_ols = X*'y*, where X* is the matrix X without the i-th observation and y* is the vector y without the i-th observation. The FWL Theorem also states that the residuals from ỹ on X̃ are the same as those from y on X and Di. For the i-th observation, ỹi = 0 and x̃i = 0; hence the i-th residual must be zero. This also means that the i-th residual in the original regression with the dummy variable Di is zero, i.e., yi − xi'β̂_ols − γ̂_ols = 0. Rearranging terms, we get γ̂_ols = yi − xi'β̂_ols. In other words, γ̂_ols is the forecasted OLS residual for the i-th observation from the regression of y* on X*. The i-th observation was excluded from the estimation of β̂_ols by the inclusion of the dummy variable Di.
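As a numerical check of this result (an illustrative Python/numpy sketch, not part of the text): adding the dummy Di reproduces OLS without observation i, and γ̂_ols equals the forecasted residual yi − xi'β̂*:

```python
import numpy as np

rng = np.random.default_rng(2)
n, i = 40, 7
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

# Augment X with the dummy D_i for observation i
D = np.zeros((n, 1)); D[i] = 1.0
Z = np.hstack([X, D])
coef = np.linalg.solve(Z.T @ Z, Z.T @ y)   # (beta', gamma)'
beta, gamma = coef[:3], coef[3]

# OLS on the sample with observation i deleted
mask = np.arange(n) != i
beta_star = np.linalg.solve(X[mask].T @ X[mask], X[mask].T @ y[mask])
```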

7.5 If u ∼ N(0, σ²In), then (n − K)s²/σ² ∼ χ²_{n−K}. In this case,

a. E[(n − K)s²/σ²] = E(χ²_{n−K}) = n − K, since the expected value of a χ² random variable with (n − K) degrees of freedom is (n − K). Hence,

[(n − K)/σ²]E(s²) = (n − K) or E(s²) = σ².

b. var[(n − K)s²/σ²] = var(χ²_{n−K}) = 2(n − K), since the variance of a χ² random variable with (n − K) degrees of freedom is 2(n − K). Hence,

[(n − K)²/σ⁴]var(s²) = 2(n − K) or var(s²) = 2σ⁴/(n − K).

7.6 a. Using the results in problem 7.4, we know that σ̂²_mle = e'e/n = (n − K)s²/n. Hence,

E(σ̂²_mle) = (n − K)E(s²)/n = (n − K)σ²/n.

This means that σ̂²_mle is biased for σ², but asymptotically unbiased. The bias is equal to −Kσ²/n, which goes to zero as n → ∞.

b. var(σ̂²_mle) = (n − K)²var(s²)/n² = (n − K)²·2σ⁴/n²(n − K) = 2(n − K)σ⁴/n², and

MSE(σ̂²_mle) = Bias²(σ̂²_mle) + var(σ̂²_mle) = K²σ⁴/n² + 2(n − K)σ⁴/n² = (K² + 2n − 2K)σ⁴/n².

c. Similarly, σ̃² = e'e/r = (n − K)s²/r with E(σ̃²) = (n − K)σ²/r and

var(σ̃²) = (n − K)²var(s²)/r² = (n − K)²·2σ⁴/r²(n − K) = 2(n − K)σ⁴/r²

MSE(σ̃²) = Bias²(σ̃²) + var(σ̃²) = (n − K − r)²σ⁴/r² + 2(n − K)σ⁴/r².

Minimizing MSE(σ̃²) with respect to r yields the first-order condition

∂MSE(σ̃²)/∂r = [−2(n − K − r)σ⁴r² − 2r(n − K − r)²σ⁴ − 4r(n − K)σ⁴]/r⁴ = 0

which yields, after dividing by −2σ⁴/r³,

(n − K − r)r + (n − K − r)² + 2(n − K) = 0
(n − K − r)(r + n − K − r) + 2(n − K) = 0
(n − K)(n − K − r + 2) = 0.

Since n > K, this is zero for r = n − K + 2. Hence, the minimum MSE is obtained at σ̃² = e'e/(n − K + 2) with

MSE(σ̃²) = 4σ⁴/(n − K + 2)² + 2(n − K)σ⁴/(n − K + 2)² = 2σ⁴(n − K + 2)/(n − K + 2)² = 2σ⁴/(n − K + 2).

Note that s² = e'e/(n − K) with MSE(s²) = var(s²) = 2σ⁴/(n − K) > MSE(σ̃²).

Also, it can be easily verified that MSE(σ̂²_mle) = (K² − 2K + 2n)σ⁴/n² ≥ MSE(σ̃²) for 2 ≤ K < n, with equality holding for K = 2.
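The MSE formulas in parts (b) and (c) can be tabulated directly. This Python sketch (an illustration under the assumed values n = 30, K = 4; numpy is an assumption) confirms that e'e/r is minimized at r = n − K + 2 and beats both s² and σ̂²_mle in MSE:

```python
import numpy as np

n, K = 30, 4   # assumed sample size and number of regressors

def mse_s2(s4=1.0):        # s² = e'e/(n-K): unbiased, MSE = 2σ⁴/(n-K)
    return 2 * s4 / (n - K)

def mse_mle(s4=1.0):       # σ̂²_mle = e'e/n: MSE = (K² - 2K + 2n)σ⁴/n²
    return (K**2 - 2*K + 2*n) * s4 / n**2

def mse_r(r, s4=1.0):      # e'e/r: Bias² + var = [(n-K-r)² + 2(n-K)]σ⁴/r²
    return ((n - K - r)**2 + 2 * (n - K)) * s4 / r**2

rs = np.arange(1, 2 * n)
best_r = int(rs[np.argmin([mse_r(r) for r in rs])])   # should be n - K + 2
```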

7.7 Computing Forecasts and Forecast Standard Errors Using a Regression Package. This is based on Salkever (1976).

a. From (7.23), the OLS normal equations yield

[ X'X + Xo'Xo   Xo' ] [ β̂_ols ]   [ X'y + Xo'yo ]
[ Xo            ITo ] [ γ̂_ols ] = [ yo          ]

or (X'X)β̂_ols + (Xo'Xo)β̂_ols + Xo'γ̂_ols = X'y + Xo'yo and Xoβ̂_ols + γ̂_ols = yo.

From the second equation, it is obvious that γ̂_ols = yo − Xoβ̂_ols. Substituting this in the first equation yields

(X'X)β̂_ols + (Xo'Xo)β̂_ols + Xo'yo − Xo'Xoβ̂_ols = X'y + Xo'yo,

so that (X'X)β̂_ols = X'y and β̂_ols = (X'X)⁻¹X'y. Alternatively, premultiplying (7.23) by the projection orthogonal to the block of To dummy variables is, by the FWL Theorem, equivalent to omitting the last To observations. The resulting regression is that of y on X, which yields β̂_ols = (X'X)⁻¹X'y as obtained above. Also, after this premultiplication, the last To observations yield zero residuals because the observations on both the dependent and independent variables are zero. For this to be true in the original regression, we must have yo − Xoβ̂_ols − γ̂_ols = 0. This means that γ̂_ols = yo − Xoβ̂_ols as required.

b. The OLS residuals of (7.23) yield the usual least squares residuals

e_ols = y − Xβ̂_ols

for the first n observations and zero residuals for the next To observations. This means that e*' = (e_ols', 0') and e*'e* = e_ols'e_ols, with the same residual sum of squares. The number of observations in (7.23) is n + To and the number of parameters estimated is K + To. Hence the new degrees of freedom in (7.23) are (n + To) − (K + To) = (n − K), the old degrees of freedom in the regression of y on X. Hence, s*² = e*'e*/(n − K) = e_ols'e_ols/(n − K) = s².

c. Using partitioned inverse formulas on (X*'X*), one gets

(X*'X*)⁻¹ = [ (X'X)⁻¹        −(X'X)⁻¹Xo'
              −Xo(X'X)⁻¹     ITo + Xo(X'X)⁻¹Xo' ].

This uses the fact that the inverse of

A = [ A11  A12      is   A⁻¹ = [ B11              −B11A12A22⁻¹
      A21  A22 ]                 −A22⁻¹A21B11     A22⁻¹ + A22⁻¹A21B11A12A22⁻¹ ]

where B11 = (A11 − A12A22⁻¹A21)⁻¹; here A11 = X'X + Xo'Xo, A12 = Xo', A21 = Xo and A22 = ITo, so that B11 = (X'X)⁻¹. Hence, s*²(X*'X*)⁻¹ = s²(X*'X*)⁻¹

and is given by (7.25).

d. If we replace yo by 0 and ITo by −ITo in (7.23), we get

[ y ]   [ X    0   ] [ β ]
[ 0 ] = [ Xo  −ITo ] [ γ ] + u*

or y* = X*δ + u*. Now

X*'X* = [ X'X + Xo'Xo   −Xo'
          −Xo            ITo ]

and the OLS normal equations yield

(X'X)β̂_ols + (Xo'Xo)β̂_ols − Xo'γ̂_ols = X'y and −Xoβ̂_ols + γ̂_ols = 0.

From the second equation, it immediately follows that γ̂_ols = Xoβ̂_ols = ŷo, the forecast of the To observations using the estimates from the first n observations. Substituting this in the first equation yields

(X'X)β̂_ols + (Xo'Xo)β̂_ols − Xo'Xoβ̂_ols = X'y,

so β̂_ols = (X'X)⁻¹X'y. As in part (a), premultiplying by the projection orthogonal to the dummy-variable block omits the last To observations and yields β̂_ols based on the regression of y on X from the first n observations only. The last To observations yield zero residuals because the dependent and independent variables for these To observations have zero values. For this to be true in the original regression, it must be true that 0 − Xoβ̂_ols + γ̂_ols = 0, which yields γ̂_ols = Xoβ̂_ols = ŷo as expected. The residuals are still (e_ols', 0') and s*² = s² for the same reasons given in part (b). Also, using the partitioned inverse as in part (c) above, we get

(X*'X*)⁻¹ = [ (X'X)⁻¹       (X'X)⁻¹Xo'
              Xo(X'X)⁻¹     ITo + Xo(X'X)⁻¹Xo' ].

Hence, s*²(X*'X*)⁻¹ = s²(X*'X*)⁻¹ and the diagonal elements are as given in (7.25).
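Salkever's trick is easy to verify numerically. The following Python sketch (illustrative; numpy and the simulated data are assumptions) estimates the augmented regression (7.23) and checks that β̂ matches OLS on the first n observations, that γ̂ equals the forecast errors yo − Xoβ̂, and that the extra observations contribute zero residuals:

```python
import numpy as np

rng = np.random.default_rng(3)
n, To, K = 30, 5, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Xo = np.column_stack([np.ones(To), rng.normal(size=(To, K - 1))])
b_true = np.array([1.0, 2.0, -1.0])
y = X @ b_true + rng.normal(size=n)
yo = Xo @ b_true + rng.normal(size=To)

# Augmented regression (7.23): stack [y; yo] on [[X, 0], [Xo, I_To]]
Z = np.block([[X, np.zeros((n, To))], [Xo, np.eye(To)]])
ys = np.concatenate([y, yo])
coef = np.linalg.solve(Z.T @ Z, Z.T @ ys)
beta_aug, gamma = coef[:K], coef[K:]

beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS on the first n observations only
e_aug = ys - Z @ coef                      # last To residuals should be zero
```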

7.8 a. cov(β̂_ols, e) = E(β̂_ols − β)e' = E[(X'X)⁻¹X'uu'P̄X] = σ²(X'X)⁻¹X'P̄X = 0, where the second equality uses the fact that e = P̄X u and β̂_ols = β + (X'X)⁻¹X'u. The third equality uses the fact that E(uu') = σ²In, and the last equality uses the fact that P̄X X = 0. But e ∼ N(0, σ²P̄X) and β̂_ols ∼ N(β, σ²(X'X)⁻¹); therefore, zero covariance and normality imply independence of β̂_ols and e.

b. β̂_ols − β = (X'X)⁻¹X'u is linear in u, and (n − K)s² = e'e = u'P̄X u is quadratic in u. A linear form and a quadratic form in normal random variables u ∼ N(0, σ²In) are independent if (X'X)⁻¹X'P̄X = 0; see Graybill (1961), Theorem 4.17. This is true since P̄X X = 0.
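A quick numerical illustration of the orthogonality that drives both parts (Python/numpy sketch with simulated data, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
P = X @ np.linalg.solve(X.T @ X, X.T)   # P_X
P_bar = np.eye(n) - P                   # I - P_X

# P̄_X X = 0, so the linear form (X'X)^{-1}X'u and the quadratic
# form u'P̄_X u are built from orthogonal pieces of u
lin = np.linalg.solve(X.T @ X, X.T) @ P_bar
```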

7.9 a. Replacing R by c' in (7.29), one gets (c'β̂_ols − c'β)'[c'(X'X)⁻¹c]⁻¹(c'β̂_ols − c'β)/σ². Since c'(X'X)⁻¹c is a scalar, this can be rewritten as

(c'β̂_ols − c'β)²/σ²c'(X'X)⁻¹c

which is exactly the square of z_obs in (7.26). Since z_obs ∼ N(0, 1) under the null hypothesis, its square is χ²_1 under the null hypothesis.

b. Dividing the statistic given in part (a) by (n − K)s²/σ² ∼ χ²_{n−K} divided by its degrees of freedom (n − K) results in replacing σ² by s², i.e.,

(c'β̂_ols − c'β)²/s²c'(X'X)⁻¹c.

This is the square of the t-statistic given in (7.27). But the numerator is z²_obs ∼ χ²_1 and the denominator is χ²_{n−K}/(n − K). Hence, if the numerator and denominator are independent, the resulting statistic is distributed as F(1, n − K) under the null hypothesis.

7.10 a. The quadratic form u'Au/σ² in (7.30) has

A = X(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹X'.

This is symmetric, and idempotent since

A² = X(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹X'X(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹X'
   = X(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹X' = A

and rank(A) = tr(A) = tr(R(X'X)⁻¹(X'X)(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹) = tr(Ig) = g, since R is g×K.

b. From Lemma 1, u'Au/σ² ∼ χ²_g since A is symmetric and idempotent of rank g and u ∼ N(0, σ²In).

7.11 a. The two quadratic forms s² = u'P̄Xu/(n − K) and u'Au/σ² given in (7.30) are independent if and only if P̄XA = 0; see Graybill (1961), Theorem 4.10. This is true since P̄XX = 0.

b. (n − K)s²/σ² is χ²_{n−K} and u'Au/σ² ∼ χ²_g, and both quadratic forms are independent of each other. Hence, dividing χ²_g by g we get u'Au/gσ². Also, dividing χ²_{n−K} by (n − K) we get s²/σ². Dividing u'Au/gσ² by s²/σ² we get u'Au/gs², which is another way of writing (7.31). This is distributed as F(g, n − K) under the null hypothesis.
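Problems 7.10 and 7.11 rest on A being symmetric and idempotent of rank g with P̄XA = 0. A numerical sketch (Python/numpy; the simulated X and an arbitrary full-rank R are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, g = 30, 4, 2
X = rng.normal(size=(n, K))
R = rng.normal(size=(g, K))             # arbitrary full-rank g x K restriction matrix

XtX_inv = np.linalg.inv(X.T @ X)
mid = np.linalg.inv(R @ XtX_inv @ R.T)
A = X @ XtX_inv @ R.T @ mid @ R @ XtX_inv @ X.T   # the matrix of (7.30)
P_bar = np.eye(n) - X @ XtX_inv @ X.T             # residual-maker P̄_X
```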

7.12 Restricted Least Squares

a. From (7.36), taking expected values we get

E(β̂_rls) = E(β̂_ols) + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(r − R E(β̂_ols))
         = β + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(r − Rβ)

since E(β̂_ols) = β. It is clear that β̂_rls is in general biased for β unless r = Rβ is satisfied, in which case the second term above is zero.

b. var(β̂_rls) = E[β̂_rls − E(β̂_rls)][β̂_rls − E(β̂_rls)]'. But from (7.36) and part (a), we have

β̂_rls − E(β̂_rls) = (β̂_ols − β) + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(β − β̂_ols).

Using β̂_ols − β = (X'X)⁻¹X'u, one gets β̂_rls − E(β̂_rls) = A(X'X)⁻¹X'u where A = IK − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R. It is obvious that A is not symmetric, i.e., A ≠ A'. However, A² = A, since

A² = IK − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R
     + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R
   = IK − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R = A.

Hence,

var(β̂_rls) = E[A(X'X)⁻¹X'uu'X(X'X)⁻¹A'] = σ²A(X'X)⁻¹A'
= σ²{(X'X)⁻¹ − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹ − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹
     + (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹}
= σ²{(X'X)⁻¹ − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹}.

c. Using part (b) and the fact that var(β̂_ols) = σ²(X'X)⁻¹ gives

var(β̂_ols) − var(β̂_rls) = σ²(X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹R(X'X)⁻¹

and this is positive semi-definite, since R(X'X)⁻¹R' is positive definite.

7.13 The Chow Test

a. OLS on (7.47) yields

[ β̂1,ols ]   [ (X1'X1)⁻¹X1'y1 ]
[ β̂2,ols ] = [ (X2'X2)⁻¹X2'y2 ]

which is OLS on each equation in (7.46) separately.

b. The vector of OLS residuals for (7.47) can be written as e' = (e1', e2'), where e1 = y1 − X1β̂1,ols and e2 = y2 − X2β̂2,ols are the vectors of OLS residuals from the two equations in (7.46) run separately. Hence, the residual sum of squares e'e = e1'e1 + e2'e2 = the sum of the residual sums of squares from running yi on Xi for i = 1, 2.

c. From (7.47), one can write the X matrix in (7.47) as related to that in (7.49) as follows:

[ X1  0  ]   [ X1  0  ]
[ X2  X2 ] = [ 0   X2 ] C   with   C = [ IK  0
                                         IK  IK ],

which is non-singular, and the coefficients are therefore related as follows:

[ γ1 ]        [ β1 ]   [ β1      ]
[ γ2 ] = C⁻¹ [ β2 ] = [ β2 − β1 ]

as required in (7.49). Hence, by the invariance result of problem 7.1, (7.49) yields the same URSS as (7.47). The RRSS sets (β2 − β1) = 0, which yields the same regression as (7.48).
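The equivalence in parts (a) and (b) can be verified numerically (Python/numpy sketch with simulated data, not part of the text): stacking the two samples in the block-diagonal form (7.47) reproduces the separate OLS estimates, and the stacked RSS is the sum of the separate ones:

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2, K = 25, 20, 2
X1 = np.column_stack([np.ones(n1), rng.normal(size=n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(size=n2)])
y1 = X1 @ np.array([1.0, 2.0]) + rng.normal(size=n1)
y2 = X2 @ np.array([0.5, 1.0]) + rng.normal(size=n2)

# Stacked system (7.47): block-diagonal X
Z = np.block([[X1, np.zeros((n1, K))], [np.zeros((n2, K)), X2]])
y = np.concatenate([y1, y2])
coef = np.linalg.solve(Z.T @ Z, Z.T @ y)

b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)   # separate OLS, first sample
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)   # separate OLS, second sample

def rss(yy, XX, bb):
    e = yy - XX @ bb
    return float(e @ e)
```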

7.14 a. The FWL Theorem states that β̂2,ols from P̄X1y = P̄X1X2β2 + P̄X1u will be identical to the estimate of β2 obtained from Eq. (7.8). Also, the residuals from both regressions will be identical. This means that the RSS from (7.8), given by y'P̄Xy = y'y − y'PXy, is identical to that from the above regression. The latter is similarly obtained as

y'P̄X1y − y'P̄X1X2(X2'P̄X1X2)⁻¹X2'P̄X1y.

b. For testing Ho: β2 = 0, the RRSS = y'P̄X1y and the URSS is given in part (a). Hence, the numerator of the Chow F-statistic given in (7.45) is

(RRSS − URSS)/k2 = y'P̄X1X2(X2'P̄X1X2)⁻¹X2'P̄X1y/k2.

Substituting y = X1β1 + u under the null hypothesis yields u'P̄X1X2(X2'P̄X1X2)⁻¹X2'P̄X1u/k2, since P̄X1X1 = 0.

c. Let v = X2'P̄X1u. Given that u ∼ N(0, σ²In), then v is normal with mean zero and var(v) = X2'P̄X1 var(u) P̄X1X2 = σ²X2'P̄X1X2, since P̄X1 is idempotent. Hence, v ∼ N(0, σ²X2'P̄X1X2). Therefore, the numerator of the Chow F-statistic given in part (b), when divided by σ², can be written as v'[var(v)]⁻¹v/k2. This is distributed as χ²_{k2} divided by its degrees of freedom k2 under the null hypothesis. In fact, from part (b), A = P̄X1X2(X2'P̄X1X2)⁻¹X2'P̄X1 is symmetric and idempotent, with rank equal to its trace equal to k2. Hence, by Lemma 1, u'Au/σ² is χ²_{k2} under the null hypothesis.

d. The numerator u'Au/k2 is independent of the denominator s² = u'P̄Xu/(n − k) provided P̄XA = 0, as seen in problem 7.11. This is true because P̄XP̄X1 = P̄X(In − PX1) = P̄X − P̄XX1(X1'X1)⁻¹X1' = P̄X, since P̄XX1 = 0. Hence,

P̄XA = P̄XP̄X1X2(X2'P̄X1X2)⁻¹X2'P̄X1 = P̄XX2(X2'P̄X1X2)⁻¹X2'P̄X1 = 0

since P̄XX2 = 0. Recall, P̄XX = P̄X[X1, X2] = [P̄XX1, P̄XX2] = 0.

e. The Wald statistic for Ho: β2 = 0 given in (7.41) boils down to replacing R by [0, Ik2] and r by 0. Also, β̂_mle is replaced by β̂_ols from the unrestricted model given in (7.8), and σ² is replaced by its estimate s² = URSS/(n − k) to make the Wald statistic feasible. This yields W = β̂2'[R(X'X)⁻¹R']⁻¹β̂2/s². From problem 7.2, we showed that the partitioned inverse of X'X yields B22 = (X2'P̄X1X2)⁻¹ for its second diagonal (k2×k2) block. Hence, R(X'X)⁻¹R' = (X2'P̄X1X2)⁻¹ and

W = β̂2,ols'(X2'P̄X1X2)β̂2,ols/s².

Also, from problem 7.2, β̂2,ols = (X2'P̄X1X2)⁻¹X2'P̄X1y = (X2'P̄X1X2)⁻¹X2'P̄X1u after substituting y = X1β1 + u under the null hypothesis and using P̄X1X1 = 0. Hence,

s²W = u'P̄X1X2(X2'P̄X1X2)⁻¹(X2'P̄X1X2)(X2'P̄X1X2)⁻¹X2'P̄X1u = u'P̄X1X2(X2'P̄X1X2)⁻¹X2'P̄X1u

which is exactly k2 times the expression in part (b), i.e., the numerator of the Chow F-statistic.

f. The restricted MLE of β is (β̂1,rls', 0')', since β2 = 0 under the null hypothesis. Hence, the score form of the LM test given in (7.44) yields

(y − X1β̂1,rls)'X(X'X)⁻¹X'(y − X1β̂1,rls)/σ².

In order to make this feasible, we replace σ² by s̃² = RRSS/(n − k1), where RRSS is the restricted residual sum of squares from running y on X1. But this expression is exactly the regression sum of squares from running (y − X1β̂1,rls)/s̃ on the matrix X. In order to see this, note that the regression sum of squares of y on X is usually y'PXy; here, y is replaced by (y − X1β̂1,rls)/s̃.

7.15 Iterative Estimation in Partitioned Regression Models. This is based on Baltagi (1996).

a. The least squares residuals of y on X1 are given by P̄X1y, where P̄X1 = I − PX1 and PX1 = X1(X1'X1)⁻¹X1'. Regressing these residuals on x2 yields b2⁽¹⁾ = (x2'x2)⁻¹x2'P̄X1y. Substituting for y from (7.8) and using P̄X1X1 = 0 yields b2⁽¹⁾ = (x2'x2)⁻¹x2'P̄X1(x2β2 + u) with

E(b2⁽¹⁾) = (x2'x2)⁻¹x2'P̄X1x2β2 = β2 − (x2'x2)⁻¹x2'PX1x2β2 = (1 − a)β2

where a = (x2'PX1x2)/(x2'x2) is a scalar with 0 ≤ a ≤ 1, and a ≠ 1 as long as x2 is linearly independent of X1. Therefore, the bias(b2⁽¹⁾) = −aβ2.

b. b1⁽¹⁾ = (X1'X1)⁻¹X1'(y − x2b2⁽¹⁾) = (X1'X1)⁻¹X1'(I − Px2P̄X1)y and

b2⁽²⁾ = (x2'x2)⁻¹x2'(y − X1b1⁽¹⁾) = (x2'x2)⁻¹x2'[I − PX1(I − Px2P̄X1)]y
      = (x2'x2)⁻¹x2'[P̄X1 + PX1Px2P̄X1]y = (1 + a)b2⁽¹⁾.

Similarly,

b1⁽²⁾ = (X1'X1)⁻¹X1'(y − x2b2⁽²⁾) = (X1'X1)⁻¹X1'(y − (1 + a)x2b2⁽¹⁾)

b2⁽³⁾ = (x2'x2)⁻¹x2'(y − X1b1⁽²⁾) = (x2'x2)⁻¹x2'[y − PX1(y − (1 + a)x2b2⁽¹⁾)]
      = b2⁽¹⁾ + (1 + a)(x2'x2)⁻¹x2'PX1x2b2⁽¹⁾ = (1 + a + a²)b2⁽¹⁾.

By induction, one can infer that

b2⁽ʲ⁺¹⁾ = (1 + a + a² + … + aʲ)b2⁽¹⁾ for j = 0, 1, 2, …

Therefore,

E(b2⁽ʲ⁺¹⁾) = (1 + a + a² + … + aʲ)E(b2⁽¹⁾) = (1 + a + a² + … + aʲ)(1 − a)β2 = (1 − aʲ⁺¹)β2

and the bias(b2⁽ʲ⁺¹⁾) = −aʲ⁺¹β2 tends to zero as j → ∞, since |a| < 1.

c. Using the Frisch-Waugh-Lovell Theorem, least squares on the original model yields

β̂2 = (x2'P̄X1x2)⁻¹x2'P̄X1y = (x2'x2 − x2'PX1x2)⁻¹x2'P̄X1y = (1 − a)⁻¹(x2'x2)⁻¹x2'P̄X1y = b2⁽¹⁾/(1 − a).

As j → ∞, lim b2⁽ʲ⁺¹⁾ = (Σ∞ᵢ₌₀ aⁱ)b2⁽¹⁾ = b2⁽¹⁾/(1 − a) = β̂2.
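The iteration in parts (b) and (c) can be traced numerically. This Python sketch (illustrative; the data are simulated so that x2 is correlated with X1, hence a > 0) runs the back-and-forth regressions and checks convergence to the FWL estimate β̂2 = b2⁽¹⁾/(1 − a):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
x2 = rng.normal(size=n) + 0.5 * X1[:, 1]      # correlated with X1, so 0 < a < 1
y = X1 @ np.array([1.0, -1.0]) + 2.0 * x2 + rng.normal(size=n)

# Back-and-forth iteration: b1 given b2, then b2 given b1, repeatedly
b2 = 0.0
for _ in range(200):
    b1 = np.linalg.solve(X1.T @ X1, X1.T @ (y - x2 * b2))
    b2 = float(x2 @ (y - X1 @ b1)) / float(x2 @ x2)

# One-shot FWL estimate of beta2
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
x2_t = x2 - P1 @ x2                            # P̄_X1 x2
beta2_fwl = float(x2_t @ y) / float(x2_t @ x2_t)
```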

7.16 Maddala (1992, pp. 120-127).

a. For Ho: β = 0, the RRSS is based on a regression of yi on a constant. This yields α̂ = ȳ and the RRSS = Σⁿᵢ₌₁(yi − ȳ)² = the usual TSS. The URSS is the usual least squares residual sum of squares based on estimating α and β. The log-likelihood in this case is given by

logL(α, β, σ²) = −(n/2) log 2π − (n/2) log σ² − (1/2σ²) Σⁿᵢ₌₁ (yi − α − βXi)²

with the unrestricted MLEs of α and β yielding α̂_ols = ȳ − β̂_ols X̄ and

β̂_ols = Σⁿᵢ₌₁(Xi − X̄)(yi − ȳ) / Σⁿᵢ₌₁(Xi − X̄)²

and σ̂²_mle = URSS/n. In this case, the unrestricted log-likelihood yields

logL(α̂_mle, β̂_mle, σ̂²_mle) = −(n/2) log 2π − (n/2) − (n/2) log(URSS/n).

Similarly, the restricted MLE yields α̂_rmle = ȳ and β̂_rmle = 0 with

σ̂²_rmle = RRSS/n = Σⁿᵢ₌₁(yi − ȳ)²/n.

The restricted log-likelihood yields

logL(α̂_rmle, β̂_rmle, σ̂²_rmle) = −(n/2) log 2π − (n/2) − (n/2) log(RRSS/n).

Hence, the LR test is given by

LR = n(log RRSS − log URSS) = n log(RRSS/URSS) = n log(TSS/RSS) = n log[1/(1 − r²)]

where TSS and RSS are the total and residual sums of squares from the unrestricted regression. By definition, R² = 1 − (RSS/TSS), and for the simple regression r²XY = R² of that regression; see Chap. 3.

b. The Wald statistic for Ho: β = 0 is based upon r(β̂_mle) = (β̂_mle − 0) and R(β̂_mle) = 1, and from (7.40) we get W = β̂²_mle/var(β̂_mle) = β̂²_ols/var(β̂_ols).

This is the square of the usual t-statistic for β = 0, with σ̂²_mle used instead of s² in estimating σ². Using the results in Chap. 3, we get

W = β̂²_ols Σⁿᵢ₌₁(Xi − X̄)²/σ̂²_mle = nr²/(1 − r²)

with σ̂²_mle = URSS/n = TSS(1 − R²)/n from the definition of R², and using the definition of r²XY = r² = R² for the simple regression.

c. The LM statistic given in (7.43) is based upon LM = β̂²_ols (Σⁿᵢ₌₁ xi²)/σ̂²_rmle, where xi and yi denote deviations from their respective sample means. This is the square of the t-statistic on β = 0 using σ̂²_rmle as an estimate for σ². In this case, σ̂²_rmle = RRSS/n = Σⁿᵢ₌₁ yi²/n. Hence,

LM = n (Σⁿᵢ₌₁ xiyi)² / (Σⁿᵢ₌₁ yi² Σⁿᵢ₌₁ xi²) = nr²

from the definition of r²XY = r².

d. Note that from part (b), we get W/n = r²/(1 − r²) and 1 + (W/n) = 1/(1 − r²). Hence, from part (a), we have (LR/n) = log[1/(1 − r²)] = log[1 + (W/n)].

From part (c), we get (LM/n) = (W/n)/[1 + (W/n)]. Using the inequality x ≥ log(1 + x) ≥ x/(1 + x) with x = W/n, we get W ≥ LR ≥ LM.

e. From Chap. 3, the R² of the regression of logC on logP is 0.2913 and n = 46. Hence,

W = nr²/(1 − r²) = (46)(0.2913)/(0.7087) = 18.91,

LM = nr² = (46)(0.2913) = 13.399

and LR = n log[1/(1 − r²)] = 46 log(1/0.7087) = 46 log(1.4110342) = 15.838. It is clear that W > LR > LM in this case.
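The arithmetic in parts (d) and (e) is a one-liner to check. A Python sketch using the reported n = 46 and r² = 0.2913 (numpy is an assumption):

```python
import numpy as np

n, r2 = 46, 0.2913
W = n * r2 / (1 - r2)          # Wald
LR = n * np.log(1 / (1 - r2))  # likelihood ratio
LM = n * r2                    # Lagrange multiplier
```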

7.17 Engle (1984, pp. 785-786).

a. For a random sample y1, …, yT from a Bernoulli distribution with parameter θ, the log-likelihood is logL(θ) = Σᵀₜ₌₁[yt log θ + (1 − yt) log(1 − θ)], and the score is

S(θ) = Σyt/θ − (T − Σyt)/(1 − θ) = [Σyt − θΣyt − θT + θΣyt]/θ(1 − θ) = Σᵀₜ₌₁(yt − θ)/θ(1 − θ).

The MLE is given by setting S(θ̂_mle) = 0, giving θ̂_mle = Σᵀₜ₌₁ yt/T = ȳ. The information is

I(θ) = E[−∂²logL/∂θ²] = E[Σyt/θ² + (T − Σyt)/(1 − θ)²] = T/θ + T/(1 − θ) = T/θ(1 − θ).

b. For testing Ho: θ = θo versus Ha: θ ≠ θo, the Wald statistic given in (7.40) has r(θ) = θ − θo and R(θ) = 1 with I⁻¹(θ̂_mle) = θ̂_mle(1 − θ̂_mle)/T. Hence,

W = T(θ̂_mle − θo)²/θ̂_mle(1 − θ̂_mle) = T(ȳ − θo)²/ȳ(1 − ȳ).

The LM statistic given in (7.42) has S(θo) = T(ȳ − θo)/θo(1 − θo) and I⁻¹(θo) = θo(1 − θo)/T. Hence,

LM = S²(θo)I⁻¹(θo) = [T²(ȳ − θo)²/[θo(1 − θo)]²] · [θo(1 − θo)/T] = T(ȳ − θo)²/θo(1 − θo).

The unrestricted log-likelihood is given by

logL(θ̂_mle) = logL(ȳ) = Σᵀₜ₌₁[yt log ȳ + (1 − yt) log(1 − ȳ)] = Tȳ log ȳ + T(1 − ȳ) log(1 − ȳ).

The restricted log-likelihood is given by

logL(θo) = Σᵀₜ₌₁[yt log θo + (1 − yt) log(1 − θo)] = Tȳ log θo + T(1 − ȳ) log(1 − θo).

Hence, the likelihood ratio test gives

LR = 2Tȳ(log ȳ − log θo) + 2T(1 − ȳ)[log(1 − ȳ) − log(1 − θo)]
   = 2Tȳ log(ȳ/θo) + 2T(1 − ȳ) log[(1 − ȳ)/(1 − θo)].

All three statistics have a limiting χ²_1 distribution under Ho. Each statistic will reject when (ȳ − θo)² is large. Hence, for finite-sample exact inference, one can refer to the binomial distribution and compute exact critical values.
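The three Bernoulli statistics of this problem can be packaged as follows (illustrative Python sketch; numpy and the simulated sample are assumptions). All three vanish when θo = ȳ, and all grow with (ȳ − θo)²:

```python
import numpy as np

def bernoulli_tests(y, theta0):
    # W, LM, LR of problem 7.17 for Ho: theta = theta0 in a Bernoulli sample
    T, ybar = len(y), float(np.mean(y))
    W = T * (ybar - theta0) ** 2 / (ybar * (1 - ybar))
    LM = T * (ybar - theta0) ** 2 / (theta0 * (1 - theta0))
    LR = 2 * T * (ybar * np.log(ybar / theta0)
                  + (1 - ybar) * np.log((1 - ybar) / (1 - theta0)))
    return W, LM, LR

rng = np.random.default_rng(8)
y = (rng.random(200) < 0.6).astype(float)   # simulated sample, true theta = 0.6
W, LM, LR = bernoulli_tests(y, 0.5)
```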

7.18 For the regression model

y = Xβ + u with u ∼ N(0, σ²IT),

a. L(β, σ²) = (1/2πσ²)^{T/2} exp{−(y − Xβ)'(y − Xβ)/2σ²}

logL = −(T/2)(log 2π + log σ²) − (y − Xβ)'(y − Xβ)/2σ²

∂logL/∂β = −(−2X'y + 2X'Xβ)/2σ² = 0

so that β̂_mle = (X'X)⁻¹X'y, and ∂logL/∂σ² = −T/2σ² + û'û/2σ⁴ = 0. Hence, σ̂²_mle = û'û/T (where û = y − Xβ̂_mle).

b. The score for β is S(β) = ∂logL(β)/∂β = (X'y − X'Xβ)/σ². The information matrix is given by

I(β, σ²) = −E[∂²logL/∂(β, σ²)∂(β, σ²)']

= −E [ −X'X/σ²                 −(X'y − X'Xβ)/σ⁴
       −(X'y − X'Xβ)'/σ⁴       T/2σ⁴ − (y − Xβ)'(y − Xβ)/σ⁶ ].

Since E(X'y − X'Xβ) = X'E(y − Xβ) = 0 and E(y − Xβ)'(y − Xβ) = Tσ², this gives

I(β, σ²) = [ X'X/σ²   0
             0        T/2σ⁴ ]

which is block-diagonal and also given in (7.19).

c. The Wald statistic given in (7.41) needs

r(β) = β1 − β1⁰ and R(β) = [Ik1, 0],

so that

W = (β̂1 − β1⁰)'[R(X'X)⁻¹R']⁻¹(β̂1 − β1⁰)/σ̂²

with R(X'X)⁻¹R' = A⁻¹, where

A = (X1'X1 − X1'X2(X2'X2)⁻¹X2'X1) = X1'[I − X2(X2'X2)⁻¹X2']X1 = X1'P̄X2X1

by partitioned inverse. Therefore,

W = (β̂1 − β1⁰)'[X1'P̄X2X1](β̂1 − β1⁰)/σ̂²

as required. For the LR statistic, LR = −2(logL̃ − logL̂), where L̃ is the restricted likelihood and L̂ is the unrestricted likelihood:

LR = −2{[−T/2 − (T/2) log 2π − (T/2) log(RRSS/T)] − [−T/2 − (T/2) log 2π − (T/2) log(URSS/T)]}
   = T log(RRSS/URSS) = T log(ũ'ũ/û'û).

For the LM statistic, the score version is given in (7.42) as

LM = S(β̃)'I⁻¹(β̃)S(β̃)

where β̃ = (β1⁰', β̃2')' is the restricted MLE. The restriction is on β1 (β1 = β1⁰), but there are no restrictions on β2. Note that β̃2 can be obtained from the regression of (y − X1β1⁰) on X2. This yields

β̃2 = (X2'X2)⁻¹X2'(y − X1β1⁰).

Therefore, the block of the score corresponding to β2 vanishes at the restricted MLE:

S2(β̃) = X2'(y − Xβ̃)/σ̃² = [X2'y − X2'X1β1⁰ − X2'X2β̃2]/σ̃² = [X2'y − X2'X1β1⁰ − X2'y + X2'X1β1⁰]/σ̃² = 0.

Hence, LM = S1(β̃)'I¹¹(β̃)S1(β̃), where I¹¹(β̃) is obtained from the partitioned inverse of I⁻¹(β̃). Since S1(β̃) = X1'(y − Xβ̃)/σ̃² = X1'ũ/σ̃² and I⁻¹(β) = σ²(X'X)⁻¹ with I¹¹ = σ̃²[X1'P̄X2X1]⁻¹, we get

LM = ũ'X1[X1'P̄X2X1]⁻¹X1'ũ/σ̃².

d. W = (β̂1 − β1⁰)'[X1'P̄X2X1](β̂1 − β1⁰)/σ̂² = (r − Rβ̂)'[R(X'X)⁻¹R']⁻¹(r − Rβ̂)/σ̂².

From (7.39) we know that ũ'ũ − û'û = (r − Rβ̂)'[R(X'X)⁻¹R']⁻¹(r − Rβ̂). Also, σ̂² = û'û/T. Therefore, W = T(ũ'ũ − û'û)/û'û as required.

Similarly, LM = ũ'X1[X1'P̄X2X1]⁻¹X1'ũ/σ̃² = (r − Rβ̂)'[R(X'X)⁻¹R']⁻¹(r − Rβ̂)/σ̃² from (7.43), with σ̃² = ũ'ũ/T. Using (7.39), we can rewrite this as LM = T(ũ'ũ − û'û)/ũ'ũ as required. Finally,

LR = T log(ũ'ũ/û'û) = T log[1 + (ũ'ũ − û'û)/û'û] = T log(1 + W/T).

Also, W/LM = ũ'ũ/û'û = 1 + W/T. Hence, LM = W/(1 + W/T).

Using the inequality x ≥ log(1 + x) ≥ x/(1 + x) with x = W/T, we get (W/T) ≥ log(1 + W/T) ≥ (W/T)/(1 + W/T), or (W/T) ≥ (LR/T) ≥ (LM/T), or W ≥ LR ≥ LM. However, it is important to note that all three statistics are monotonic functions of the F-statistic in (7.45), and exact tests based on each would produce identical critical regions.

e. For the cigarette consumption data given in Table 3.2, the following test statistics were computed for Ho: β = −1, where β is the price coefficient:

Wald = 1.16 > LR = 1.15 > LM = 1.13

and the SAS program that produces these results is given below.

f. The Wald statistic for HA: β = −1 yields 1.16; for HB: β⁵ = −1 it yields 0.43; and for HC: β⁻⁵ = −1 it yields 7.89, even though the three hypotheses are algebraically equivalent. The SAS program that produces these results is given below.

SAS PROGRAM

Data CIGARETT;
Input OBS STATE $ LNC LNP LNY;
Cards;

Proc IML;
Use CIGARETT;
Read all into Temp;
N=NROW(TEMP);
ONE=Repeat(1,N,1);
Y=Temp[,2];
X=ONE||Temp[,3]||Temp[,4];
BETA_U=INV(X'*X)*X'*Y;
R={0 1 0};
Ho=BETA_U[2,]+1;
BETA_R=BETA_U-INV(X'*X)*R'*INV(R*INV(X'*X)*R')*Ho;
ET_U=Y-X*BETA_U;
ET_R=Y-X*BETA_R;
SIG_U=(ET_U'*ET_U)/N;
SIG_R=(ET_R'*ET_R)/N;
X1=X[,2];
X2=X[,1]||X[,3];
Q_X2=I(N)-X2*INV(X2'*X2)*X2';
VAR_D=SIG_U*INV(X1'*Q_X2*X1);
WALD=Ho'*INV(VAR_D)*Ho;
LR=N*LOG(1+(WALD/N));
LM=(ET_R'*X1*INV(X1'*Q_X2*X1)*X1'*ET_R)/SIG_R;
*WALD=N*(ET_R'*ET_R-ET_U'*ET_U)/(ET_U'*ET_U);
*LR=N*Log(ET_R'*ET_R/(ET_U'*ET_U));
*LM=N*(ET_R'*ET_R-ET_U'*ET_U)/(ET_R'*ET_R);
PRINT 'Chapter 7, Problem 18.(e)',, WALD;
PRINT LR;
PRINT LM;
BETA=BETA_U[2,];
H1=BETA+1;
H2=BETA**5+1;
H3=BETA**(-5)+1;
VAR_D1=SIG_U*INV(X1'*Q_X2*X1);
VAR_D2=(5*BETA**4)*VAR_D1*(5*BETA**4);
VAR_D3=(-5*BETA**(-6))*VAR_D1*(-5*BETA**(-6));
WALD1=H1'*INV(VAR_D1)*H1;
WALD2=H2'*INV(VAR_D2)*H2;
WALD3=H3'*INV(VAR_D3)*H3;
PRINT 'Chapter 7, Problem 18.(f)',, WALD1;
PRINT WALD2;
PRINT WALD3;

7.19 Gregory and Veall (1985).

a. For HA: β1 − 1/β2 = 0, we have rA(β) = β1 − 1/β2 and β' = (βo, β1, β2). In this case, RA(β) = (0, 1, 1/β2²), and the unrestricted MLE is OLS on (7.50) with variance-covariance matrix V(β̂_ols) = σ̂²(X'X)⁻¹, where σ̂² = URSS/n. Let vij denote the corresponding elements of V(β̂_ols) for i, j = 0, 1, 2. Therefore,

WA = (β̂1 − 1/β̂2)[(0, 1, 1/β̂2²)V(β̂_ols)(0, 1, 1/β̂2²)']⁻¹(β̂1 − 1/β̂2)
   = (β̂1β̂2 − 1)²/(β̂2²v11 + 2v12 + v22/β̂2²)

as required in (7.52). Similarly, for HB: β1β2 − 1 = 0, we have rB(β) = β1β2 − 1. In this case, RB(β) = (0, β2, β1) and

WB = (β̂1β̂2 − 1)[(0, β̂2, β̂1)V(β̂_ols)(0, β̂2, β̂1)']⁻¹(β̂1β̂2 − 1)
   = (β̂1β̂2 − 1)²/(β̂2²v11 + 2β̂1β̂2v12 + β̂1²v22)

as required in (7.53).

7.20 Gregory and Veall (1986).

a. From (7.51), we get W = r(β̂_ols)'[R(β̂_ols)σ̂²(X'X)⁻¹R(β̂_ols)']⁻¹r(β̂_ols), where the typical row of X is [yt−1, xt, xt−1] and β' = (ρ, β1, β2). For HA: β1ρ + β2 = 0, we have r(β) = β1ρ + β2 and R(β) = (β1, ρ, 1). Hence,

WA = (β̂1ρ̂ + β̂2)[(β̂1, ρ̂, 1)σ̂²(X'X)⁻¹(β̂1, ρ̂, 1)']⁻¹(β̂1ρ̂ + β̂2).

For HB: β1 + (β2/ρ) = 0, we have r(β) = β1 + (β2/ρ) and R(β) = (−β2/ρ², 1, 1/ρ). Hence,

WB = (β̂1 + β̂2/ρ̂)[(−β̂2/ρ̂², 1, 1/ρ̂)σ̂²(X'X)⁻¹(−β̂2/ρ̂², 1, 1/ρ̂)']⁻¹(β̂1 + β̂2/ρ̂).

For HC: ρ + (β2/β1) = 0, we have r(β) = ρ + (β2/β1) and R(β) = (1, −β2/β1², 1/β1). Hence,

WC = (ρ̂ + β̂2/β̂1)[(1, −β̂2/β̂1², 1/β̂1)σ̂²(X'X)⁻¹(1, −β̂2/β̂1², 1/β̂1)']⁻¹(ρ̂ + β̂2/β̂1).

For HD: (β1ρ/β2) + 1 = 0, we have r(β) = (β1ρ/β2) + 1 and R(β) = (β1/β2, ρ/β2, −β1ρ/β2²). Hence,

WD = (β̂1ρ̂/β̂2 + 1)[(β̂1/β̂2, ρ̂/β̂2, −β̂1ρ̂/β̂2²)σ̂²(X'X)⁻¹(β̂1/β̂2, ρ̂/β̂2, −β̂1ρ̂/β̂2²)']⁻¹(β̂1ρ̂/β̂2 + 1).

b. Although the four null hypotheses are algebraically equivalent, these Wald statistics generally differ numerically, since the Wald test is not invariant to how the restriction is formulated. The SAS program below computes all four for the consumption data.

SAS PROGRAM

Data CONSUMP;
Input YEAR Y C;
cards;

PROC IML;
USE CONSUMP;
READ ALL VAR {Y C};
Yt=Y[2:NROW(Y)];
YLAG=Y[1:NROW(Y)-1];
Ct=C[2:NROW(C)];
CLAG=C[1:NROW(C)-1];
X=CLAG || Yt || YLAG;
BETA=INV(X'*X)*X'*Ct;
RHO=BETA[1];
BT1=BETA[2];
BT2=BETA[3];
Px=X*INV(X'*X)*X';
Qx=I(NROW(X))-Px;
et_U=Qx*Ct;
SIG_U=SSQ(et_U)/NROW(X);
Ha=BT1*RHO+BT2;
Hb=BT1+BT2/RHO;
Hc=RHO+BT2/BT1;
Hd=BT1*RHO/BT2+1;
Ra=BT1 || RHO || {1};
Rb=(-BT2/RHO**2) || {1} || (1/RHO);
Rc={1} || (-BT2/BT1**2) || (1/BT1);
Rd=(BT1/BT2) || (RHO/BT2) || (-BT1*RHO/BT2**2);
VAR_a=Ra*SIG_U*INV(X'*X)*Ra';
VAR_b=Rb*SIG_U*INV(X'*X)*Rb';
VAR_c=Rc*SIG_U*INV(X'*X)*Rc';
VAR_d=Rd*SIG_U*INV(X'*X)*Rd';
WALD_a=Ha'*INV(VAR_a)*Ha;
WALD_b=Hb'*INV(VAR_b)*Hb;
WALD_c=Hc'*INV(VAR_c)*Hc;
WALD_d=Hd'*INV(VAR_d)*Hd;
PRINT 'Chapter 7, Problem 20.(b)',,WALD_a;
PRINT WALD_b;
PRINT WALD_c;
PRINT WALD_d;

7.21 Effect of Additional Regressors on R². For the regression equation y = Xβ + u, the OLS residuals are given by e = y − Xβ̂_ols = P̄Xy, where P̄X = In − PX and PX = X(X'X)⁻¹X' is the projection matrix. Therefore, the SSE for this regression is e'e = y'P̄Xy. In particular, SSE1 = y'P̄X1y for X = X1, and SSE2 = y'P̄Xy for X = (X1, X2). Therefore,

SSE1 − SSE2 = y'(P̄X1 − P̄X)y = y'(PX − PX1)y = y'Ay

where A = PX − PX1. This difference in the residual sums of squares is non-negative for any vector y because A is positive semi-definite. The latter result holds because A is symmetric and idempotent. In fact, A is the difference between two idempotent matrices that also satisfy the following property: PXPX1 = PX1PX = PX1. Hence,

A² = PX − PX1 − PX1 + PX1 = PX − PX1 = A.

R² = 1 − (SSE/TSS), where TSS is the total sum of squares to be explained by the regression; this depends only on the y's, so TSS is fixed for both regressions. Hence R2² ≥ R1², since SSE1 ≥ SSE2.
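A numerical confirmation that adding regressors cannot raise SSE or lower R² (Python/numpy sketch with simulated data, not part of the text; the added X2 is even irrelevant here):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 50
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2))                 # extra (irrelevant) regressors
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)

def sse(Z):
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)
    e = y - Z @ b
    return float(e @ e)

sse1, sse2 = sse(X1), sse(np.hstack([X1, X2]))
tss = float(((y - y.mean()) ** 2).sum())
r2_1, r2_2 = 1 - sse1 / tss, 1 - sse2 / tss
```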

References

Baltagi, B. H. (1996), “Iterative Estimation in Partitioned Regression Models,” Econometric Theory, Solutions 95.5.1, 12:869-870.

Engle, R. F. (1984), “Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics,” In: Griliches, Z. and M. D. Intrilligator (eds) Handbook of Econometrics (North-Holland: Amsterdam).

Graybill, F. A. (1961), An Introduction to Linear Statistical Models, Vol. 1 (McGraw-Hill: New York).

Gregory, A. W. and M. R. Veall (1985), “Formulating Wald Tests of Nonlinear Restrictions,” Econometrica, 53: 1465-1468.

Gregory, A. W. and M. R. Veall (1986), “Wald Tests of Common Factor Restrictions,” Economics Letters, 22: 203-208.

Maddala, G. S. (1992), Introduction to Econometrics (Macmillan: New York).

Salkever, D. (1976), “The Use of Dummy Variables to Compute Predictions, Prediction Errors, and Confidence Intervals,” Journal of Econometrics, 4: 393-397.
