The General Linear Model: The Basics
7.1 Invariance of the fitted values and residuals to nonsingular transformations of the independent variables.
The regression model in (7.1) can be written as y = XCC⁻¹β + u, where C is a nonsingular matrix. Let X* = XC; then y = X*β* + u, where β* = C⁻¹β.
a. P_X* = X*(X*′X*)⁻¹X*′ = XC[C′X′XC]⁻¹C′X′ = XCC⁻¹(X′X)⁻¹(C′)⁻¹C′X′ = P_X.
Hence, the regression of y on X* yields
ŷ = X*β̂*_ols = P_X* y = P_X y = Xβ̂_ols,
which gives the same fitted values as the regression of y on X. Since the dependent variable y is the same, the residuals from both regressions will be the same.
b. Multiplying each regressor X_k by a constant c_k is equivalent to postmultiplying the matrix X by a diagonal matrix C with typical kth diagonal element c_k, for k = 1, 2, …, K. This diagonal matrix C is nonsingular as long as every c_k ≠ 0. Therefore, using the results in part (a), the fitted values and the residuals will remain the same.
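This invariance is easy to confirm numerically. The following numpy sketch (the simulated design matrix and the nonsingular C below are arbitrary, hypothetical choices) checks that the regressions of y on X and of y on XC give identical fitted values and residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 3
X = rng.standard_normal((n, k))      # arbitrary design matrix
y = rng.standard_normal(n)
C = np.array([[1.0, 0.0, 0.0],       # any nonsingular C works
              [2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0]])
Xs = X @ C                           # transformed regressors X* = XC

# fitted values from each regression via least squares
fit = lambda Z: Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
yhat, yhat_s = fit(X), fit(Xs)

assert np.allclose(yhat, yhat_s)          # same fitted values
assert np.allclose(y - yhat, y - yhat_s)  # hence same residuals
```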
c. In this case, X = [X1, X2] is of dimension n×2 and
C = [ 1   1 ]
    [−1   1 ]
is nonsingular with
C⁻¹ = (1/2) [ 1  −1 ]
            [ 1   1 ].
The results of part (a) apply, and we get the same fitted values and residuals when we regress y on (X1 − X2) and (X1 + X2) as in the regression of y on X1 and X2.
B. H. Baltagi, Solutions Manual for Econometrics, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-54548-1_7, © Springer-Verlag Berlin Heidelberg 2015
7.2 The FWL Theorem.
a. The inverse of a partitioned matrix
A = [A11  A12]
    [A21  A22]
is given by
A⁻¹ = [ A11⁻¹ + A11⁻¹A12B22A21A11⁻¹   −A11⁻¹A12B22 ]
      [ −B22A21A11⁻¹                   B22          ]
where B22 = (A22 − A21A11⁻¹A12)⁻¹. From (7.9), with A11 = X1′X1, A12 = X1′X2, A21 = X2′X1 and A22 = X2′X2, we get
B22 = (X2′X2 − X2′X1(X1′X1)⁻¹X1′X2)⁻¹ = (X2′X2 − X2′P_X1 X2)⁻¹ = (X2′P̄_X1 X2)⁻¹.
Also, −B22A21A11⁻¹ = −(X2′P̄_X1 X2)⁻¹X2′X1(X1′X1)⁻¹. Hence, from (7.9), we solve for β̂2,ols to get
β̂2,ols = −B22A21A11⁻¹X1′y + B22X2′y,
which yields
β̂2,ols = −(X2′P̄_X1 X2)⁻¹X2′P_X1 y + (X2′P̄_X1 X2)⁻¹X2′y = (X2′P̄_X1 X2)⁻¹X2′P̄_X1 y
as required in (7.10).
b. Alternatively, one can write (7.9) as
(X1′X1)β̂1,ols + (X1′X2)β̂2,ols = X1′y
(X2′X1)β̂1,ols + (X2′X2)β̂2,ols = X2′y.
Solving for β̂1,ols in terms of β̂2,ols by multiplying the first equation by (X1′X1)⁻¹, we get
β̂1,ols = (X1′X1)⁻¹X1′y − (X1′X1)⁻¹X1′X2β̂2,ols = (X1′X1)⁻¹X1′(y − X2β̂2,ols).
Substituting β̂1,ols in the second equation, we get
X2′X1(X1′X1)⁻¹X1′y − X2′P_X1 X2β̂2,ols + (X2′X2)β̂2,ols = X2′y.
Collecting terms, we get (X2′P̄_X1 X2)β̂2,ols = X2′P̄_X1 y. Hence, β̂2,ols = (X2′P̄_X1 X2)⁻¹X2′P̄_X1 y as given in (7.10).
c. In this case, X = [ι_n, X2], where ι_n is a vector of ones of dimension n.
P_X1 = ι_n(ι_n′ι_n)⁻¹ι_n′ = ι_nι_n′/n = J_n/n, where J_n = ι_nι_n′ is a matrix of ones of dimension n. But ι_n′y = Σ_{i=1}^n y_i and ι_n′y/n = ȳ. Hence, P̄_X1 = I_n − P_X1 = I_n − J_n/n, and P̄_X1 y = (I_n − J_n/n)y has typical element (y_i − ȳ). From the FWL Theorem, β̂2,ols can be obtained from the regression of (y_i − ȳ) on the set of variables in X2 expressed as deviations from their respective means, i.e., on P̄_X1 X2 = (I_n − J_n/n)X2.
From part (b),
β̂1,ols = (X1′X1)⁻¹X1′(y − X2β̂2,ols) = (ι_n′ι_n)⁻¹ι_n′(y − X2β̂2,ols) = ȳ − X̄2β̂2,ols,
where X̄2 = ι_n′X2/n is the vector of sample means of the independent variables in X2.
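The FWL results in parts (a)-(c) can be checked with a short numpy simulation (the data below are arbitrary): the slopes from the full regression of y on a constant and X2 coincide with those from regressing demeaned y on demeaned X2, and the intercept equals ȳ − X̄2′β̂2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X2 = rng.standard_normal((n, 2))
y = 1.0 + X2 @ np.array([0.5, -2.0]) + rng.standard_normal(n)

# full regression of y on [constant, X2]
X = np.column_stack([np.ones(n), X2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# FWL with X1 = the constant: regress (y - ybar) on demeaned X2
b2 = np.linalg.lstsq(X2 - X2.mean(0), y - y.mean(), rcond=None)[0]

assert np.allclose(b[1:], b2)                         # slopes agree
assert np.allclose(b[0], y.mean() - X2.mean(0) @ b2)  # intercept = ybar - X2bar'b2
```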
7.3 D_i = (0, 0, …, 1, 0, …, 0)′, where all the elements of this n×1 vector are zeroes except for the ith element, which takes the value 1. In this case, P_Di = D_i(D_i′D_i)⁻¹D_i′ = D_iD_i′, which is a matrix of zeroes except for the ith diagonal element, which takes the value 1. Hence, I_n − P_Di is an identity matrix except for the ith diagonal element, which takes the value zero. Therefore, (I_n − P_Di)y returns the vector y except for the ith element, which is zero. Using the FWL Theorem, the OLS regression
y = Xβ + D_iγ + u
yields the same estimates of β as (I_n − P_Di)y = (I_n − P_Di)Xβ + (I_n − P_Di)u, which can be rewritten as ỹ = X̃β + ũ with ỹ = (I_n − P_Di)y and X̃ = (I_n − P_Di)X.
The OLS normal equations yield (X̃′X̃)β̂_ols = X̃′ỹ, and the ith OLS normal equation can be ignored since it gives 0′β̂_ols = 0. Ignoring the ith observation yields (X*′X*)β̂_ols = X*′y*, where X* is the matrix X without the ith observation and y* is the vector y without the ith observation. The FWL Theorem also states that the residuals from ỹ on X̃ are the same as those from y on X and D_i. For the ith observation, ỹ_i = 0 and x̃_i = 0; hence the ith residual must be zero. This also means that the ith residual in the original regression with the dummy variable D_i is zero, i.e., y_i − x_i′β̂_ols − γ̂_ols = 0. Rearranging terms, we get γ̂_ols = y_i − x_i′β̂_ols. In other words, γ̂_ols is the forecasted OLS residual for the ith observation from the regression of y* on X*. The ith observation was excluded from the estimation of β̂_ols by the inclusion of the dummy variable D_i.
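A numerical illustration of this dummy-variable result, on arbitrary simulated data: including a dummy for observation i reproduces the OLS estimates from deleting observation i, and the dummy's coefficient equals the forecasted residual y_i − x_i′β̂_ols.

```python
import numpy as np

rng = np.random.default_rng(2)
n, i = 30, 7                     # dummy out / delete observation i
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = rng.standard_normal(n)

D = np.zeros(n); D[i] = 1.0      # dummy variable D_i
b_dum = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0]

mask = np.arange(n) != i         # OLS with observation i deleted
b_del = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]

assert np.allclose(b_dum[:3], b_del)               # same beta_ols
assert np.allclose(b_dum[3], y[i] - X[i] @ b_del)  # gamma = forecast residual
```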
7.5 If u ~ N(0, σ²I_n), then (n − K)s²/σ² ~ χ²_{n−K}. In this case,
a. E[(n − K)s²/σ²] = E(χ²_{n−K}) = n − K, since the expected value of a χ² random variable with (n − K) degrees of freedom is (n − K). Hence,
[(n − K)/σ²]E(s²) = (n − K), or E(s²) = σ².
b. var[(n − K)s²/σ²] = var(χ²_{n−K}) = 2(n − K), since the variance of a χ² random variable with (n − K) degrees of freedom is 2(n − K). Hence,
[(n − K)²/σ⁴]var(s²) = 2(n − K), or var(s²) = 2σ⁴/(n − K).
7.6 a. Using the results in problem 7.4, we know that σ̂²_mle = e′e/n = (n − K)s²/n.
Hence,
E(σ̂²_mle) = (n − K)E(s²)/n = (n − K)σ²/n.
This means that σ̂²_mle is biased for σ², but asymptotically unbiased. The bias is equal to −Kσ²/n, which goes to zero as n → ∞.
b. var(σ̂²_mle) = (n − K)²var(s²)/n² = (n − K)²·2σ⁴/n²(n − K) = 2(n − K)σ⁴/n², and
MSE(σ̂²_mle) = Bias²(σ̂²_mle) + var(σ̂²_mle) = K²σ⁴/n² + 2(n − K)σ⁴/n² = (K² + 2n − 2K)σ⁴/n².
c. Similarly, σ̃² = e′e/r = (n − K)s²/r with E(σ̃²) = (n − K)σ²/r and
var(σ̃²) = (n − K)²var(s²)/r² = (n − K)²·2σ⁴/r²(n − K) = 2(n − K)σ⁴/r², so that
MSE(σ̃²) = Bias²(σ̃²) + var(σ̃²) = (n − K − r)²σ⁴/r² + 2(n − K)σ⁴/r².
Minimizing MSE(σ̃²) with respect to r yields the first-order condition
∂MSE(σ̃²)/∂r = [−2(n − K − r)σ⁴r² − 2r(n − K − r)²σ⁴ − 4r(n − K)σ⁴]/r⁴ = 0,
which, after dividing the numerator by −2rσ⁴, yields
(n − K − r)r + (n − K − r)² + 2(n − K) = 0
(n − K − r)(r + n − K − r) + 2(n − K) = 0
(n − K)(n − K − r + 2) = 0.
Since n > K, this is zero for r = n − K + 2. Hence, the minimum MSE is obtained at σ̃² = e′e/(n − K + 2) with
MSE(σ̃²) = 4σ⁴/(n − K + 2)² + 2(n − K)σ⁴/(n − K + 2)² = 2σ⁴(n − K + 2)/(n − K + 2)² = 2σ⁴/(n − K + 2).
Note that s² = e′e/(n − K) with MSE(s²) = var(s²) = 2σ⁴/(n − K) > MSE(σ̃²).
Also, it can easily be verified that MSE(σ̂²_mle) = (K² − 2K + 2n)σ⁴/n² ≥ MSE(σ̃²) for 2 ≤ K < n, with equality holding for K = 2.
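The first-order condition above can also be checked numerically by evaluating MSE(σ̃²) = [(n − K − r)² + 2(n − K)]σ⁴/r² over a grid of divisors r (the values of n and K below are arbitrary choices):

```python
import numpy as np

n, K, sigma4 = 20, 3, 1.0   # arbitrary sample size, regressor count, sigma^4
r = np.arange(1, 40, dtype=float)
mse = ((n - K - r) ** 2 + 2 * (n - K)) * sigma4 / r ** 2  # MSE of e'e/r

r_star = r[np.argmin(mse)]
assert r_star == n - K + 2                        # minimized at r = n - K + 2
assert np.isclose(mse.min(), 2 * sigma4 / (n - K + 2))
```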
7.7 Computing Forecasts and Forecast Standard Errors Using a Regression Package. This is based on Salkever (1976). From (7.23) one gets:
a. The OLS normal equations yield
(X′X)β̂_ols + (Xo′Xo)β̂_ols + Xo′γ̂_ols = X′y + Xo′yo  and  Xoβ̂_ols + γ̂_ols = yo.
From the second equation, it is obvious that γ̂_ols = yo − Xoβ̂_ols. Substituting this in the first equation yields
(X′X)β̂_ols + (Xo′Xo)β̂_ols + Xo′yo − Xo′Xoβ̂_ols = X′y + Xo′yo,
so that (X′X)β̂_ols = X′y and β̂_ols = (X′X)⁻¹X′y.
Alternatively, premultiplying (7.23) by the residual-maker of the dummy variables (the last To columns of the augmented regressor matrix) is equivalent to omitting the last To observations. The resulting regression is that of y on X, which yields β̂_ols = (X′X)⁻¹X′y as obtained above. Also, after this premultiplication, the last To observations yield zero residuals because the observations on both the
b. The OLS residuals of (7.23) yield the usual least squares residuals
e_ols = y − Xβ̂_ols for the first n observations and zero residuals for the next To observations. This means that e*′ = (e_ols′, 0′) and e*′e* = e_ols′e_ols, the same residual sum of squares. The number of observations in (7.23) is n + To and the number of parameters estimated is K + To. Hence, the degrees of freedom in (7.23) are (n + To) − (K + To) = (n − K), the same as the old degrees of freedom in the regression of y on X. Hence, s*² = e*′e*/(n − K) = e_ols′e_ols/(n − K) = s².
c. Using partitioned inverse formulas on (X*′X*), one gets
(X*′X*)⁻¹ = [ (X′X)⁻¹          −(X′X)⁻¹Xo′            ]
            [ −Xo(X′X)⁻¹       I_To + Xo(X′X)⁻¹Xo′    ].
This uses the fact that the inverse of
A = [A11  A12]   is   A⁻¹ = [ B11                 −B11A12A22⁻¹                    ]
    [A21  A22]              [ −A22⁻¹A21B11        A22⁻¹ + A22⁻¹A21B11A12A22⁻¹    ]
where B11 = (A11 − A12A22⁻¹A21)⁻¹. Here A11 = X′X + Xo′Xo, A12 = Xo′, A21 = Xo and A22 = I_To, so that B11 = (X′X + Xo′Xo − Xo′Xo)⁻¹ = (X′X)⁻¹. Hence, the estimated variance-covariance matrix s*²(X*′X*)⁻¹ = s²(X*′X*)⁻¹
and is given by (7.25).
d. If we replace yo by 0 and I_To by −I_To in (7.23), we get
[y]   [X    0    ] [β]   [u ]
[0] = [Xo  −I_To ] [γ] + [uo],
or y* = X*δ + u*. Now the OLS normal equations
yield (X′X)β̂_ols + (Xo′Xo)β̂_ols − Xo′γ̂_ols = X′y and −Xoβ̂_ols + γ̂_ols = 0.
From the second equation, it immediately follows that γ̂_ols = Xoβ̂_ols = ŷo, the forecast of the To observations using the estimates from the first n
observations. Substituting this in the first equation yields (X′X)β̂_ols + (Xo′Xo)β̂_ols − Xo′Xoβ̂_ols = X′y,
so that β̂_ols = (X′X)⁻¹X′y. As in part (a), premultiplying by the residual-maker of the dummy variables omits the last To observations and yields β̂_ols based on the regression of y on X from the first n observations only. The last To observations yield zero residuals because the dependent and independent variables for these To observations have zero values. For this to be true in the original regression, it must be true that 0 − Xoβ̂_ols + γ̂_ols = 0, which yields γ̂_ols = Xoβ̂_ols = ŷo as expected. The residuals are still (e_ols′, 0′) and s*² = s² for the same reasons given in part (b). Also, using the partitioned inverse as in part (c) above, we get
(X*′X*)⁻¹ = [ (X′X)⁻¹          (X′X)⁻¹Xo′            ]
            [ Xo(X′X)⁻¹        I_To + Xo(X′X)⁻¹Xo′   ].
Hence, s*²(X*′X*)⁻¹ = s²(X*′X*)⁻¹ and the diagonal elements are as given in (7.25).
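Salkever's augmented regression from part (a) can be verified with a small numpy sketch (all data simulated arbitrarily): the stacked regression returns β̂_ols from the first n observations, and the coefficients on the dummies equal the forecast errors yo − Xoβ̂_ols.

```python
import numpy as np

rng = np.random.default_rng(3)
n, To, K = 40, 5, 3
X  = rng.standard_normal((n, K));  y  = rng.standard_normal(n)
Xo = rng.standard_normal((To, K)); yo = rng.standard_normal(To)

# Salkever's augmented regression: [y; yo] on [X 0; Xo I_To]
Xstar = np.block([[X, np.zeros((n, To))], [Xo, np.eye(To)]])
delta = np.linalg.lstsq(Xstar, np.concatenate([y, yo]), rcond=None)[0]

b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS on the first n observations
assert np.allclose(delta[:K], b)           # same beta_ols
assert np.allclose(delta[K:], yo - Xo @ b) # gamma = forecast errors yo - Xo b
```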
7.8 a. cov(β̂_ols, e) = E[(β̂_ols − β)e′] = E[(X′X)⁻¹X′uu′P̄_X] = σ²(X′X)⁻¹X′P̄_X = 0, where the second equality uses the fact that e = P̄_X u and β̂_ols = β + (X′X)⁻¹X′u. The third equality uses the fact that E(uu′) = σ²I_n, and the last equality uses the fact that P̄_X X = 0. But e ~ N(0, σ²P̄_X) and β̂_ols ~ N(β, σ²(X′X)⁻¹); therefore, zero covariance and normality imply independence of β̂_ols and e.
b. β̂_ols − β = (X′X)⁻¹X′u is linear in u, and (n − K)s² = e′e = u′P̄_X u is quadratic in u. A linear and a quadratic form in normal random variables
u ~ N(0, σ²I_n) are independent if (X′X)⁻¹X′P̄_X = 0; see Graybill (1961), Theorem 4.17. This is true since P̄_X X = 0.
7.9 a. Replacing R by c′ in (7.29), one gets (c′β̂_ols − c′β)′[c′(X′X)⁻¹c]⁻¹(c′β̂_ols − c′β)/σ². Since c′(X′X)⁻¹c is a scalar, this can be rewritten as
(c′β̂_ols − c′β)²/σ²c′(X′X)⁻¹c,
which is exactly the square of z_obs in (7.26). Since z_obs ~ N(0, 1) under the null hypothesis, its square is χ²₁ under the null hypothesis.
b. Dividing the statistic given in part (a) by (n − K)s²/σ² ~ χ²_{n−K} divided by its degrees of freedom (n − K) results in replacing σ² by s², i.e.,
(c′β̂_ols − c′β)²/s²c′(X′X)⁻¹c.
This is the square of the t-statistic given in (7.27). But the numerator is z²_obs ~ χ²₁ and the denominator is χ²_{n−K}/(n − K). Hence, if the numerator and denominator are independent, the resulting statistic is distributed as F(1, n − K) under the null hypothesis.
7.10 a. The quadratic form u′Au/σ² in (7.30) has
A = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′.
This is symmetric, and idempotent since
A² = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′
   = X(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹X′ = A,
and rank(A) = tr(A) = tr(R(X′X)⁻¹(X′X)(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹) = tr(I_g) = g, since R is g×K.
b. From lemma 1, u′Au/σ² ~ χ²_g, since A is symmetric and idempotent of rank g and u ~ N(0, σ²I_n).
7.11 a. The two quadratic forms s² = u′P̄_X u/(n − K) and u′Au/σ² given in (7.30) are independent if and only if P̄_X A = 0; see Graybill (1961), Theorem 4.10. This is true since P̄_X X = 0.
b. (n − K)s²/σ² is χ²_{n−K} and u′Au/σ² ~ χ²_g, and both quadratic forms are independent of each other. Dividing χ²_g by g, we get u′Au/gσ². Also, dividing χ²_{n−K} by (n − K), we get s²/σ². Dividing u′Au/gσ² by s²/σ², we get u′Au/gs², which is another way of writing (7.31). This is distributed as F(g, n − K) under the null hypothesis.
7.12 Restricted Least Squares
a. From (7.36), taking expected values, we get
E(β̂_rls) = E(β̂_ols) + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − RE(β̂_ols))
         = β + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ),
since E(β̂_ols) = β. It is clear that β̂_rls is in general biased for β unless r = Rβ is satisfied, in which case the second term above is zero.
b. var(β̂_rls) = E[β̂_rls − E(β̂_rls)][β̂_rls − E(β̂_rls)]′.
But from (7.36) and part (a), we have
β̂_rls − E(β̂_rls) = (β̂_ols − β) + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(β − β̂_ols).
Using β̂_ols − β = (X′X)⁻¹X′u, one gets β̂_rls − E(β̂_rls) = A(X′X)⁻¹X′u, where A = I_K − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R. It is obvious that A is not symmetric, i.e., A ≠ A′. However, A² = A, since
A² = I_K − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R
   = I_K − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R = A.
Therefore,
var(β̂_rls) = E[A(X′X)⁻¹X′uu′X(X′X)⁻¹A′] = σ²A(X′X)⁻¹A′
= σ²[(X′X)⁻¹ − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹ − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹
     + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹]
= σ²[(X′X)⁻¹ − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹].
c. Using part (b) and the fact that var(β̂_ols) = σ²(X′X)⁻¹ gives
var(β̂_ols) − var(β̂_rls) = σ²(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹, and this is positive
semi-definite, since R(X′X)⁻¹R′ is positive definite.
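A quick numerical check of (7.36) and of part (c), using an arbitrary simulated design and the hypothetical linear restriction β1 + β2 = 1: the restricted estimator satisfies the restriction exactly, and the variance reduction matrix is positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 30, 3
X = rng.standard_normal((n, K)); y = rng.standard_normal(n)
R = np.array([[1.0, 1.0, 0.0]]); r = np.array([1.0])  # restrict b1 + b2 = 1

b = np.linalg.lstsq(X, y, rcond=None)[0]
XtXinv = np.linalg.inv(X.T @ X)
# restricted least squares as in (7.36)
b_rls = b + XtXinv @ R.T @ np.linalg.solve(R @ XtXinv @ R.T, r - R @ b)

assert np.allclose(R @ b_rls, r)   # the restriction holds exactly
# var(b_ols) - var(b_rls) is proportional to D, which is positive semi-definite
D = XtXinv @ R.T @ np.linalg.solve(R @ XtXinv @ R.T, R @ XtXinv)
assert np.all(np.linalg.eigvalsh((D + D.T) / 2) >= -1e-12)
```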
7.13 The Chow Test
a. OLS on (7.47) yields
δ̂_ols = [ (X1′X1)⁻¹X1′y1 ]   [ β̂1,ols ]
         [ (X2′X2)⁻¹X2′y2 ] = [ β̂2,ols ],
since the matrix of regressors in (7.47) is block-diagonal. This is OLS on each equation in (7.46) separately.
b. The vector of OLS residuals for (7.47) can be written as e′ = (e1′, e2′), where e1 = y1 − X1β̂1,ols and e2 = y2 − X2β̂2,ols are the vectors of OLS residuals from the two equations in (7.46) separately. Hence, the residual sum of squares e′e = e1′e1 + e2′e2 = the sum of the residual sums of squares from running yi on Xi for i = 1, 2.
c. From (7.47), one can write the matrix of regressors in (7.47) as related to that in (7.49) as follows:
[X1   0 ]   [X1   0 ] [ I_K    0  ]
[0    X2] = [X2   X2] [−I_K   I_K ] = X*C,
and the coefficients are therefore related as follows:
(γ1)     (β1)   [ I_K    0  ] (β1)   (   β1    )
(γ2) = C (β2) = [−I_K   I_K ] (β2) = (β2 − β1 ),
as required in (7.49). Hence, (7.49) yields the same URSS as (7.47). The RRSS sets (β2 − β1) = 0, which yields the same regression as (7.48).
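The equivalence of the stacked regressions (7.47) and (7.49) can be verified numerically (arbitrary simulated data): both parameterizations span the same column space, so they deliver the same URSS, which also equals the sum of the two separate residual sums of squares.

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2, K = 25, 25, 2
X1 = rng.standard_normal((n1, K)); y1 = rng.standard_normal(n1)
X2 = rng.standard_normal((n2, K)); y2 = rng.standard_normal(n2)

def rss(Z, y):
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    e = y - Z @ b
    return e @ e

# (7.47): block-diagonal regressors = two separate regressions
Z47 = np.block([[X1, np.zeros((n1, K))], [np.zeros((n2, K)), X2]])
# (7.49): same column space, coefficients (beta1, beta2 - beta1)
Z49 = np.block([[X1, np.zeros((n1, K))], [X2, X2]])

y = np.concatenate([y1, y2])
assert np.isclose(rss(Z47, y), rss(X1, y1) + rss(X2, y2))  # part (b)
assert np.isclose(rss(Z47, y), rss(Z49, y))                # same URSS
```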
7.14 a. The FWL Theorem states that β̂2,ols from P̄_X1 y = P̄_X1 X2β2 + P̄_X1 u will be identical to the estimate of β2 obtained from Eq. (7.8). Also, the residuals from both regressions will be identical. This means that the RSS from (7.8), given by y′P̄_X y = y′y − y′P_X y, is identical to that from the above regression. The latter is similarly obtained as
y′P̄_X1 y − y′P̄_X1 X2 (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 y.
b. For testing H₀: β2 = 0, the RRSS = y′P̄_X1 y and the URSS is given in part (a). Hence, the numerator of the Chow F-statistic given in (7.45) is
(RRSS − URSS)/k2 = y′P̄_X1 X2 (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 y / k2.
Substituting y = X1β1 + u under the null hypothesis yields u′P̄_X1 X2 (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 u / k2, since P̄_X1 X1 = 0.
c. Let v = X2′P̄_X1 u. Given that u ~ N(0, σ²I_n), v is normal with mean zero and var(v) = X2′P̄_X1 var(u) P̄_X1 X2 = σ²X2′P̄_X1 X2, since P̄_X1 is idempotent. Hence, v ~ N(0, σ²X2′P̄_X1 X2). Therefore, the numerator of the Chow F-statistic given in part (b), when divided by σ², can be written as v′[var(v)]⁻¹v/k2. This is distributed as χ²_{k2} divided by its degrees of freedom k2 under the null hypothesis. In fact, from part (b), A = P̄_X1 X2 (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 is symmetric and idempotent, with rank equal to its trace equal to k2. Hence, by lemma 1, u′Au/σ² is χ²_{k2} under the null hypothesis.
d. The numerator u′Au/k2 is independent of the denominator s² = u′P̄_X u/(n − k), provided P̄_X A = 0, as seen in problem 7.11. This is true because P̄_X P̄_X1 = P̄_X(I_n − P_X1) = P̄_X − P̄_X X1(X1′X1)⁻¹X1′ = P̄_X, since P̄_X X1 = 0. Hence,
P̄_X A = P̄_X P̄_X1 X2 (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 = P̄_X X2 (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 = 0,
since P̄_X X2 = 0. Recall, P̄_X X = P̄_X[X1, X2] = [P̄_X X1, P̄_X X2] = 0.
e. The Wald statistic for H₀: β2 = 0 given in (7.41) boils down to replacing R by [0, I_{k2}] and r by 0. Also, β̂_mle is replaced by β̂_ols from the unrestricted model given in (7.8), and σ² is replaced by its estimate s² = URSS/(n − k) to make the Wald statistic feasible. This yields W = β̂2′[R(X′X)⁻¹R′]⁻¹β̂2/s². From problem 7.2, we showed that the partitioned inverse of X′X yields B22 = (X2′P̄_X1 X2)⁻¹ for its second diagonal (k2×k2) block. Hence,
R(X′X)⁻¹R′ = B22 = (X2′P̄_X1 X2)⁻¹.
Also, from problem 7.2, β̂2,ols = (X2′P̄_X1 X2)⁻¹X2′P̄_X1 y = (X2′P̄_X1 X2)⁻¹X2′P̄_X1 u after substituting y = X1β1 + u under the null hypothesis and using
P̄_X1 X1 = 0. Hence,
s²W = u′P̄_X1 X2 (X2′P̄_X1 X2)⁻¹ (X2′P̄_X1 X2) (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 u = u′P̄_X1 X2 (X2′P̄_X1 X2)⁻¹ X2′P̄_X1 u,
which is exactly k2 times the expression in part (b), i.e., the numerator of the Chow F-statistic.
f. The restricted MLE of β is (β̂1,rls′, 0′)′, since β2 = 0 under the null hypothesis. Hence, the score form of the LM test given in (7.44) yields
(y − X1β̂1,rls)′ X(X′X)⁻¹X′ (y − X1β̂1,rls) / σ².
In order to make this feasible, we replace σ² by s̃² = RRSS/(n − k1), where RRSS is the restricted residual sum of squares from running y on X1. But this expression is exactly the regression sum of squares from running (y − X1β̂1,rls)/s̃ on the matrix X. In order to see this, note that the regression sum of squares of y on X is usually y′P_X y. Here, y is replaced by (y − X1β̂1,rls)/s̃.
7.15 Iterative Estimation in Partitioned Regression Models. This is based on Baltagi (1996).
a. The least squares residuals of y on X1 are given by P̄_X1 y, where P̄_X1 = I − P_X1 and P_X1 = X1(X1′X1)⁻¹X1′. Regressing these residuals on x2 yields b2⁽¹⁾ = (x2′x2)⁻¹x2′P̄_X1 y. Substituting for y from (7.8) and using P̄_X1 X1 = 0 yields b2⁽¹⁾ = (x2′x2)⁻¹x2′P̄_X1(x2β2 + u) with
E(b2⁽¹⁾) = (x2′x2)⁻¹x2′P̄_X1 x2β2 = β2 − (x2′x2)⁻¹x2′P_X1 x2β2 = (1 − a)β2,
where a = (x2′P_X1 x2)/(x2′x2) is a scalar with 0 ≤ a < 1; a ≠ 1 as long as x2 is linearly independent of X1. Therefore, the bias(b2⁽¹⁾) = −aβ2.
b. b1⁽¹⁾ = (X1′X1)⁻¹X1′(y − x2b2⁽¹⁾) = (X1′X1)⁻¹X1′(I − P_x2 P̄_X1)y and
b2⁽²⁾ = (x2′x2)⁻¹x2′(y − X1b1⁽¹⁾) = (x2′x2)⁻¹x2′[I − P_X1(I − P_x2 P̄_X1)]y = (x2′x2)⁻¹x2′[P̄_X1 + P_X1 P_x2 P̄_X1]y = (1 + a)b2⁽¹⁾.
Similarly,
b1⁽²⁾ = (X1′X1)⁻¹X1′(y − x2b2⁽²⁾) = (X1′X1)⁻¹X1′(y − (1 + a)x2b2⁽¹⁾)
b2⁽³⁾ = (x2′x2)⁻¹x2′(y − X1b1⁽²⁾)
      = (x2′x2)⁻¹x2′[y − P_X1(y − (1 + a)x2b2⁽¹⁾)]
      = b2⁽¹⁾ + (1 + a)(x2′x2)⁻¹x2′P_X1 x2b2⁽¹⁾ = (1 + a + a²)b2⁽¹⁾.
By induction, one can infer that
b2⁽ʲ⁺¹⁾ = (1 + a + a² + … + aʲ)b2⁽¹⁾ for j = 0, 1, 2, …
Therefore,
E(b2⁽ʲ⁺¹⁾) = (1 + a + a² + … + aʲ)E(b2⁽¹⁾) = (1 + a + a² + … + aʲ)(1 − a)β2 = (1 − aʲ⁺¹)β2,
and the bias(b2⁽ʲ⁺¹⁾) = −aʲ⁺¹β2 tends to zero as j → ∞, since a < 1.
c. Using the Frisch-Waugh-Lovell Theorem, least squares on the original model yields
β̂2 = (x2′P̄_X1 x2)⁻¹ x2′P̄_X1 y = (x2′x2 − x2′P_X1 x2)⁻¹ x2′P̄_X1 y
   = (1 − a)⁻¹ (x2′x2)⁻¹ x2′P̄_X1 y = b2⁽¹⁾/(1 − a).
As j → ∞, lim b2⁽ʲ⁺¹⁾ = (Σ_{j=0}^∞ aʲ) b2⁽¹⁾ = b2⁽¹⁾/(1 − a) = β̂2.
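The convergence of the iterated estimates b2⁽ʲ⁾ to β̂2 can be illustrated with a small numpy simulation (arbitrary data, with x2 deliberately correlated with X1 so that a > 0):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
X1 = np.column_stack([np.ones(n), rng.standard_normal(n)])
x2 = rng.standard_normal(n) + X1[:, 1]          # correlated with X1, so a > 0
y = X1 @ np.array([1.0, 2.0]) - 3.0 * x2 + rng.standard_normal(n)

b2 = 0.0
for _ in range(200):                            # alternate the two regressions
    b1 = np.linalg.lstsq(X1, y - x2 * b2, rcond=None)[0]
    b2 = np.linalg.lstsq(x2[:, None], y - X1 @ b1, rcond=None)[0].item()

# FWL estimate from the joint regression of y on (X1, x2)
b_joint = np.linalg.lstsq(np.column_stack([X1, x2]), y, rcond=None)[0]
assert np.isclose(b2, b_joint[-1])              # iterates converge to beta2_hat
```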
7.16 Maddala (1992, pp. 120-127).
a. For H₀: β = 0, the RRSS is based on a regression of y_i on a constant. This yields α̂ = ȳ and RRSS = Σ_{i=1}^n (y_i − ȳ)² = the usual TSS. The URSS is the usual least squares residual sum of squares based on estimating α and β. The log-likelihood in this case is given by
logL(α, β, σ²) = −(n/2) log 2π − (n/2) log σ² − Σ_{i=1}^n (y_i − α − βX_i)²/2σ²,
with the unrestricted MLEs of α and β yielding
α̂_ols = ȳ − β̂_ols X̄  and  β̂_ols = Σ_{i=1}^n (X_i − X̄)y_i / Σ_{i=1}^n (X_i − X̄)²,
and σ̂²_mle = URSS/n. In this case, the unrestricted log-likelihood yields
logL(α̂_mle, β̂_mle, σ̂²_mle) = −(n/2) log 2π − (n/2) − (n/2) log(URSS/n).
Similarly, the restricted MLE yields α̂_rmle = ȳ and β̂_rmle = 0, with
σ̂²_rmle = RRSS/n = Σ_{i=1}^n (y_i − ȳ)²/n.
The restricted log-likelihood yields
logL(α̂_rmle, β̂_rmle, σ̂²_rmle) = −(n/2) log 2π − (n/2) − (n/2) log(RRSS/n).
Hence, the LR test is given by
LR = n(log RRSS − log URSS) = n log(RRSS/URSS) = n log(TSS/RSS) = n log[1/(1 − r²)],
where TSS and RSS are the total and residual sums of squares from the unrestricted regression. By definition, R² = 1 − (RSS/TSS), and for the simple regression r²_XY = R² of that regression; see Chap. 3.
b. The Wald statistic for H₀: β = 0 is based upon r(β̂_mle) = (β̂_mle − 0) and R(β̂_mle) = 1, and from (7.40) we get W = β̂²_mle/var(β̂_mle) = β̂²_ols/var(β̂_ols).
This is the square of the usual t-statistic for β = 0, with σ̂²_mle used instead of s² in estimating σ². Using the results in Chap. 3, we get
W = β̂²_ols Σ_{i=1}^n (X_i − X̄)² / σ̂²_mle = nr²/(1 − r²),
with σ̂²_mle = URSS/n = TSS(1 − R²)/n from the definition of R², and
using the definition of r²_XY = r² = R² for the simple regression.
c. The LM statistic given in (7.43) is based upon LM = β̂²_ols (Σ_{i=1}^n x_i²)/σ̂²_rmle, where x_i = X_i − X̄ denotes deviations from the mean. This is the square of the t-statistic on β = 0 using σ̂²_rmle as an estimate for σ². In this case, σ̂²_rmle = RRSS/n = Σ_{i=1}^n y_i²/n, where y_i here also denotes deviations from the mean. Hence,
LM = n (Σ_{i=1}^n x_i y_i)² / (Σ_{i=1}^n x_i² Σ_{i=1}^n y_i²) = nr²
from the definition of r²_XY = r².
d. Note that from part (b), we get W/n = r²/(1 − r²) and 1 + (W/n) = 1/(1 − r²). Hence, from part (a), we have (LR/n) = log[1/(1 − r²)] = log[1 + (W/n)].
From part (c), we get (LM/n) = (W/n)/[1 + (W/n)]. Using the inequality x ≥ log(1 + x) ≥ x/(1 + x) with x = W/n, we get W ≥ LR ≥ LM.
e. From Chap. 3, the R² of the regression of logC on logP is 0.2913 and n = 46. Hence,
W = nr²/(1 − r²) = (46)(0.2913)/(0.7087) = 18.91,
LM = nr² = (46)(0.2913) = 13.399,
and LR = n log[1/(1 − r²)] = 46 log(1/0.7087) = 46 log(1.4110342) = 15.838. It is clear that W > LR > LM in this case.
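These numbers can be reproduced directly from n = 46 and r² = 0.2913:

```python
import numpy as np

n, r2 = 46, 0.2913             # from the logC-on-logP regression in Chap. 3
W  = n * r2 / (1 - r2)         # Wald
LM = n * r2                    # Lagrange multiplier
LR = n * np.log(1 / (1 - r2))  # likelihood ratio

assert abs(W - 18.91) < 0.01 and abs(LM - 13.399) < 0.01 and abs(LR - 15.838) < 0.01
assert W > LR > LM             # the inequality from part (d)
```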
7.17 Engle (1984, pp. 785-786).
a. For a random sample y_1, …, y_T drawn from a Bernoulli distribution with parameter θ, the log-likelihood is logL(θ) = Σ_{t=1}^T [y_t log θ + (1 − y_t) log(1 − θ)]. The score is
S(θ) = ∂logL(θ)/∂θ = Σ_{t=1}^T y_t/θ − (T − Σ_{t=1}^T y_t)/(1 − θ) = Σ_{t=1}^T (y_t − θ)/θ(1 − θ).
The MLE is given by setting S(θ) = 0, giving θ̂_mle = Σ_{t=1}^T y_t/T = ȳ. The information is
I(θ) = E[S(θ)²] = E[Σ_{t=1}^T (y_t − θ)]²/θ²(1 − θ)² = Tθ(1 − θ)/θ²(1 − θ)² = T/θ(1 − θ).
b. For testing H₀: θ = θ₀ versus H_A: θ ≠ θ₀, the Wald statistic given in (7.40) has r(θ) = θ − θ₀ and R(θ) = 1, with I⁻¹(θ̂_mle) = θ̂_mle(1 − θ̂_mle)/T. Hence,
W = T(θ̂_mle − θ₀)²/θ̂_mle(1 − θ̂_mle) = T(ȳ − θ₀)²/ȳ(1 − ȳ).
The LM statistic given in (7.42) has S(θ₀) = T(ȳ − θ₀)/θ₀(1 − θ₀) and I⁻¹(θ₀) = θ₀(1 − θ₀)/T. Hence,
LM = S(θ₀)² I⁻¹(θ₀) = [T²(ȳ − θ₀)²/θ₀²(1 − θ₀)²] · [θ₀(1 − θ₀)/T] = T(ȳ − θ₀)²/θ₀(1 − θ₀).
The unrestricted log-likelihood is given by
logL(θ̂_mle) = logL(ȳ) = Σ_{t=1}^T [y_t log ȳ + (1 − y_t) log(1 − ȳ)] = Tȳ log ȳ + T(1 − ȳ) log(1 − ȳ).
The restricted log-likelihood is given by
logL(θ₀) = Σ_{t=1}^T [y_t log θ₀ + (1 − y_t) log(1 − θ₀)] = Tȳ log θ₀ + T(1 − ȳ) log(1 − θ₀).
Hence, the likelihood ratio test gives
LR = 2Tȳ(log ȳ − log θ₀) + 2T(1 − ȳ)[log(1 − ȳ) − log(1 − θ₀)]
   = 2Tȳ log(ȳ/θ₀) + 2T(1 − ȳ) log[(1 − ȳ)/(1 − θ₀)].
All three statistics have a limiting χ²₁ distribution under H₀. Each statistic will reject when (ȳ − θ₀)² is large. Hence, for finite-sample exact results, one can refer to the binomial distribution and compute exact critical values.
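The three statistics can be computed for a simulated Bernoulli sample as a sanity check (the values of T, θ₀ and the true θ below are arbitrary choices); the last assertion confirms that the LR formula matches twice the log-likelihood difference computed directly.

```python
import numpy as np

rng = np.random.default_rng(7)
T, theta0 = 100, 0.5
y = rng.binomial(1, 0.6, size=T)    # Bernoulli sample, true theta = 0.6
ybar = y.mean()

W  = T * (ybar - theta0) ** 2 / (ybar * (1 - ybar))
LM = T * (ybar - theta0) ** 2 / (theta0 * (1 - theta0))
LR = 2 * T * (ybar * np.log(ybar / theta0)
              + (1 - ybar) * np.log((1 - ybar) / (1 - theta0)))

logL = lambda th: np.sum(y * np.log(th) + (1 - y) * np.log(1 - th))
assert W >= 0 and LM >= 0 and LR >= 0
assert np.isclose(LR, 2 * (logL(ybar) - logL(theta0)))  # LR formula checks out
```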
7.18 For the regression model
y = Xβ + u with u ~ N(0, σ²I_T):
a. L(β, σ²) = (1/2πσ²)^{T/2} exp{−(y − Xβ)′(y − Xβ)/2σ²}
logL = −(T/2)(log 2π + log σ²) − (y − Xβ)′(y − Xβ)/2σ²
∂logL/∂β = (2X′y − 2X′Xβ)/2σ² = 0,
so that β̂_mle = (X′X)⁻¹X′y, and ∂logL/∂σ² = −T/2σ² + û′û/2σ⁴ = 0. Hence, σ̂²_mle = û′û/T (where û = y − Xβ̂_mle).
b. The score for β is S(β) = ∂logL(β)/∂β = (X′y − X′Xβ)/σ².
The Information matrix is given by
I(β, σ²) = −E[∂²logL/∂(β, σ²)∂(β, σ²)′]
         = −E[ −X′X/σ²                 −(X′y − X′Xβ)/σ⁴                        ]
              [ −(X′y − X′Xβ)′/σ⁴      T/2σ⁴ − (y − Xβ)′(y − Xβ)/σ⁶          ].
Since E(X′y − X′Xβ) = X′E(y − Xβ) = 0 and E(y − Xβ)′(y − Xβ) = Tσ²,
I(β, σ²) = [ X′X/σ²     0     ]
           [ 0          T/2σ⁴ ],
which is block-diagonal and also given in (7.19).
c. The Wald statistic given in (7.41) needs
r(β) = β1 − β1⁰ and R(β) = [I_{k1}, 0].
Hence,
W = (β̂1 − β1⁰)′ [R(X′X)⁻¹R′]⁻¹ (β̂1 − β1⁰) / σ̂²,
with R(X′X)⁻¹R′ equal to the upper-left k1×k1 block of (X′X)⁻¹, so that by partitioned inverse
[R(X′X)⁻¹R′]⁻¹ = X1′X1 − X1′X2(X2′X2)⁻¹X2′X1 = X1′[I − X2(X2′X2)⁻¹X2′]X1 = X1′P̄_X2 X1.
Therefore,
W = (β̂1 − β1⁰)′ [X1′P̄_X2 X1] (β̂1 − β1⁰) / σ̂²
as required. For the LR statistic, LR = −2(logL̃* − logL̂*), where L̃* is the restricted likelihood and L̂* is the unrestricted likelihood. Hence,
LR = −2[(−T/2 − (T/2)log 2π − (T/2)log(RRSS/T)) − (−T/2 − (T/2)log 2π − (T/2)log(URSS/T))]
   = T log(RRSS/URSS) = T log(ũ′ũ/û′û),
where ũ denotes the restricted and û the unrestricted OLS residuals.
For the LM statistic, the score version is given in (7.42) as LM = S(β̃)′ I⁻¹(β̃) S(β̃),
where β̃ = (β1⁰′, β̃2′)′ is the restricted estimator. The restriction on β1 is (β1 = β1⁰), but there are no restrictions on β2. Note that β̃2 can be obtained from the regression of (y − X1β1⁰) on X2. This yields
β̃2 = (X2′X2)⁻¹X2′(y − X1β1⁰).
Therefore, the block of the score corresponding to β2 vanishes at the restricted estimates:
S2(β̃) ∝ X2′(y − X1β1⁰ − X2β̃2) = X2′y − X2′X1β1⁰ − X2′y + X2′X1β1⁰ = 0.
Hence, LM = S1(β̃)′ I¹¹(β̃) S1(β̃), where I¹¹(β̃) is the upper-left block of I⁻¹(β̃). Since S1(β̃) = X1′(y − Xβ̃)/σ̃² = X1′ũ/σ̃²
and I⁻¹(β̃) = σ̃²(X′X)⁻¹ with I¹¹ = σ̃²[X1′P̄_X2 X1]⁻¹, we get
LM = ũ′X1 [X1′P̄_X2 X1]⁻¹ X1′ũ / σ̃².
d. W = (β̂1 − β1⁰)′ [X1′P̄_X2 X1] (β̂1 − β1⁰) / σ̂²
     = (β̂1 − β1⁰)′ [R(X′X)⁻¹R′]⁻¹ (β̂1 − β1⁰) / σ̂²
     = (r − Rβ̂)′ [R(X′X)⁻¹R′]⁻¹ (r − Rβ̂) / σ̂².
From (7.39) we know that ũ′ũ − û′û = (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂). Also, σ̂² = û′û/T. Therefore, W = T(ũ′ũ − û′û)/û′û as required.
Similarly, LM = ũ′X1[X1′P̄_X2 X1]⁻¹X1′ũ/σ̃². From (7.43) we know that LM = (r − Rβ̂)′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)/σ̃², and σ̃² = ũ′ũ/T. Using (7.39), we can rewrite this as LM = T(ũ′ũ − û′û)/ũ′ũ as required. Finally,
LR = T log(ũ′ũ/û′û) = T log[1 + (ũ′ũ − û′û)/û′û] = T log(1 + W/T).
Also, W/LM = ũ′ũ/û′û = 1 + W/T. Hence, LM = W/(1 + W/T).
Using the inequality x ≥ log(1 + x) ≥ x/(1 + x) with x = W/T, we get (W/T) ≥ (LR/T) ≥ (LM/T), or W ≥ LR ≥ LM. However, it is important to note that all the statistics are monotonic functions of the F-statistic in (7.45), and exact tests for each would produce identical critical regions.
e. For the cigarette consumption data given in Table 3.2, the following test statistics were computed for H₀: β = −1, where β denotes the price elasticity:
Wald = 1.16 > LR = 1.15 > LM = 1.13
and the SAS program that produces these results is given below.
f. The Wald statistic for H_A: β = −1 yields 1.16, for H_B: β⁵ = −1 yields 0.43, and for H_C: β⁻⁵ = −1 yields 7.89, even though the three hypotheses are algebraically equivalent. The SAS program that produces these results is given below.
SAS PROGRAM

Data CIGARETT;
Input OBS STATE $ LNC LNP LNY;
Cards;
Proc IML;
Use CIGARETT;
Read all into TEMP;
N=NROW(TEMP);
ONE=Repeat(1,N,1);
Y=TEMP[,2];
X=ONE||TEMP[,3]||TEMP[,4];
BETA_U=INV(X`*X)*X`*Y;
R={0 1 0};
Ho=BETA_U[2,]+1;
* restricted estimator from (7.36) with r=-1, so r-R*BETA_U=-Ho;
BETA_R=BETA_U-INV(X`*X)*R`*INV(R*INV(X`*X)*R`)*Ho;
ET_U=Y-X*BETA_U;
ET_R=Y-X*BETA_R;
SIG_U=(ET_U`*ET_U)/N;
SIG_R=(ET_R`*ET_R)/N;
X1=X[,2];
X2=X[,1]||X[,3];
Q_X2=I(N)-X2*INV(X2`*X2)*X2`;
VAR_D=SIG_U*INV(X1`*Q_X2*X1);
WALD=Ho`*INV(VAR_D)*Ho;
LR=N*LOG(1+(WALD/N));
LM=(ET_R`*X1*INV(X1`*Q_X2*X1)*X1`*ET_R)/SIG_R;
*WALD=N*(ET_R`*ET_R-ET_U`*ET_U)/(ET_U`*ET_U);
*LR=N*LOG(ET_R`*ET_R/(ET_U`*ET_U));
*LM=N*(ET_R`*ET_R-ET_U`*ET_U)/(ET_R`*ET_R);
PRINT 'Chapter 7, Problem 18 (e)',, WALD;
PRINT LR;
PRINT LM;
BETA=BETA_U[2,];
H1=BETA+1;
H2=BETA**5+1;
H3=BETA**(-5)+1;
VAR_D1=SIG_U*INV(X1`*Q_X2*X1);
* delta-method variances: d(beta^5)/dbeta=5*beta^4, d(beta^-5)/dbeta=-5*beta^-6;
VAR_D2=(5*BETA**4)*VAR_D1*(5*BETA**4);
VAR_D3=(-5*BETA**(-6))*VAR_D1*(-5*BETA**(-6));
WALD1=H1`*INV(VAR_D1)*H1;
WALD2=H2`*INV(VAR_D2)*H2;
WALD3=H3`*INV(VAR_D3)*H3;
PRINT 'Chapter 7, Problem 18 (f)',, WALD1;
PRINT WALD2;
PRINT WALD3;
7.19 Gregory and Veall (1985).
a. For H_A: β1 − 1/β2 = 0, we have r_A(β) = β1 − 1/β2 and β′ = (β0, β1, β2). In this case, R_A(β) = (0, 1, 1/β2²), and the unrestricted MLE is OLS on (7.50) with variance-covariance matrix V(β̂_ols) = σ̂²(X′X)⁻¹, where σ̂² = URSS/n. Let v_ij denote the corresponding elements of V(β̂_ols) for i, j = 0, 1, 2. Therefore,
W_A = (β̂1 − 1/β̂2) [(0, 1, 1/β̂2²) V(β̂_ols) (0, 1, 1/β̂2²)′]⁻¹ (β̂1 − 1/β̂2)
    = (β̂1β̂2 − 1)² / (β̂2²v11 + 2v12 + v22/β̂2²)
as required in (7.52). Similarly, for H_B: β1β2 − 1 = 0, we have r_B(β) = β1β2 − 1. In this case, R_B(β) = (0, β2, β1), and
W_B = (β̂1β̂2 − 1) [(0, β̂2, β̂1) V(β̂_ols) (0, β̂2, β̂1)′]⁻¹ (β̂1β̂2 − 1)
    = (β̂1β̂2 − 1)² / (β̂2²v11 + 2β̂1β̂2v12 + β̂1²v22)
as required in (7.53).
7.20 Gregory and Veall (1986).
a. From (7.51), we get W = r(β̂_ols)′[R(β̂_ols) σ̂²(X′X)⁻¹ R(β̂_ols)′]⁻¹ r(β̂_ols), where a typical row of the matrix X is [y_{t−1}, x_t, x_{t−1}] and β′ = (ρ, β1, β2). For H_A: β1ρ + β2 = 0, we have r(β) = β1ρ + β2 and R(β) = (β1, ρ, 1). Hence,
W_A = (β̂1ρ̂ + β̂2) [(β̂1, ρ̂, 1) σ̂²(X′X)⁻¹ (β̂1, ρ̂, 1)′]⁻¹ (β̂1ρ̂ + β̂2).
For H_B: β1 + (β2/ρ) = 0, we have r(β) = β1 + (β2/ρ) and R(β) = (−β2/ρ², 1, 1/ρ). Hence,
W_B = (β̂1 + β̂2/ρ̂) [(−β̂2/ρ̂², 1, 1/ρ̂) σ̂²(X′X)⁻¹ (−β̂2/ρ̂², 1, 1/ρ̂)′]⁻¹ (β̂1 + β̂2/ρ̂).
For H_C: ρ + (β2/β1) = 0, we have r(β) = ρ + (β2/β1) and R(β) = (1, −β2/β1², 1/β1). Hence,
W_C = (ρ̂ + β̂2/β̂1) [(1, −β̂2/β̂1², 1/β̂1) σ̂²(X′X)⁻¹ (1, −β̂2/β̂1², 1/β̂1)′]⁻¹ (ρ̂ + β̂2/β̂1).
For H_D: (β1ρ/β2) + 1 = 0, we have r(β) = (β1ρ/β2) + 1 and R(β) = (β1/β2, ρ/β2, −β1ρ/β2²). Hence,
W_D = (β̂1ρ̂/β̂2 + 1) [(β̂1/β̂2, ρ̂/β̂2, −β̂1ρ̂/β̂2²) σ̂²(X′X)⁻¹ (β̂1/β̂2, ρ̂/β̂2, −β̂1ρ̂/β̂2²)′]⁻¹ (β̂1ρ̂/β̂2 + 1).
SAS PROGRAM

Data CONSUMP;
Input YEAR Y C;
cards;
PROC IML;
USE CONSUMP;
READ ALL VAR {Y C};
Yt=Y[2:NROW(Y)];
YLAG=Y[1:NROW(Y)-1];
Ct=C[2:NROW(C)];
CLAG=C[1:NROW(C)-1];
X=CLAG||Yt||YLAG;
BETA=INV(X`*X)*X`*Ct;
RHO=BETA[1];
BT1=BETA[2];
BT2=BETA[3];
Px=X*INV(X`*X)*X`;
Qx=I(NROW(X))-Px;
et_U=Qx*Ct;
SIG_U=SSQ(et_U)/NROW(X);
Ha=BT1*RHO+BT2;
Hb=BT1+BT2/RHO;
Hc=RHO+BT2/BT1;
Hd=BT1*RHO/BT2+1;
* gradients of each restriction with respect to (rho, beta1, beta2);
Ra=BT1||RHO||{1};
Rb=(-BT2/RHO**2)||{1}||(1/RHO);
Rc={1}||(-BT2/BT1**2)||(1/BT1);
Rd=(BT1/BT2)||(RHO/BT2)||(-BT1*RHO/BT2**2);
VAR_a=Ra*SIG_U*INV(X`*X)*Ra`;
VAR_b=Rb*SIG_U*INV(X`*X)*Rb`;
VAR_c=Rc*SIG_U*INV(X`*X)*Rc`;
VAR_d=Rd*SIG_U*INV(X`*X)*Rd`;
WALD_a=Ha`*INV(VAR_a)*Ha;
WALD_b=Hb`*INV(VAR_b)*Hb;
WALD_c=Hc`*INV(VAR_c)*Hc;
WALD_d=Hd`*INV(VAR_d)*Hd;
PRINT 'Chapter 7, Problem 20 (b)',,WALD_a;
PRINT WALD_b;
PRINT WALD_c;
PRINT WALD_d;
7.21 Effect of Additional Regressors on R². For the regression equation y = Xβ + u, the OLS residuals are given by e = y − Xβ̂_ols = P̄_X y, where P̄_X = I_n − P_X and P_X = X(X′X)⁻¹X′ is the projection matrix. Therefore, the SSE for this regression is e′e = y′P̄_X y. In particular, SSE1 = y′P̄_X1 y for X = X1, and SSE2 = y′P̄_X y for X = (X1, X2). Therefore,
SSE1 − SSE2 = y′(P̄_X1 − P̄_X)y = y′(P_X − P_X1)y = y′Ay,
where A = P_X − P_X1. This difference in the residual sums of squares is non-negative for any vector y because A is positive semi-definite. The latter result holds because A is symmetric and idempotent. In fact, A is the difference between two idempotent matrices that also satisfy the following property: P_X P_X1 = P_X1 P_X = P_X1. Hence,
A² = P_X − P_X1 − P_X1 + P_X1 = P_X − P_X1 = A.
R² = 1 − (SSE/TSS), where TSS is the total sum of squares to be explained by the regression, and this depends only on the y's. TSS is fixed for both regressions. Hence, R²₂ ≥ R²₁, since SSE1 ≥ SSE2.
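A numerical check of this monotonicity (with arbitrary simulated y, X1 and X2): adding regressors never raises the SSE, and A = P_X − P_X1 is idempotent with y′Ay equal to the SSE difference.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 40
X1 = rng.standard_normal((n, 2))
X2 = rng.standard_normal((n, 3))   # extra regressors
y = rng.standard_normal(n)

def sse(Z):
    e = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return e @ e

sse1, sse2 = sse(X1), sse(np.column_stack([X1, X2]))
assert sse1 >= sse2                # adding regressors never raises the SSE

# A = P_X - P_X1 is symmetric idempotent, and y'Ay = SSE1 - SSE2 >= 0
P = lambda Z: Z @ np.linalg.solve(Z.T @ Z, Z.T)
A = P(np.column_stack([X1, X2])) - P(X1)
assert np.allclose(A @ A, A) and np.isclose(y @ A @ y, sse1 - sse2)
```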
References
Baltagi, B.H. (1996), "Iterative Estimation in Partitioned Regression Models," Econometric Theory, Solutions 95.5.1, 12: 869-870.
Engle, R.F. (1984), "Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics," in Z. Griliches and M.D. Intriligator, eds., Handbook of Econometrics, Vol. II (North-Holland: Amsterdam).
Graybill, F.A. (1961), An Introduction to Linear Statistical Models (McGraw-Hill: New York).
Gregory, A.W. and M.R. Veall (1985), "Formulating Wald Tests of Nonlinear Restrictions," Econometrica, 53: 1465-1468.
Gregory, A.W. and M.R. Veall (1986), "Wald Tests of Common Factor Restrictions," Economics Letters, 22: 203-208.
Maddala, G.S. (1992), Introduction to Econometrics (Macmillan: New York).
Salkever, D. (1976), "The Use of Dummy Variables to Compute Predictions, Prediction Errors, and Confidence Intervals," Journal of Econometrics, 4: 393-397.