# SELECTION OF REGRESSORS

In Section 10.2.3 we briefly discussed the problem of choosing between two bivariate regression equations with the same dependent variable. We stated that, other things being equal, it makes sense to choose the equation with the higher R². Here we consider choosing between two multiple regression equations

(12.5.1) y = Xβ + u₁

and

(12.5.2) y = Sγ + u₂,

where each equation satisfies the assumptions of model (12.1.3). Suppose the vectors β and γ have K and H elements, respectively. If H ≠ K, it no longer makes sense to choose the equation with the higher R², because the greater the number of regressors, the larger R² tends to be. In the extreme case where the number of regressors equals the number of observations, R² = 1. So if we are to use R² as a criterion for choosing a regression equation, we need to adjust it somehow for the degrees of freedom.

Theil (1961, p. 213) proposed one such adjustment. Theil's corrected R², denoted R̄², is defined by

(12.5.3) 1 − R̄² = [(T − 1)/(T − K)](1 − R²),

where T is the sample size and K is the number of regressors. Theil proposed choosing the equation with the largest R̄², other things being equal. Since, from (12.2.31),

(12.5.4) 1 − R̄² = (T − 1)σ̂²/(y'Ly),

choosing the equation with the largest R̄² is equivalent to choosing the equation with the smallest σ̂², defined in (12.2.29).
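The mechanical effect that motivates the correction in (12.5.3) is easy to see numerically. The following is a minimal sketch (not from the text; the data are simulated and all variable names are illustrative) comparing R² and the corrected R̄² when irrelevant regressors are appended:

```python
import numpy as np

def r2_stats(y, X):
    """Return (R^2, Theil's corrected R-bar^2) for an OLS fit."""
    T, K = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ beta) ** 2)        # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares about the mean
    r2 = 1.0 - ssr / sst
    r2_bar = 1.0 - (T - 1) / (T - K) * (1.0 - r2)   # equation (12.5.3)
    return r2, r2_bar

rng = np.random.default_rng(0)
T = 50
x = rng.standard_normal(T)
y = 1.0 + 2.0 * x + rng.standard_normal(T)

X_small = np.column_stack([np.ones(T), x])                        # the true regressors
X_big = np.column_stack([X_small, rng.standard_normal((T, 5))])   # plus 5 irrelevant ones

r2_s, r2bar_s = r2_stats(y, X_small)
r2_b, r2bar_b = r2_stats(y, X_big)
```

Adding columns can never lower R² (least squares can only fit better), whereas R̄² pays the degrees-of-freedom penalty of (12.5.3), so it can fall when the added regressors explain little.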

Theil offers the following justification for his corrected R². Let σ̂₁² and σ̂₂² be the unbiased estimators of the error variances in regression equations (12.5.1) and (12.5.2), respectively. That is,

σ̂₁² = y'[I − X(X'X)⁻¹X']y/(T − K)

and

σ̂₂² = y'[I − S(S'S)⁻¹S']y/(T − H).

Then he shows that

(12.5.5) E(σ̂₁² − σ̂₂²) ≥ 0

if the expectation is taken assuming that (12.5.2) is the true model. The justification is merely intuitive and not very strong.
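This inequality can be checked exactly rather than by simulation: if y = Sγ + u with Euu' = σ²I, then for any symmetric idempotent M, E y'My = σ²tr(M) + (Sγ)'M(Sγ). A small numeric sketch (the design matrices and parameter values below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T, K, H = 20, 3, 2
X = rng.standard_normal((T, K))   # regressors of equation (12.5.1)
S = rng.standard_normal((T, H))   # regressors of the true equation (12.5.2)
gamma = np.array([1.0, -2.0])
sigma2 = 1.5

def annihilator(Z):
    """Projection onto the orthogonal complement of the columns of Z."""
    return np.eye(T) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

M_X, M_S = annihilator(X), annihilator(S)
mu = S @ gamma                     # E y under the true model (12.5.2)

# E y'My = sigma2 * tr(M) + mu'M mu for idempotent M, so:
E_sig1 = (sigma2 * np.trace(M_X) + mu @ M_X @ mu) / (T - K)  # E of sigma1-hat^2
E_sig2 = (sigma2 * np.trace(M_S) + mu @ M_S @ mu) / (T - H)  # E of sigma2-hat^2
```

Since M_S annihilates Sγ, E_sig2 equals σ² exactly, while E_sig1 carries the nonnegative bias term (Sγ)'M_X(Sγ)/(T − K), which is the content of the inequality.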

An important special case of the problem considered above occurs when the regressors of (12.5.2) are a subset of those of (12.5.1). Without loss of generality, assume X = (X₁, X₂) and S = X₁. Partition β conformably as β' = (β₁', β₂'). Then, choosing (12.5.2) over (12.5.1) is equivalent to accepting the hypothesis β₂ = 0. But the F test of the hypothesis accepts it if η < c, where η is as given in (12.4.20) with β₂ set equal to 0. Therefore, any decision rule of this kind can be made equivalent to the choice of a particular value of c. It can be shown that the use of Theil's R̄² is equivalent to c = 1.
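In this nested case the correspondence to c = 1 can be verified directly: the corrected R̄² favors the larger equation exactly when the F statistic for β₂ = 0 exceeds 1. A sketch with simulated data (all names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, H, K = 40, 2, 4
X1 = np.column_stack([np.ones(T), rng.standard_normal(T)])   # S = X1, H columns
X = np.column_stack([X1, rng.standard_normal((T, K - H))])   # X = (X1, X2), K columns
y = X1 @ np.array([0.5, 1.0]) + rng.standard_normal(T)       # beta2 = 0 holds here

def ssr(Z):
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.sum((y - Z @ b) ** 2)

ssr_full, ssr_sub = ssr(X), ssr(X1)
F = ((ssr_sub - ssr_full) / (K - H)) / (ssr_full / (T - K))  # F statistic for beta2 = 0

sst = np.sum((y - y.mean()) ** 2)
rbar2 = lambda s, k: 1 - (T - 1) / (T - k) * (s / sst)       # Theil's corrected R^2
prefers_full = rbar2(ssr_full, K) > rbar2(ssr_sub, H)
```

The equivalence follows because R̄² ranks the two equations by SSR/(T − K) versus SSR/(T − H), and a little algebra shows the former is smaller precisely when F > 1.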

Mallows (1964), Akaike (1973), and Sawa and Hiromatsu (1973) obtained solutions to this problem on the basis of three different principles and arrived at similar recommendations, in which the value of c ranges roughly from 1.8 to 2. These results suggest that Theil's R̄², though an improvement over the unadjusted R², still tends to favor a regression equation with more regressors.

What value of c is implied by the customary choice of the 5% significance level? The answer depends on the degrees of freedom of the F test: K − H and T − K. Note that K − H appears as K₂ in (12.4.20). Table 12.1 gives the value of c for selected values of the degrees of freedom. The table is calculated by solving for c in P[F(K − H, T − K) < c] = 0.05. The results cast some doubt on the customary choice of 5%.

TABLE 12.1 Critical values of the F test implied by the 5% significance level

| K − H | T − K | c     |
|------:|------:|------:|
| 1     | 30    | 0.465 |
| 3     | 30    | 0.807 |
| 1     | 100   | 0.458 |
| 5     | 100   | 0.867 |
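Numerically, the tabulated c values agree to about two decimal places with the medians of the corresponding F distributions, i.e. with the solutions of P[F(K − H, T − K) < c] = 0.5; this can be checked with SciPy's quantile function (an observation about the printed numbers, not a claim from the text):

```python
from scipy.stats import f  # F-distribution quantile function

# (K - H, T - K) pairs as in Table 12.1
dof_pairs = [(1, 30), (3, 30), (1, 100), (5, 100)]
c_values = {df: float(f.ppf(0.5, *df)) for df in dof_pairs}  # medians of F(m, n)
```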

EXERCISES

1. (Section 12.2.2)

Consider the regression model y = Xβ + u, where Eu = 0, Euu' = I₄, and

X' = [1 1 1 1; 1 2 1 −1].

Let β̂ = (X'X)⁻¹X'y and β̃ = (S'X)⁻¹S'y, where

S' = [1 1 1 1; 1 2 3 4].

Show directly that β̂ is a better estimator than β̃, without using Theorem 12.2.1.

2. (Section 12.2.2)

Consider the regression model y = βx + u, where β is a scalar unknown parameter, x is a T-vector consisting entirely of ones, and u is a T-vector such that Eu = 0 and Euu' = σ²I_T. Obtain the mean squared errors of the following two estimators:

β̂ = x'y/(x'x) and β̃ = z'y/(z'x),

where z' = (1, 0, 1, 0, . . . , 1, 0). Assume that T is even. Which estimator is preferred? Answer directly, without using Theorem 12.2.1.

3. (Section 12.2.5)

Suppose the joint distribution of X and Y is given by the following table:

|       | X = 1   | X = 0   |
|-------|---------|---------|
| Y = 1 | α       | β       |
| Y = 0 | 0.5 − α | 0.5 − β |

(a) Derive an explicit formula for the maximum likelihood estimator of α based on an i.i.d. sample {(Xᵢ, Yᵢ), i = 1, 2, . . . , n}, and derive its asymptotic distribution directly, without using the Cramér-Rao lower bound.

(b) Derive the Cramér-Rao lower bound.

4. (Section 12.2.6)

In the model y = Xβ + u and y_p = x_p'β + u_p, obtain the unconditional mean squared prediction errors of the predictors x_p'β̂ and x_{1p}'β̂₁, where β̂ = (X'X)⁻¹X'y and β̂₁ = (X₁'X₁)⁻¹X₁'y. We have defined X₁ as the first K₁ columns of X and x_{1p} as the first K₁ elements of x_p. Under what circumstances can the second predictor be regarded as superior to the first?

5. (Section 12.3)

Show that R defined in the paragraphs before (12.3.13) satisfies the two conditions given there.

6. (Section 12.4.2)

Consider the regression model

y = β₁(1, 1, 1, 1)' + β₂(1, −1, 1, −1)' + u,

where β₁ and β₂ are scalar unknown parameters and u ~ N(0, σ²I₄). Assuming that the observed values of y' are (2, 0, 1, −1), test the null hypothesis β₂ = β₁ against the alternative hypothesis β₂ > β₁ at the 5% significance level.
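The one-sided t test for an exercise of this form can be sketched as follows, assuming the two regressor vectors are (1, 1, 1, 1)' and (1, −1, 1, −1)' and y' = (2, 0, 1, −1) (a numerical check under those assumptions, not the requested hand derivation):

```python
import numpy as np
from scipy.stats import t as t_dist

x1 = np.array([1.0, 1, 1, 1])       # assumed first regressor
x2 = np.array([1.0, -1, 1, -1])     # assumed second regressor
y = np.array([2.0, 0, 1, -1])

X = np.column_stack([x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)    # OLS estimates (b1, b2)
resid = y - X @ b
s2 = resid @ resid / (len(y) - 2)        # unbiased error-variance estimate

# t statistic for H0: beta2 = beta1 against beta2 > beta1, via d'beta with d = (-1, 1)
d = np.array([-1.0, 1.0])
t_stat = (d @ b) / np.sqrt(s2 * d @ np.linalg.inv(X.T @ X) @ d)
crit = t_dist.ppf(0.95, 2)               # one-sided 5% critical value of t(2)
reject = t_stat > crit
```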

7. (Section 12.4.3)

Consider the regression model y = Xβ + u, where y and u are eight-component vectors, X is an 8 × 3 matrix, and β is a three-component vector of unknown parameters. We want to test hypotheses on the elements of β, which we write as β₁, β₂, and β₃. The data are given by

X'X = [2 0 0; 0 3 1; 0 1 3],  X'y = (4, 5, 3)'.

(a) Test β₂ = β₁ against β₂ > β₁ at the 5% significance level.

(b) Test β₁ = β₂ = β₃ at the 5% significance level.
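Given the moment matrices, the pieces of the test that depend only on X'X and X'y can be sketched as follows (the variance estimate s² additionally requires y'y, which must be supplied with the data, so only the point estimates and the contrast variance factor are computed):

```python
import numpy as np

XtX = np.array([[2.0, 0, 0],
                [0, 3, 1],
                [0, 1, 3]])
Xty = np.array([4.0, 5, 3])

beta = np.linalg.solve(XtX, Xty)     # OLS estimates (beta1, beta2, beta3)

# For part (a): Var(beta2-hat - beta1-hat) = s^2 * d'(X'X)^-1 d with d = (-1, 1, 0);
# the s^2 factor requires y'y and is therefore left out here.
d = np.array([-1.0, 1.0, 0.0])
var_contrast = d @ np.linalg.inv(XtX) @ d
```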

8. (Section 12.4.3)

Consider three bivariate regression models, each consisting of four observations:

y₁ = α₁1 + β₁x₁ + u₁,

y₂ = α₂1 + β₂x₂ + u₂,

y₃ = α₃1 + β₃x₃ + u₃,

where 1 is a four-component vector consisting only of ones, and the elements of u₁, u₂, and u₃ are independent normal with zero mean and constant variance. The data are as follows:

x₁ = (1, 0, 1, 0)',  x₂ = (1, 1, 0, 0)',  x₃ = (1, 0, 0, −1)',

y₁ = (1, 1, 0, 0)',  y₂ = (1, 0, 1, 0)',  y₃ = (0, 1, 0, 1)'.

Test the null hypothesis "α₁ = α₂ = α₃ and β₁ = β₂ = β₃" at the 5% significance level.

9. (Section 12.4.3)

In the following regression model, test H₀: α₁ + β₁ = α₂ + β₂ and β₂ = 0 against H₁: not H₀:

y₁ = α₁x₁ + β₁z₁ + u₁ and y₂ = α₂x₂ + β₂z₂ + u₂,

where u₁ and u₂ are independent of each other and distributed as N(0, σ²I₅). Use the 5% significance level. The data are given as follows:

y₁ = (2, 1, 3, 1, 3)',  y₂ = (2, 2, 3, 2, 3)',  x₁ = x₂ = (1, 1, 1, 1, 1)',  z₁ = z₂ = (1, −1, 1, −2, 1)'.

10. (Section 12.4.3)

Solve Exercise 35 of Chapter 9 in a regression framework.

11. (Section 12.4.3)

We want to estimate a Cobb-Douglas production function

log Qₜ = β₁ + β₂ log Kₜ + β₃ log Lₜ + uₜ,  t = 1, 2, . . . , T,

in each of three industries A, B, and C, and test the hypothesis that β₂ is the same for industries A and B and β₃ is the same for industries B and C (jointly, not separately). We assume that β₁ varies among the three industries. Write detailed instructions on how to perform such a test. You may assume that the uₜ are normal with mean zero and their variance is constant for all t and for all three industries, and that the Kₜ and Lₜ are distributed independently of the uₜ.
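One way to organize such a test is to stack the three industries, give each its own intercept and slopes, and test the two linear restrictions jointly with an F test. The sketch below uses simulated data (sample sizes, parameter values, and names are all illustrative); the essential objects are the stacked design and the 2-row restriction matrix, and the Wald form of the statistic is checked against the restricted-versus-unrestricted SSR form:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
T = 20                                    # observations per industry (illustrative)
n = 3 * T

logK = rng.standard_normal(n)
logL = rng.standard_normal(n)
D = np.eye(3)[np.repeat([0, 1, 2], T)]    # industry dummies for A, B, C

# Unrestricted design: industry-specific intercepts and slopes (9 parameters),
# ordered (a_A, a_B, a_C, b2_A, b2_B, b2_C, b3_A, b3_B, b3_C)
X = np.column_stack([D, D * logK[:, None], D * logL[:, None]])
beta_true = np.array([1, 2, 3, 0.4, 0.4, 0.7, 0.5, 0.6, 0.6])  # satisfies H0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
ssr_u = np.sum((y - X @ b) ** 2)
s2 = ssr_u / (n - X.shape[1])

# H0: b2_A = b2_B and b3_B = b3_C, written as R beta = 0
R = np.array([[0, 0, 0, 1, -1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 1, -1]], dtype=float)
Rb = R @ b
F_wald = Rb @ np.linalg.solve(R @ np.linalg.inv(X.T @ X) @ R.T, Rb) / (2 * s2)

# The same statistic from restricted least squares (slopes shared under H0)
kAB = (D[:, 0] + D[:, 1]) * logK          # common log K slope for A and B
lBC = (D[:, 1] + D[:, 2]) * logL          # common log L slope for B and C
Xr = np.column_stack([D, kAB, D[:, 2] * logK, lBC, D[:, 0] * logL])
br = np.linalg.lstsq(Xr, y, rcond=None)[0]
ssr_r = np.sum((y - Xr @ br) ** 2)
F_ssr = ((ssr_r - ssr_u) / 2) / s2

crit = f_dist.ppf(0.95, 2, n - X.shape[1])  # 5% critical value of F(2, n - 9)
```

Rejecting when the statistic exceeds `crit` gives the required 5%-level joint test; the agreement of the two computations of F is the standard restricted/unrestricted least-squares identity.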

The multiple regression model studied in Chapter 12 is by far the most frequently used statistical model in all the applied disciplines, including econometrics. It is also the basic model from which various other models can be derived. For these reasons the model is sometimes called the classical regression model or the standard regression model. In this chapter we study various other models frequently used in applied research. The models discussed in Sections 13.1 through 13.4 may properly be called regression models (models in which the conditional mean of the dependent variable is specified as a function of the independent variables), whereas those discussed in Sections 13.5 through 13.7 are more general models. We have given them the common term "econometric models," but all of them have been used by researchers in other disciplines as well.

The models of Section 13.1 arise as the assumption of independence or homoscedasticity (constant variance) is removed from the classical regression model. The models of Sections 13.2 and 13.3 arise as the assumption of exogeneity of the regressors is removed. Finally, the models of Section 13.4 arise as the linearity assumption is removed. The models of Sections 13.5, 13.6, and 13.7 are more general than regression models.

Our presentation will focus on the fundamental results. For a more detailed study the reader is referred to Amemiya (1985).