# Recent Developments in Regression Analysis

In this chapter we shall present three additional topics. They can be discussed in the framework of Model 1 but are grouped here in a separate chapter because they involve developments more recent than the results of the previous chapter.

2.1 Selection of Regressors¹

2.1.1 Introduction

Most of the discussion in Chapter 1 proceeded on the assumption that a given model (Model 1 with or without normality) is correct. This is the ideal situation that would occur if econometricians could unambiguously answer questions such as which independent variables should be included in the right-hand side of the regression equation; which transformation, if any, should be applied to the independent variables; and what assumptions should be imposed on the distribution of the error terms on the basis of economic theory. In practice this ideal situation seldom occurs, and some aspects of the model specification remain doubtful in the minds of econometricians. Then they must not only estimate the parameters of a given model but also choose a model among many models.

We have already considered a particular type of model selection problem, namely, the problem of choosing between Model 1 without constraint and Model 1 with the linear constraints $Q'\beta = c$. Model selection of this type (selection among "nested" models), where one model is a special case of the other, broader model, is the easiest to handle because the standard technique of hypothesis testing is precisely geared for handling this problem. We shall encounter many instances of this type of problem in later chapters. In Section 2.1, however, we face the more unorthodox problem of choosing between models or hypotheses, neither of which is contained in the other ("nonnested" models).

In particular, we shall study how to choose between the two competing regression equations

$$y = X_1\beta_1 + u_1 \tag{2.1.1}$$

and

$$y = X_2\beta_2 + u_2, \tag{2.1.2}$$

where $X_1$ is a $T \times K_1$ matrix of constants, $X_2$ is a $T \times K_2$ matrix of constants, $Eu_1 = Eu_2 = 0$, $Eu_1u_1' = \sigma_1^2 I$, and $Eu_2u_2' = \sigma_2^2 I$. Note that this notation differs from that of Chapter 1 in that here there is no explicit connection between $X_1$ and $X_2$: $X_1$ and $X_2$ may contain some common column vectors or they may not; one of $X_1$ and $X_2$ may be completely contained in the other or they may be completely different matrices. Note also that the dependent variable $y$ is the same for both equations. (Selection among more general models will be discussed briefly in Section 4.5.)

This problem is quite common in econometrics, for econometricians often run several regressions, each of which purports to explain the same dependent variable, and then choose the equation that satisfies them most according to some criterion. The choice is carried out through an intuitive and unsystematic thought process, as the analysts consider diverse factors such as goodness of fit, the reasonableness of the sign and magnitude of an estimated coefficient, and the value of the t statistic on each regression coefficient. Among these considerations, the degree of fit normally plays an important role, although the others should certainly not be ignored. Therefore, in the present study we shall focus our attention on the problem of finding an appropriate measure of the degree of fit. The multiple correlation coefficient $R^2$, defined in (1.2.8), has an intuitive appeal and is a useful descriptive statistic; however, it has one obvious weakness, namely, that it attains its maximum of unity when one uses as many independent variables as there are observations (that is, when $K = T$). Much of what we do here may be regarded as a way to rectify that weakness by modifying $R^2$.
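The weakness of $R^2$ noted above can be seen in a small numerical sketch. The code below is illustrative only and is not part of the original text: it assumes the usual definition of $R^2$ as one minus the ratio of the residual sum of squares to the sum of squared deviations of $y$ from its mean, and it uses hypothetical helper names (`solve`, `r_squared`), a sample size $T = 8$, and simulated regressors that are pure noise, unrelated to $y$ by construction.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(X, y):
    """R^2 = 1 - RSS/TSS, with TSS measured about the mean of y (assumed definition)."""
    T, K = len(X), len(X[0])
    # Least squares via the normal equations X'X b = X'y
    XtX = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(K)] for i in range(K)]
    Xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(K)]
    b = solve(XtX, Xty)
    fitted = [sum(X[t][k] * b[k] for k in range(K)) for t in range(T)]
    ybar = sum(y) / T
    rss = sum((y[t] - fitted[t]) ** 2 for t in range(T))
    tss = sum((y[t] - ybar) ** 2 for t in range(T))
    return 1.0 - rss / tss

random.seed(0)
T = 8  # hypothetical number of observations
y = [random.gauss(0, 1) for _ in range(T)]
# A constant column followed by T - 1 noise columns that have nothing to do with y
columns = [[1.0] * T] + [[random.gauss(0, 1) for _ in range(T)] for _ in range(T - 1)]

r2_by_k = []
for K in range(1, T + 1):
    X = [[columns[k][t] for k in range(K)] for t in range(T)]
    r2_by_k.append(r_squared(X, y))

for K, r2 in enumerate(r2_by_k, start=1):
    print(f"K = {K}: R^2 = {r2:.4f}")
```

With only the constant term, $R^2$ is zero; it cannot decrease as columns are added, and it reaches unity at $K = T$ even though none of the added regressors has any explanatory content. This is why an unmodified $R^2$ is a poor criterion for choosing among regressions with different numbers of regressors.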