# Prediction Criterion

A major reason for considering any measure of goodness of fit is that the better the fit in the sample period, the better the prediction we can expect in the prediction period. It therefore makes sense to devise a measure of goodness of fit that directly reflects the efficiency of prediction. For this purpose we shall compare the mean squared prediction errors of the predictors derived from the two competing equations (2.1.1) and (2.1.2). To evaluate the mean squared prediction error, however, we must know the true model; but if we knew the true model, we would not have the problem of choosing a model. We get around this dilemma by evaluating the mean squared prediction error of the predictor derived from each model, assuming in turn that each model is the true model. This may be called the minimini principle (minimizing the minimum risk), in contrast to the more standard minimax principle defined in Section 2.1.2, because the performance of each predictor is evaluated under the most favorable condition for it. Although the principle is neither more nor less justifiable than the minimax principle, it is adopted here for mathematical convenience. Using the unconditional mean squared prediction error given in Eq. (1.6.11) after $\sigma^2$ is replaced with its unbiased estimator, we define the Prediction Criterion (abbreviated PC) by

$$\mathrm{PC}_i = \tilde\sigma_i^2\,(1 + T^{-1}K_i), \qquad i = 1, 2, \tag{2.1.13}$$

for each model, where $\tilde\sigma_i^2 = (T - K_i)^{-1}\,y'M_i y$ and $K_i$ is the number of regressors in model $i$.
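As a minimal sketch of how (2.1.13) might be evaluated numerically, the following computes the PC for two competing regressor sets. The data and the helper names `solve`, `rss`, and `prediction_criterion` are hypothetical and not part of the text, and $y'M_i y$ is taken to be the least squares residual sum of squares from regressing $y$ on the regressors of model $i$. Only the standard library is used so the block stays self-contained; in practice a numerical linear algebra library would normally do this job.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rss(X, y):
    """Residual sum of squares y'My from regressing y on the columns of X (a list of rows)."""
    K = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(K)] for a in range(K)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(K)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def prediction_criterion(X, y):
    """PC = sigma_tilde^2 * (1 + K/T), with sigma_tilde^2 = y'My / (T - K)."""
    T, K = len(X), len(X[0])
    sigma2 = rss(X, y) / (T - K)
    return sigma2 * (1 + K / T)


# Hypothetical data: y is roughly linear in x and unrelated to z.
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4]

X1 = [[1.0, xi] for xi in x]  # model 1: constant and x
X2 = [[1.0, zi] for zi in z]  # model 2: constant and z

pc1 = prediction_criterion(X1, y)
pc2 = prediction_criterion(X2, y)
print("PC_1 =", pc1, " PC_2 =", pc2)
print("chosen model:", 1 if pc1 < pc2 else 2)
```

The model with the smaller PC is selected. Because both models here have the same number of regressors, the comparison reduces to comparing residual sums of squares; the factor $1 + T^{-1}K_i$ matters only when the $K_i$ differ.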

If we define the modified $R^2$, denoted by $\tilde R^2$, as

$$1 - \tilde R^2 = \frac{T + K}{T - K}\,(1 - R^2), \tag{2.1.14}$$

choosing the equation with the smallest PC is equivalent to choosing the equation with the largest $\tilde R^2$. A comparison of (2.1.6) and (2.1.14) shows that $\tilde R^2$ imposes a higher penalty upon increasing the number of regressors than does Theil's $\bar R^2$.
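The equivalence can be checked in one line. The sketch below assumes that $R_i^2$ is defined by $1 - R_i^2 = y'M_i y / S$, where $S$ is a total sum of squares of $y$ that does not depend on the model; the symbol $S$ is introduced here only for illustration.

```latex
\begin{align*}
\mathrm{PC}_i
  &= \tilde\sigma_i^2\left(1 + \frac{K_i}{T}\right)
   = \frac{y'M_i y}{T - K_i}\cdot\frac{T + K_i}{T} \\
  &= \frac{S}{T}\cdot\frac{T + K_i}{T - K_i}\,(1 - R_i^2)
   = \frac{S}{T}\,(1 - \tilde R_i^2).
\end{align*}
```

Since $S/T$ is positive and common to both models, the model with the smaller PC is the one with the larger modified $R^2$.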

Criteria proposed by Mallows (1964) and by Akaike (1973) are similar to the PC, although they are derived from somewhat different principles. To define Mallows' Criterion (abbreviated MC), we must first define the matrix $X$ as the matrix consisting of the distinct column vectors contained in the union of $X_1$ and $X_2$, and $K$ as the number of column vectors of $X$ thus defined. These definitions of $X$ and $K$ can be generalized to the case where there are more than two competing equations. Then we have

$$\mathrm{MC}_i = \frac{2K_i}{T - K}\,\frac{y'My}{T} + \frac{y'M_i y}{T}, \qquad i = 1, 2, \tag{2.1.15}$$

where $M = I - X(X'X)^{-1}X'$. Akaike (1973) proposed what he calls the Akaike Information Criterion (abbreviated AIC) for the purpose of distinguishing between models more general than regression models (see Section 4.5.2). When the AIC is applied to regression models, it reduces to

$$\mathrm{AIC}_i = \log\!\left(\frac{y'M_i y}{T}\right) + \frac{2K_i}{T}, \qquad i = 1, 2. \tag{2.1.16}$$

All three criteria give similar results in common situations (see Amemiya, 1980a). The MC has one unattractive feature: the matrix X must be specified.
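To illustrate why the MC needs the combined matrix $X$ while the AIC does not, here is a minimal sketch. It assumes the regression forms $\mathrm{MC}_i = \frac{2K_i}{T-K}\frac{y'My}{T} + \frac{y'M_i y}{T}$ and $\mathrm{AIC}_i = \log(y'M_i y/T) + 2K_i/T$, which are stated here as assumptions about (2.1.15) and (2.1.16). The data and the function names `solve`, `rss`, and `mallows_and_aic` are hypothetical.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rss(X, y):
    """Residual sum of squares y'My from regressing y on the columns of X (a list of rows)."""
    K = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(K)] for a in range(K)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(K)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def mallows_and_aic(X_i, X_all, y):
    """Return (MC_i, AIC_i) under the assumed regression forms.

    X_i   : regressors of model i
    X_all : distinct columns in the union of all competing regressor sets
    """
    T, K_i, K = len(y), len(X_i[0]), len(X_all[0])
    rss_i = rss(X_i, y)      # y'M_i y
    rss_all = rss(X_all, y)  # y'My, needs the combined matrix X
    mc = 2 * K_i / (T - K) * rss_all / T + rss_i / T
    aic = math.log(rss_i / T) + 2 * K_i / T
    return mc, aic


# Hypothetical data: model 1 uses a constant only, model 2 a constant and x.
y = [1.0, 2.0, 3.0, 5.0]
x = [0.0, 1.0, 2.0, 3.0]
X1 = [[1.0] for _ in y]
X2 = [[1.0, xi] for xi in x]
X_all = X2  # here the union of distinct columns coincides with X2

for name, X_i in (("model 1", X1), ("model 2", X2)):
    mc, aic = mallows_and_aic(X_i, X_all, y)
    print(name, "MC =", mc, "AIC =", aic)
```

In this sketch `X_all` must be passed in explicitly, which mirrors the point that the MC cannot be computed for one model in isolation.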