# Model selection using information criteria

Because the object of point forecasting is to minimize expected loss out-of-sample, it is not desirable to minimize approximation error (bias) when this entails adding considerable parameter estimation uncertainty. Thus, for example, model selection based on minimizing the sum of squared residuals, or maximizing the $R^2$, can lead to small bias and good in-sample fit, but very poor out-of-sample forecast performance.

A formal way to make this tradeoff between approximation error and estimation error is to use information criteria to select among a few competing models. When $h = 1$, information criteria (IC) have the form

$$\mathrm{IC}(p) = \ln \hat{\sigma}^2(p) + p\,g(T) \tag{27.3}$$

where $p$ is the dimension of $\theta$, $T$ is the sample size used for estimation, $g(T)$ is a function of $T$ with $g(T) > 0$, $Tg(T) \to \infty$, and $g(T) \to 0$ as $T \to \infty$, and $\hat{\sigma}^2(p) = \mathrm{SSR}/T$, where SSR is the sum of squared residuals from the (in-sample) estimation. Comparing two models using the information criterion (27.3) is the same as comparing them by their sum of squared residuals, except that the model with more parameters receives a penalty. Under suitable conditions on this penalty and on the class of models being considered, it can be shown that a model selected by the information criterion is the best in the sense of the tradeoff between approximation error and sampling uncertainty about $\theta$. A precise statement of such conditions in AR models, when only the maximum order is known, can be found in Geweke and Meese (1981), and extensions to infinite order autoregressive models are discussed in Brockwell and Davis (1987) and, in the context of unit root tests, Ng and Perron (1995). The two most common information criteria are the Akaike information criterion (AIC), for which $g(T) = 2/T$, and Schwarz’s (1978) Bayes information criterion (BIC), for which $g(T) = \ln T / T$.
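As a rough illustration of how (27.3) might be applied, the sketch below fits autoregressions of increasing order by OLS and computes the AIC and BIC for each. Everything specific here is an assumption made for the example rather than something taken from the text: the helper names `fit_ar_ssr` and `information_criteria`, the simulated AR(2) series, the maximum order of 6, and the choice to drop the first `p_max` observations so that every candidate is estimated on the same sample. The parameter count includes the intercept, which shifts every criterion value by the same amount and so does not affect the ranking.

```python
import numpy as np


def fit_ar_ssr(y, p, p_max):
    """OLS fit of an AR(p) with intercept; returns (SSR, T).

    The first p_max observations serve only as initial lags, so all
    candidate orders are estimated on the same T = len(y) - p_max points.
    """
    T = len(y) - p_max
    target = y[p_max:]
    cols = [np.ones(T)]
    for lag in range(1, p + 1):
        cols.append(y[p_max - lag : len(y) - lag])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return float(resid @ resid), T


def information_criteria(y, p_max):
    """Evaluate ln(SSR/T) + n_params * g(T) for AR orders 0..p_max."""
    rows = []
    for p in range(p_max + 1):
        ssr, T = fit_ar_ssr(y, p, p_max)
        n_params = p + 1  # AR coefficients plus the intercept
        log_sigma2 = np.log(ssr / T)
        rows.append({
            "p": p,
            "ssr": ssr,
            "aic": log_sigma2 + n_params * 2.0 / T,        # g(T) = 2/T
            "bic": log_sigma2 + n_params * np.log(T) / T,  # g(T) = ln T / T
        })
    return rows


# Simulated AR(2) data, purely illustrative.
rng = np.random.default_rng(0)
n = 300
eps = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

table = information_criteria(y, p_max=6)
p_aic = min(table, key=lambda r: r["aic"])["p"]
p_bic = min(table, key=lambda r: r["bic"])["p"]

print(" p        SSR      AIC      BIC")
for r in table:
    print(f"{r['p']:>2d} {r['ssr']:10.3f} {r['aic']:8.4f} {r['bic']:8.4f}")
print("order chosen by AIC:", p_aic)
print("order chosen by BIC:", p_bic)
```

In the printed table the SSR column can only fall (or stay flat) as the order grows, because the models are nested and fitted on the same sample, which is why SSR alone always favors the largest model. The two criteria differ only in the size of the penalty: since $\ln T > 2$ once $T \geq 8$, BIC charges more per parameter than AIC and, over the same candidate set and sample, never selects a larger order than AIC does.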