Forecast comparison and evaluation
The most reliable way to evaluate a forecast or to compare forecasting methods is by examining out-of-sample performance. To evaluate the forecasting performance of a single model or expert, one looks for signs of internal consistency. If the forecasts were made under squared error loss, the forecast errors should have mean zero and should be uncorrelated with any variable used to produce the forecast. For example, e_{t+h|t} should be uncorrelated with e_{t|t-h}, although e_{t+h|t} will in general have an MA(h-1) correlation structure. Failure of out-of-sample forecast errors to have mean zero and to be uncorrelated with F_t indicates a structural break, a deficiency of the forecasting model, or both.
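These internal-consistency checks can be sketched in a few lines of code. The following is an illustrative sketch, not from the source: the function names are our own, and the rough 1/sqrt(n) standard error for the autocorrelation is a textbook approximation assumed here for simplicity.

```python
import math

def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x)
    cov = sum((x[i] - mu) * (x[i - lag] - mu) for i in range(lag, n))
    return cov / var

def optimality_checks(errors, h):
    """Diagnostics for a series of h-step out-of-sample forecast errors.

    Under squared error loss, an optimal h-step forecast has errors with
    mean zero and an MA(h-1) correlation structure, so the lag-h sample
    autocorrelation should be close to zero.  Returns the mean error, the
    lag-h autocorrelation, and a rough 1/sqrt(n) standard error for the
    latter (an assumed large-sample approximation).
    """
    n = len(errors)
    mean_err = sum(errors) / n
    rho_h = autocorr(errors, h)
    return mean_err, rho_h, 1.0 / math.sqrt(n)
```

A mean error or lag-h autocorrelation several standard errors from zero would then point to model deficiency or instability, in the spirit of the checks described above.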
Additional insights are obtained by comparing the out-of-sample forecasts of competing models or experts. Under mean squared error loss, the relative performance of two time series of point forecasts of the same variable can be compared by computing their mean squared forecast errors (MSFE). Of course, in a finite sample, a smaller MSFE might simply be an artifact of sampling error, so formal tests of whether the MSFEs are statistically significantly different are in order when comparing two forecasts. Such tests have been developed by Diebold and Mariano (1995) and further refined by West (1996), who built on earlier work by Nelson (1972), Fair (1980), and others.
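The Diebold-Mariano comparison can be sketched as follows. This is a minimal illustration under squared error loss, assuming the rectangular long-run-variance window with h-1 autocovariances suggested in Diebold and Mariano (1995); the function name and interface are our own.

```python
import math

def diebold_mariano(e1, e2, h=1):
    """Diebold-Mariano statistic for equal MSFE of two competing forecasts.

    e1, e2 : forecast error series from the two forecasts of the same variable.
    h      : forecast horizon; the long-run variance of the loss differential
             uses a rectangular window with h-1 autocovariances.
    Returns the DM statistic, asymptotically N(0,1) under the null of
    equal mean squared forecast error.
    """
    n = len(e1)
    # loss differential under squared error loss
    d = [a * a - b * b for a, b in zip(e1, e2)]
    dbar = sum(d) / n

    def gamma(k):
        """Sample autocovariance of the loss differential at lag k."""
        return sum((d[i] - dbar) * (d[i - k] - dbar) for i in range(k, n)) / n

    # long-run variance: gamma_0 + 2 * sum_{k=1}^{h-1} gamma_k
    lrv = gamma(0) + 2.0 * sum(gamma(k) for k in range(1, h))
    return dbar / math.sqrt(lrv / n)
```

A value of the statistic far in the tails of the standard normal distribution indicates that the difference in MSFEs is unlikely to be an artifact of sampling error.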
Out-of-sample performance can be measured either by using true out-of-sample forecasts or by a simulated out-of-sample forecasting exercise. While both approaches have similar objectives, the practical issues and the interpretation of results are quite different. Because real-time published forecasts usually involve expert opinion, a comparison of true out-of-sample forecasts typically entails an evaluation of both the models and the expertise of those who use the models. Good examples of comparisons of real-time forecasts, and of the lessons that can be drawn from such comparisons, are McNees (1990) and Zarnowitz and Braun.
Simulated real-time forecasting can be done in the course of model development and provides a useful check on the in-sample comparison measures discussed above. The essence of a simulated real-time forecasting experiment is that all forecasts y_{t+h|t}, t = T_0, ..., T_1, are functions only of data up through date t, so that all parameter estimation, model selection, etc., is done only using data through date t. This is often referred to as a recursive methodology (for linear models, the simulated out-of-sample forecasts can be computed using a recursion). In general this entails many re-estimations of the model, which for nonlinear models can be computationally demanding. For an example of simulated out-of-sample forecast comparisons, see Stock and Watson (1999a).
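The recursive scheme can be sketched as follows. The AR(1) model without intercept used here is purely illustrative (any model re-estimated on data through date t fits the scheme), and the function name and interface are our own assumptions.

```python
def recursive_forecasts(y, t0, h=1):
    """Simulated real-time (recursive) h-step forecasts of the series y.

    For each forecast date t = t0, ..., len(y)-h-1, the model is
    re-estimated using only observations through date t, mimicking what a
    forecaster could have done in real time.  The model here is an AR(1)
    without intercept, estimated by OLS; the h-step forecast is obtained
    by iterating the one-step model forward.
    Returns the lists of forecasts and of realized forecast errors.
    """
    forecasts, errors = [], []
    for t in range(t0, len(y) - h):
        sample = y[: t + 1]  # information available at date t only
        # OLS slope of y_s on y_{s-1}, using data through date t
        num = sum(sample[s] * sample[s - 1] for s in range(1, len(sample)))
        den = sum(v * v for v in sample[:-1])
        phi = num / den
        f = (phi ** h) * y[t]  # iterated h-step AR(1) forecast
        forecasts.append(f)
        errors.append(y[t + h] - f)
    return forecasts, errors
```

The resulting error series can then be fed to the internal-consistency diagnostics and MSFE comparisons discussed earlier in this section. Note that each pass through the loop re-estimates the model from scratch, which is what makes the exercise computationally demanding for nonlinear models.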