What to Do?
In this section we address the question of what to do when harmful collinearity is found with respect to the important parameters in a regression model. This section is like a minefield. There is danger all around, and but two safe paths. We will identify the safe paths, though these are the roads less traveled, and we will mention some potentially dangerous and self-defeating methods for dealing with collinearity.
Since the collinearity problem is actually one of insufficient independent variation in the data, the first and most desirable solution is to obtain more and better data. If the new data possess the same dependencies found in the original sample, then they are unlikely to be of much help. On the other hand, if new data can be found in, as Belsley (1991, p. 297) calls it, "novel or underrepresented portions" of the sample space, then the new observations may mitigate the ill-effects of collinearity. Unfortunately, nonexperimental empirical researchers seldom have much, if any, control over the design of the data generation process, and hence this advice is, for the most part, empty of practical content. Should the occasion arise, however, Silvey (1969, p. 545) discusses how best to choose the values of the explanatory variables in a new observation when the purpose is to improve the estimation of a linear combination of the parameters, $c'\beta$. This classic treatment has been extended by Sengupta (1995).
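The point about "novel or underrepresented portions" of the sample space can be illustrated numerically. The sketch below is ours and is not Silvey's procedure: it assumes two nearly collinear regressors with made-up data, an error variance of one, and a single new observation of fixed length placed either along the direction the sample already covers well or along the direction in which it has little variation. The variable names (`weak_dir`, `strong_dir`, `coef_variances`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: two nearly collinear regressors, no intercept.
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])


def coef_variances(X, sigma2=1.0):
    """Diagonal of sigma^2 (X'X)^{-1}, i.e. the OLS coefficient variances."""
    return sigma2 * np.diag(np.linalg.inv(X.T @ X))


# eigh returns eigenvalues in ascending order, so column 0 of eigvecs is the
# direction with the least variation in the sample.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
weak_dir = eigvecs[:, 0]     # underrepresented direction
strong_dir = eigvecs[:, -1]  # direction the sample already covers well

# One new observation of the same length, placed in each direction in turn.
length = 3.0
X_same = np.vstack([X, length * strong_dir])
X_novel = np.vstack([X, length * weak_dir])

var_base = coef_variances(X)
var_same = coef_variances(X_same)
var_novel = coef_variances(X_novel)

print("original sample      :", var_base)
print("new obs, same pattern:", var_same)
print("new obs, novel region:", var_novel)
```

Under these assumptions, the observation that repeats the existing dependency leaves the coefficient variances almost unchanged, while the observation in the underrepresented direction reduces them sharply.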
Blanchard (1987, p. 449) says, "Only use of more economic theory in the form of additional restrictions may help alleviate the multicollinearity problem." We agree that the only "cure" for collinearity, apart from additional data, is additional information about regression parameters. Restrictions need not come from economic theory alone, however; they can also come from previous empirical research, and we collectively term these nonsample sources. If harmful collinearity is present, we are admitting that the sample data are inadequate for the purpose of precisely estimating some or all of the regression parameters. Thus the second safe strategy for mitigating the effects of collinearity is to introduce good nonsample information about the parameters into the estimation process. When nonsample information is added during the estimation process, estimator variances are reduced, which is exactly what we want to happen in the presence of collinearity (and indeed, all the time). The downside to using nonsample information is that estimator bias is introduced.
It is possible that small amounts of bias are acceptable in return for significant increases in precision. The most commonly used measure of the bias/precision tradeoff is mean-square-error (MSE),
\[
\mathrm{MSE}(b) = E[(b - \beta)'(b - \beta)] = \sum_k \mathrm{var}(b_k) + \sum_k [E(b_k) - \beta_k]^2, \tag{12.12}
\]
which combines estimator variances with squared biases. This measure is also known as estimator "risk" in the decision theory literature (Judge et al., 1988, pp. 807-12).
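The decomposition of MSE into a variance part and a squared-bias part can be checked by simulation. The following is a minimal sketch under assumptions of our own: the data, the true parameter vector `beta`, and the number of replications are made up, and a ridge-type estimator with an arbitrary constant `k = 1.0` stands in for "some biased estimator" purely for illustration. The helper `mse_parts` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design with two nearly collinear regressors.
n, reps = 50, 2000
beta = np.array([1.0, 1.0])  # assumed true parameters
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
XtX = X.T @ X
k = 1.0  # arbitrary ridge constant, chosen only for illustration


def mse_parts(estimates, beta):
    """Return (MSE, sum of variances, sum of squared biases) across replications."""
    mse = ((estimates - beta) ** 2).sum(axis=1).mean()
    var_sum = estimates.var(axis=0).sum()  # ddof=0, so the identity is exact
    bias2_sum = ((estimates.mean(axis=0) - beta) ** 2).sum()
    return mse, var_sum, bias2_sum


ols = np.empty((reps, 2))
ridge = np.empty((reps, 2))
for i in range(reps):
    y = X @ beta + rng.normal(size=n)
    Xty = X.T @ y
    ols[i] = np.linalg.solve(XtX, Xty)                      # unbiased, high variance
    ridge[i] = np.linalg.solve(XtX + k * np.eye(2), Xty)    # biased, lower variance

mse_ols, var_ols, bias2_ols = mse_parts(ols, beta)
mse_ridge, var_ridge, bias2_ridge = mse_parts(ridge, beta)

print(f"OLS  : MSE={mse_ols:.4f}  var={var_ols:.4f}  bias^2={bias2_ols:.4f}")
print(f"ridge: MSE={mse_ridge:.4f}  var={var_ridge:.4f}  bias^2={bias2_ridge:.4f}")
```

For each estimator the simulated MSE equals the sum of the two components. In this particular design the biased estimator has the smaller MSE, but that outcome depends on the assumed `beta` and `k` and should not be read as a general result.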
Our general objective is to introduce nonsample information that improves upon the MSE of the OLS estimator. This is much easier said than done, and there is a huge literature devoted to methods for obtaining MSE improvement. See Judge and Bock (1978, 1983). Suffice it to say that MSE improvements occur only when the nonsample information we employ is good. How do we know if the information we introduce is good enough? We do not know, and can never know, if our nonsample information is good enough to ensure an MSE reduction, since that would require us to know the true parameter values. This is our conundrum. Below we briefly survey alternative methods for introducing nonsample information, all of which can be successful in reducing MSE, if the nonsample information is good enough.
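The dependence of MSE improvement on the quality of the nonsample information can be seen in a small simulation. This is a sketch under stated assumptions, not a recommended procedure: the data and the true parameters are made up, the nonsample information takes the form of a single linear restriction on the difference between the two coefficients, and it is imposed by restricted least squares. The "good" restriction (difference equal to 0) happens to be exactly true here, and the "bad" one (difference equal to 5) is badly wrong; in practice we could not tell which case we are in.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design with two nearly collinear regressors.
n, reps = 50, 2000
beta = np.array([1.0, 1.0])  # assumed true parameters, so beta_1 - beta_2 = 0
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)

R = np.array([[1.0, -1.0]])  # restriction of the form beta_1 - beta_2 = r


def restricted_ls(b_ols, r):
    """Restricted LS: b* = b - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (R b - r)."""
    gap = R @ b_ols - r
    middle = np.linalg.inv(R @ XtX_inv @ R.T)
    return b_ols - XtX_inv @ R.T @ middle @ gap


def empirical_mse(estimates):
    """Average squared distance from the true parameter vector."""
    return ((estimates - beta) ** 2).sum(axis=1).mean()


ols = np.empty((reps, 2))
good = np.empty((reps, 2))
bad = np.empty((reps, 2))
for i in range(reps):
    y = X @ beta + rng.normal(size=n)
    b = XtX_inv @ X.T @ y
    ols[i] = b
    good[i] = restricted_ls(b, 0.0)  # correct nonsample information
    bad[i] = restricted_ls(b, 5.0)   # badly wrong nonsample information

mse_ols = empirical_mse(ols)
mse_good = empirical_mse(good)
mse_bad = empirical_mse(bad)

print(f"OLS MSE              : {mse_ols:.4f}")
print(f"restricted, good info: {mse_good:.4f}")
print(f"restricted, bad info : {mse_bad:.4f}")
```

In this setup the correct restriction gives a much smaller MSE than OLS, while the badly wrong one gives a larger MSE. Both restricted estimators have low variance; the difference comes entirely from the bias, which depends on the unknown true parameter values.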