Residual Interpretation of Multiple Regression Estimates
Although we did not derive an explicit solution for the OLS estimators of the $\beta$'s, we know that they are the solutions to (4.2) or (4.3). Let us focus on one of these estimators, say $\hat{\beta}_2$, the OLS estimator of $\beta_2$, the partial derivative of $Y_i$ with respect to $X_{2i}$. As a solution to (4.2) or (4.3), $\hat{\beta}_2$ is a multiple regression coefficient estimate of $\beta_2$. Alternatively, we can interpret $\hat{\beta}_2$ as a simple linear regression coefficient.
Claim 1: (i) Run the regression of $X_2$ on all the other $X$'s in (4.1), and obtain the residuals $\hat{\nu}_2$, i.e., $X_2 = \hat{X}_2 + \hat{\nu}_2$. (ii) Run the simple regression of $Y$ on $\hat{\nu}_2$; the resulting estimate of the slope coefficient is $\hat{\beta}_2$.
The first regression essentially cleans out the effect of the other $X$'s from $X_2$, leaving the variation unique to $X_2$ in $\hat{\nu}_2$. Claim 1 states that $\hat{\beta}_2$ can be interpreted as a simple linear regression coefficient of $Y$ on this residual. This is in line with the partial derivative interpretation of $\beta_2$. The proof of Claim 1 is given in the Appendix. Using the results of the simple regression given in (3.4), with the regressor $X_i$ replaced by the residual $\hat{\nu}_{2i}$, we get
$\hat{\beta}_2 = \sum_{i=1}^{n} \hat{\nu}_{2i} Y_i / \sum_{i=1}^{n} \hat{\nu}_{2i}^2$ and from (3.6) we get
$\mathrm{var}(\hat{\beta}_2) = \sigma^2 / \sum_{i=1}^{n} \hat{\nu}_{2i}^2$
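Claim 1 is easy to verify numerically. The following sketch (the simulated data, seed, and coefficient values are illustrative assumptions, not from the text) runs the multiple regression of $Y$ on a constant, $X_2$ and $X_3$, then reproduces $\hat{\beta}_2$ as the simple regression slope of $Y$ on the residuals $\hat{\nu}_2$:

```python
import numpy as np

# Illustrative simulated data (assumed for this sketch, not from the text)
rng = np.random.default_rng(0)
n = 100
ones = np.ones(n)                      # constant term
X3 = rng.normal(size=n)
X2 = 0.5 * X3 + rng.normal(size=n)     # X2 correlated with X3
Y = 1.0 + 2.0 * X2 - 1.0 * X3 + rng.normal(size=n)

# Multiple regression of Y on a constant, X2 and X3
X = np.column_stack([ones, X2, X3])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# Step (i): regress X2 on the other regressors; keep the residuals nu2
Z = np.column_stack([ones, X3])
nu2 = X2 - Z @ np.linalg.lstsq(Z, X2, rcond=None)[0]

# Step (ii): simple regression of Y on nu2. Since Z contains a constant,
# nu2 has mean zero, so the slope is the same with or without an intercept.
b2_simple = (nu2 @ Y) / (nu2 @ nu2)

print(beta_hat[1], b2_simple)          # the two estimates coincide
```

The equality is exact (up to floating-point rounding), not approximate: it holds for any sample, which is what the proof in the Appendix establishes.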
An alternative interpretation of $\hat{\beta}_2$ as a simple regression coefficient is the following:
Claim 2: (i) Run $Y$ on all the other $X$'s and get the predicted $Y$ and the residuals, say $\hat{w}$. (ii) Run the simple linear regression of $\hat{w}$ on $\hat{\nu}_2$; $\hat{\beta}_2$ is the resulting estimate of the slope coefficient.
This regression cleans both $Y$ and $X_2$ from the effect of the other $X$'s and then regresses the cleaned-out residuals of $Y$ on those of $X_2$. Once again, this is in line with the partial derivative interpretation of $\beta_2$. The proof of Claim 2 is simple and is given in the Appendix.
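Claim 2 can be checked the same way. In this sketch (again with illustrative simulated data, not from the text), both $Y$ and $X_2$ are purged of the other regressors, and the slope of the residual-on-residual regression matches the multiple regression coefficient:

```python
import numpy as np

# Illustrative simulated data (assumed for this sketch, not from the text)
rng = np.random.default_rng(1)
n = 100
ones = np.ones(n)
X3 = rng.normal(size=n)
X2 = 0.5 * X3 + rng.normal(size=n)
Y = 1.0 + 2.0 * X2 - 1.0 * X3 + rng.normal(size=n)

# Multiple regression coefficient on X2
X = np.column_stack([ones, X2, X3])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

Z = np.column_stack([ones, X3])                        # the "other X's"
nu2 = X2 - Z @ np.linalg.lstsq(Z, X2, rcond=None)[0]   # cleaned X2
w = Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]       # cleaned Y

# Simple regression of the residuals w on the residuals nu2
b2_double = (nu2 @ w) / (nu2 @ nu2)

print(beta_hat[1], b2_double)          # again the estimates coincide
```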
These two interpretations of $\hat{\beta}_2$ are important in that they provide an easy way of looking at a multiple regression in the context of a simple linear regression. They also show that there is no need to purge one $X$ of the effects of the other $X$'s by hand in order to find its unique effect on $Y$; all one has to do is include all these $X$'s in the same multiple regression. Problem 1 verifies this result with an empirical example. This will also be proved using matrix algebra in Chapter 7.
Recall that $R^2 = 1 - RSS/TSS$ for any regression. Let $R_2^2$ be the $R^2$ of the regression of $X_2$ on all the other $X$'s. Then $R_2^2 = 1 - \sum_{i=1}^{n} \hat{\nu}_{2i}^2 / \sum_{i=1}^{n} x_{2i}^2$, where $x_{2i} = X_{2i} - \bar{X}_2$ and $\bar{X}_2 = \sum_{i=1}^{n} X_{2i}/n$; here $TSS = \sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2 = \sum_{i=1}^{n} x_{2i}^2$ and $RSS = \sum_{i=1}^{n} \hat{\nu}_{2i}^2$. Equivalently, $\sum_{i=1}^{n} \hat{\nu}_{2i}^2 = \sum_{i=1}^{n} x_{2i}^2 (1 - R_2^2)$ and

$\mathrm{var}(\hat{\beta}_2) = \sigma^2 / \sum_{i=1}^{n} \hat{\nu}_{2i}^2 = \sigma^2 / \left[ \sum_{i=1}^{n} x_{2i}^2 (1 - R_2^2) \right]$  (4.7)
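The identity $\sum \hat{\nu}_{2i}^2 = \sum x_{2i}^2 (1 - R_2^2)$ can be confirmed numerically. In the sketch below (simulated data assumed for illustration), there is a single other regressor $X_3$, so $R_2^2$ equals the squared sample correlation between $X_2$ and $X_3$; the residual sum of squares from the auxiliary regression then matches $\sum x_{2i}^2 (1 - R_2^2)$:

```python
import numpy as np

# Illustrative simulated data (assumed for this sketch, not from the text)
rng = np.random.default_rng(2)
n = 200
ones = np.ones(n)
X3 = rng.normal(size=n)
X2 = 0.8 * X3 + rng.normal(size=n)

# Auxiliary regression of X2 on a constant and X3; keep the residuals.
# The constant matters: x2 below is in deviations from the mean.
Z = np.column_stack([ones, X3])
nu2 = X2 - Z @ np.linalg.lstsq(Z, X2, rcond=None)[0]

x2 = X2 - X2.mean()                       # deviations from the mean
# With one other regressor, R_2^2 is the squared correlation of X2 and X3
R2_sq = np.corrcoef(X2, X3)[0, 1] ** 2

lhs = nu2 @ nu2                           # sum of squared residuals
rhs = (x2 @ x2) * (1 - R2_sq)             # TSS times (1 - R_2^2)
print(lhs, rhs)                           # equal up to rounding
```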
This means that the larger is $R_2^2$, the smaller is $(1 - R_2^2)$ and the larger is $\mathrm{var}(\hat{\beta}_2)$, holding $\sigma^2$ and $\sum_{i=1}^{n} x_{2i}^2$ fixed. This shows the relationship between multicollinearity and the variance of the OLS estimates. High multicollinearity between $X_2$ and the other $X$'s will result in a high $R_2^2$, which in turn implies a high variance for $\hat{\beta}_2$. Perfect multicollinearity is the extreme case where $R_2^2 = 1$; this in turn implies an infinite variance for $\hat{\beta}_2$. In general, high multicollinearity among the regressors yields imprecise estimates for these highly correlated variables. The least squares regression estimates are still unbiased as long as assumptions 1 and 4 are satisfied, but these estimates are unreliable, as reflected by their high variances. However, it is important to note that a low $\sigma^2$ and a high $\sum_{i=1}^{n} x_{2i}^2$ could counteract the effect of a high $R_2^2$, leading to a significant $t$-statistic for $\hat{\beta}_2$. Maddala (2001) argues that high intercorrelation among the explanatory variables is neither necessary nor sufficient to cause the multicollinearity problem. In practice, multicollinearity is sensitive to the addition or deletion of observations. More on this in Chapter 8. Looking at high intercorrelations among the explanatory variables is useful only as a complaint; it is more important to look at the standard errors and $t$-statistics to assess the seriousness of multicollinearity.
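The effect of $R_2^2$ on $\mathrm{var}(\hat{\beta}_2)$ in (4.7) can be illustrated directly. In this sketch (the correlation values, sample size, and $\sigma^2 = 1$ are illustrative assumptions), $X_2$ is constructed with increasing correlation to $X_3$ while its total variation $\sum x_{2i}^2$ stays roughly fixed, so the variance formula inflates as $(1 - R_2^2)$ shrinks:

```python
import numpy as np

# Illustrative setup (assumed for this sketch, not from the text)
rng = np.random.default_rng(3)
n = 500
sigma2 = 1.0                              # assumed error variance
X3 = rng.normal(size=n)

variances = []
for rho in (0.0, 0.90, 0.99):
    # X2 has correlation approximately rho with X3 and unit variance,
    # so sum(x2^2) stays roughly constant across the three cases.
    X2 = rho * X3 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    x2 = X2 - X2.mean()
    R2_sq = np.corrcoef(X2, X3)[0, 1] ** 2
    var_b2 = sigma2 / ((x2 @ x2) * (1 - R2_sq))   # equation (4.7)
    variances.append(var_b2)

print(variances)   # variance grows sharply as rho approaches 1
```

As $\rho \to 1$ (perfect multicollinearity), $1 - R_2^2 \to 0$ and the variance in (4.7) diverges, matching the discussion above.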
Much has been written on possible solutions to the multicollinearity problem; see Hill and Adkins (2001) for a good summary. Credible candidates include: (i) obtaining new and better data, which is rarely feasible; and (ii) introducing nonsample information about the model parameters based on previous empirical research or economic theory. The problem with the latter solution is that we never truly know whether the information we introduce is good enough to reduce the estimator's Mean Square Error.