The Nature and Statistical Consequences of Collinearity
Consider first a linear regression model with two explanatory variables,
$$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t.$$
Assume that the errors are uncorrelated, with mean zero and constant variance $\sigma^2$, and that $x_{t2}$ and $x_{t3}$ are nonstochastic. Under these assumptions the least squares estimators are the best linear unbiased estimators of the regression parameters. The variance of the least squares estimator $b_2$ of $\beta_2$ is
$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum_{t=1}^{T}(x_{t2}-\bar{x}_2)^2\,(1-r_{23}^2)},$$
where $\bar{x}_2$ is the sample mean of the $T$ observations on $x_{t2}$, and $r_{23}$ is the sample correlation between $x_{t2}$ and $x_{t3}$. The formula for the variance of $b_3$, the least squares estimator of $\beta_3$, is analogous, but the variance of the intercept estimator is messier and we will not discuss it here. The covariance between $b_2$ and $b_3$ is
$$\operatorname{cov}(b_2,b_3) = \frac{-r_{23}\,\sigma^2}{(1-r_{23}^2)\sqrt{\sum_{t=1}^{T}(x_{t2}-\bar{x}_2)^2}\,\sqrt{\sum_{t=1}^{T}(x_{t3}-\bar{x}_3)^2}}.$$
The variance and covariance expressions reveal the consequences of two of the three forms of collinearity. First, suppose that $x_{t2}$ exhibits little variation about its sample mean, so that $\sum_{t=1}^{T}(x_{t2}-\bar{x}_2)^2$ is small. The less the variation in the explanatory variable $x_{t2}$ about its mean, the larger will be the variance of $b_2$, and the larger will be the covariance, in absolute value, between $b_2$ and $b_3$. Second, the larger the correlation between $x_{t2}$ and $x_{t3}$ the larger will be the variance of $b_2$, and the larger will be the covariance, in absolute value, between $b_2$ and $b_3$. If the correlation is positive the covariance will be negative. This is the source of another conventional observation about collinearity, namely that the coefficients of highly correlated variables tend to have opposite signs.
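Both effects can be checked numerically. The sketch below is illustrative only: the helper names (`make_pair`, `closed_form`, `matrix_form`), the sample size, the `spread` parameter, and the value of the error variance are assumptions made for the example, not part of the text. It builds two regressors with an exactly specified sample correlation, evaluates the standard closed-form expressions for the variance of b2 and the covariance of b2 and b3 in the two-regressor model, and compares them with the corresponding entries of sigma^2 (X'X)^{-1}.

```python
import numpy as np

def make_pair(r, spread=1.0, T=40, seed=1):
    """Two mean-zero regressors whose sample correlation is exactly r.

    `spread` (a hypothetical knob) rescales x2 only, so it controls
    the variation of x2 about its mean without changing r.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=T)
    a -= a.mean()
    b = rng.normal(size=T)
    b -= b.mean()
    b -= (b @ a) / (a @ a) * a          # make b orthogonal to a
    a /= np.linalg.norm(a)
    b /= np.linalg.norm(b)
    x2 = spread * a
    x3 = r * a + np.sqrt(1.0 - r**2) * b
    return x2, x3

def closed_form(x2, x3, sigma2):
    """var(b2) and cov(b2, b3) from the two-regressor formulas."""
    d2, d3 = x2 - x2.mean(), x3 - x3.mean()
    s22, s33 = d2 @ d2, d3 @ d3
    r23 = (d2 @ d3) / np.sqrt(s22 * s33)
    var_b2 = sigma2 / (s22 * (1.0 - r23**2))
    cov_b23 = -r23 * sigma2 / ((1.0 - r23**2) * np.sqrt(s22 * s33))
    return var_b2, cov_b23

def matrix_form(x2, x3, sigma2):
    """Full covariance matrix sigma^2 (X'X)^{-1}, intercept included."""
    X = np.column_stack([np.ones_like(x2), x2, x3])
    return sigma2 * np.linalg.inv(X.T @ X)

sigma2 = 1.0                             # assumed error variance
results = []
for r in (0.0, 0.5, 0.9, 0.99):
    x2, x3 = make_pair(r)
    var_b2, cov_b23 = closed_form(x2, x3, sigma2)
    V = matrix_form(x2, x3, sigma2)
    results.append((r, var_b2, cov_b23, V[1, 1], V[1, 2]))
    print(f"r23={r:4.2f}  var(b2)={var_b2:8.3f}  cov(b2,b3)={cov_b23:9.3f}")

# Less variation in x2 (smaller spread) at the same correlation
var_narrow, _ = closed_form(*make_pair(0.5, spread=0.5), sigma2)
var_wide, _ = closed_form(*make_pair(0.5, spread=1.0), sigma2)
```

In this construction the variance of b2 rises as the correlation approaches one, the covariance is negative whenever the correlation is positive, and halving the spread of x2 quadruples the variance of b2.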
Exact, or perfect, collinearity occurs when the variation in an explanatory variable is zero, $\sum_{t=1}^{T}(x_{t2}-\bar{x}_2)^2 = 0$, or when the correlation between $x_{t2}$ and $x_{t3}$ is perfect, so that $r_{23} = \pm 1$. In these cases the least squares estimates are not unique, and, in the absence of additional information, best linear unbiased estimators are not available for all the regression parameters. Fortunately, this extreme case rarely occurs in practice.
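A small numerical illustration of the exact case, using made-up data: when one regressor is an exact linear function of another, or is constant, the regressor matrix loses rank, and different coefficient vectors produce identical fitted values. The variable names and numbers below are assumptions chosen for the example.

```python
import numpy as np

x2 = np.arange(1.0, 11.0)
x3 = 3.0 + 2.0 * x2                      # exact linear function of x2
X = np.column_stack([np.ones_like(x2), x2, x3])

r23 = np.corrcoef(x2, x3)[0, 1]          # sample correlation is 1 (up to rounding)
rank_X = np.linalg.matrix_rank(X)        # fewer than the 3 columns of X

# Two different coefficient vectors that imply the same fitted values:
# 1 + 2*x2 + 0.5*(3 + 2*x2) equals 2.5 + 3*x2 for every observation.
beta_a = np.array([1.0, 2.0, 0.5])
beta_b = np.array([2.5, 3.0, 0.0])
same_fit = np.allclose(X @ beta_a, X @ beta_b)

# The other exact case: a regressor with zero variation about its mean
# is a multiple of the intercept column.
x_const = np.full(10, 4.0)
X_const = np.column_stack([np.ones(10), x_const, x3])
rank_const = np.linalg.matrix_rank(X_const)

print(rank_X, same_fit, rank_const)
```

Because the sum of squared errors is the same for `beta_a` and `beta_b`, the data alone cannot choose between them, which is what non-uniqueness of the least squares estimates means here.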
The commonly cited symptoms of collinearity, that least squares estimates have the wrong sign, are sensitive to slight changes in the data or the model specification, or are not statistically significant, follow from the large variances of the least squares estimators. The least squares estimators are unbiased under standard assumptions, so that $E[b_k] = \beta_k$, but how close an estimate might be to the true parameter value is determined by the estimator variance. Large variances for estimators imply that their sampling (probability) distributions are wide, meaning that in any particular sample the estimates we obtain may be far from the true parameter values.
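The point about unbiasedness and wide sampling distributions can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed values: the function name `simulate_b2`, the true parameters, the sample size, and the number of replications are all hypothetical choices. The regressors are held fixed across replications, matching the nonstochastic-regressor assumption, and only the errors are redrawn.

```python
import numpy as np

def simulate_b2(r, beta=(1.0, 0.5, 0.5), sigma=1.0, T=50, reps=5000, seed=0):
    """Least squares estimates of beta_2 over repeated samples.

    The design has sample correlation exactly r between x2 and x3 and is
    the same in every replication; only the errors change.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=T)
    a -= a.mean()
    b = rng.normal(size=T)
    b -= b.mean()
    b -= (b @ a) / (a @ a) * a           # make b orthogonal to a
    a *= np.sqrt(T) / np.linalg.norm(a)  # sum of squares equal to T
    b *= np.sqrt(T) / np.linalg.norm(b)
    x2 = a
    x3 = r * a + np.sqrt(1.0 - r**2) * b
    X = np.column_stack([np.ones(T), x2, x3])

    errors = rng.normal(scale=sigma, size=(T, reps))
    Y = (X @ np.asarray(beta))[:, None] + errors   # one column per sample
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)      # shape (3, reps)
    return B[1]                                    # the b2 estimates

b2_low = simulate_b2(r=0.0)
b2_high = simulate_b2(r=0.95)

for label, draws in (("r23 = 0.00", b2_low), ("r23 = 0.95", b2_high)):
    print(f"{label}: mean={draws.mean():.3f}  sd={draws.std(ddof=1):.3f}  "
          f"share with wrong sign={np.mean(draws < 0):.3f}")
```

In both designs the average of the estimates should sit close to the assumed true value of 0.5, but with high correlation the estimates are far more dispersed, and a noticeable share of individual samples yields a negative estimate of a positive parameter.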