# Collinearity in the linear regression model

Denote the linear regression model as

$$y = X\beta + e, \tag{12.4}$$

where $y$ is a $T \times 1$ vector of observations on the dependent variable, $X$ is a $T \times K$ non-stochastic matrix of observations on $K$ explanatory variables, $\beta$ is a $K \times 1$ vector of unknown parameters, and $e$ is the $T \times 1$ vector of uncorrelated random errors with zero means and constant variance $\sigma^2$.

In the general linear model, exact (or perfect) collinearity exists when the columns of $X$, denoted $x_i$, $i = 1, \ldots, K$, are linearly dependent. This occurs when there is at least one relation of the form $a_1 x_1 + a_2 x_2 + \cdots + a_K x_K = 0$, where the $a_i$ are constants, not all equal to zero. In this case the column rank of $X$ is less than $K$, the normal equations $X'X\beta = X'y$ do not have a unique solution, and least squares estimation breaks down. Unique best linear unbiased estimators do not exist for all $K$ parameters.

However, even in this most severe of cases, all is not lost. Consider equation (12.1), $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + e_t$. Suppose that $a_2 x_2 + a_3 x_3 = 0$ with $a_2 \neq 0$, or more simply, $x_2 = a x_3$ where $a = -a_3/a_2$. Substituting this into (12.1) we obtain

$$y_t = \beta_1 + \beta_2 (a x_{t3}) + \beta_3 x_{t3} + e_t = \beta_1 + (a\beta_2 + \beta_3) x_{t3} + e_t = \beta_1 + \gamma x_{t3} + e_t.$$

Thus we can obtain a best linear unbiased estimator of $\gamma = a\beta_2 + \beta_3$, a linear combination of the parameters, even though the two slope parameters cannot be estimated separately. The classic paper by Silvey (1969) provides expressions for determining which linear combinations of parameters are estimable.
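The exact-collinearity example can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original text. The data, the value `a = 2`, the "true" parameter values, and all variable names are assumptions made purely for illustration. It shows that the normal equations admit more than one solution when the second regressor is an exact multiple of the third, while the combination `a` times the coefficient on the second regressor plus the coefficient on the third is the same for every solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the second regressor is an exact multiple of the third.
a = 2.0
T = 20
x3 = rng.normal(size=T)
x2 = a * x3
X = np.column_stack([np.ones(T), x2, x3])   # T x K with K = 3

# Assumed parameter values, used only to simulate y.
beta_true = np.array([1.0, 0.5, -1.5])
y = X @ beta_true + 0.1 * rng.normal(size=T)

K = X.shape[1]
rank_X = np.linalg.matrix_rank(X)           # column rank is below K
print("K =", K, "rank(X) =", rank_X)

# One solution of the normal equations: the minimum-norm least squares solution.
b_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# X @ null_vec = x2 - a * x3 = 0, so adding any multiple of null_vec
# to a solution yields another solution of the normal equations.
null_vec = np.array([0.0, 1.0, -a])
b_alt = b_min + 3.0 * null_vec

def normal_eq_residual(b):
    """Return X'(Xb - y), which is zero when b solves the normal equations."""
    return X.T @ (X @ b - y)

print("residual at b_min:", normal_eq_residual(b_min))
print("residual at b_alt:", normal_eq_residual(b_alt))

# The estimable combination takes the same value at both solutions.
gamma_min = a * b_min[1] + b_min[2]
gamma_alt = a * b_alt[1] + b_alt[2]

# Regressing y on a constant and x3 alone estimates that combination directly.
X_reduced = np.column_stack([np.ones(T), x3])
b_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)
gamma_hat = b_reduced[1]

print("gamma from b_min:", gamma_min)
print("gamma from b_alt:", gamma_alt)
print("gamma from reduced regression:", gamma_hat)
```

The two coefficient vectors `b_min` and `b_alt` differ in their second and third elements, yet both satisfy the normal equations and both imply the same fitted values. Only the intercept and the combined slope are pinned down by the data, which is what the reduced regression on the third regressor alone recovers.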

Exact collinearity is rare and easily recognized. More frequently, one or more linear dependencies among the explanatory variables hold nearly exactly, so that $a_1 x_1 + a_2 x_2 + \cdots + a_K x_K \approx 0$. We now examine the consequences of such near-exact linear dependencies.