The Variance Decomposition of Belsley, Kuh, and Welsch (1980)
A property of eigenvalues is that $\mathrm{tr}(X'X) = \sum_{k=1}^{K} \lambda_k$, where the $\lambda_k$ are the eigenvalues of $X'X$. This implies that the sizes of
the eigenvalues are determined in part by the scaling of the data. Data matrices
Begin by identifying large condition indexes. Each large condition index is associated with a small eigenvalue and a near exact linear dependency among the columns of X. BKW's experiments led them to the general guidelines that condition indexes in the range 0-10 indicate weak near dependencies, 10-30 moderately strong near dependencies, 30-100 strong near dependencies, and indexes in excess of 100 very strong near dependencies. Thus, when examining condition indexes, values of 30 and higher should immediately attract attention.
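As a rough illustration of this first step, the sketch below computes condition indexes for a small synthetic data matrix and labels them with the ranges quoted above. It assumes the usual BKW conventions, which this passage does not spell out: columns are scaled to unit length, and each condition index is the ratio of the largest singular value of the scaled matrix to the singular value in question. The function names and the data are invented for the example.

```python
import numpy as np

def condition_indices(X):
    """Condition indexes of X, smallest first (assumes unit-length column scaling)."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)        # scale each column to unit length
    s = np.linalg.svd(Xs, compute_uv=False)   # singular values, largest first
    return s[0] / s                           # eta_k = s_max / s_k

def strength(eta):
    """Label a condition index using the ranges quoted in the text."""
    if eta < 10:
        return "weak"
    if eta < 30:
        return "moderately strong"
    if eta <= 100:
        return "strong"
    return "very strong"

# Synthetic data (illustrative only): x3 is almost exactly x1 + x2.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + x2 + 0.001 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

eta = condition_indices(X)
for value in eta:
    print(f"{value:12.1f}  {strength(value)}")
```

With one built-in near dependency, only the last condition index should land above the threshold of 30.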
Step 2 (if there is a single large condition index)
Examine the variance-decomposition proportions. If there is a single large condition index, indicating a single near dependency associated with one small eigenvalue, collinearity adversely affects estimation when two or more coefficients each have 50 percent or more of their variance associated with the large condition index, in the last row of Table 12.1. The variables involved in the near dependency have coefficients with large variance proportions.
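The single-dependency case can be sketched numerically. The code below is a minimal illustration, not BKW's own implementation: it assumes unit-length column scaling and the standard decomposition in which the share of var(b_k) attributed to component j is proportional to v_kj^2 / s_j^2, with v_kj the elements of the right singular vectors and s_j the singular values of the scaled matrix. The table it returns is laid out like Table 12.1 is described here, with one row per condition index and the largest index in the last row. The data and names are hypothetical.

```python
import numpy as np

def variance_proportions(X):
    """Return (condition indexes, proportions); rows follow ascending condition index."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)             # unit-length columns (assumed scaling)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    phi = Vt.T ** 2 / s ** 2                       # phi[k, j] = v_kj^2 / s_j^2
    props = (phi / phi.sum(axis=1, keepdims=True)).T
    return s[0] / s, props                         # props[j, k]: share of var(b_k) in row j

# Synthetic data (illustrative only): x3 is nearly x1 + x2; x4 is unrelated.
rng = np.random.default_rng(0)
n = 200
x1, x2, x4 = (rng.normal(size=n) for _ in range(3))
x3 = x1 + x2 + 0.001 * rng.normal(size=n)
names = ["const", "x1", "x2", "x3", "x4"]
X = np.column_stack([np.ones(n), x1, x2, x3, x4])

eta, props = variance_proportions(X)
last_row = props[-1]                               # row of the largest condition index
involved = [nm for nm, p in zip(names, last_row) if p >= 0.5]
print("largest condition index:", round(float(eta[-1]), 1))
print("variables with 50 percent or more in the last row:", involved)
```

Each column of the table sums to one, so a coefficient can have at least half of its variance tied to only one row.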
Step 2 (if there are two or more large condition indexes of relatively equal magnitude)
If there are J ≥ 2 large and roughly equal condition indexes, then X'X has J eigenvalues that are near zero, and J near exact linear dependencies exist among the columns of X. Since the J corresponding eigenvectors span the space containing the coefficients of the true linear dependencies, the "50 percent rule" for identifying the variables involved in the near dependencies must be modified.
If there are two (or more) small eigenvalues, then we have two (or more) near exact linear relations, such as $Xc_i \approx 0$ and $Xc_j \approx 0$. These two relationships do not necessarily indicate the form of the linear dependencies, since $X(a_1 c_i + a_2 c_j) \approx 0$ as well. In this case the two vectors of constants $c_i$ and $c_j$ define a two-dimensional vector space in which the two near exact linear dependencies exist. While we may not be able to identify the individual relationships among the explanatory variables that are causing the collinearity, we can identify the variables that appear in the two (or more) relations.
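The point that any linear combination of the two vectors also yields a near dependency can be checked numerically. The sketch below is an illustration under assumptions: it works with a unit-length-scaled synthetic matrix and takes the two vectors to be the right singular vectors belonging to the two smallest singular values. The weights a1 and a2 are arbitrary.

```python
import numpy as np

# Synthetic data (illustrative only) with two near dependencies:
# x3 is nearly x1 + x2, and x5 is nearly x4.
rng = np.random.default_rng(0)
n = 200
x1, x2, x4 = (rng.normal(size=n) for _ in range(3))
x3 = x1 + x2 + 0.001 * rng.normal(size=n)
x5 = x4 + 0.001 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3, x4, x5])
Xs = X / np.linalg.norm(X, axis=0)      # unit-length columns (assumed scaling)

_, s, Vt = np.linalg.svd(Xs, full_matrices=False)
c_i, c_j = Vt[-1], Vt[-2]               # vectors for the two smallest singular values

a1, a2 = 0.6, -0.8                      # arbitrary weights
combo = a1 * c_i + a2 * c_j

norm_i = np.linalg.norm(Xs @ c_i)
norm_j = np.linalg.norm(Xs @ c_j)
norm_combo = np.linalg.norm(Xs @ combo)
print(norm_i, norm_j, norm_combo)       # all three are close to zero
```

Because the singular vectors are orthonormal, the length of the combined product equals sqrt((a1 s_i)^2 + (a2 s_j)^2). It is therefore small whenever both singular values are small, which is why the individual vectors cannot be read as the unique form of the dependencies.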
Thus variance proportions in a single row do not identify specific linear dependencies, as they did when there was but one large condition number. In this case, sum the variance proportions across the J large condition number rows in Table 12.1. The variables involved in the (set of) near linear dependencies are identified by summed coefficient variance proportions of greater than 50 percent.
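A minimal sketch of the summing rule follows, assuming the same conventions as the standard BKW decomposition (unit-length columns, proportions proportional to v_kj^2 / s_j^2) and a cutoff of 30 for "large". The synthetic data contain two near dependencies of similar strength. All names are hypothetical.

```python
import numpy as np

def variance_proportions(X):
    """Return (condition indexes, proportions); rows follow ascending condition index."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)             # unit-length columns (assumed scaling)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    phi = Vt.T ** 2 / s ** 2                       # phi[k, j] = v_kj^2 / s_j^2
    props = (phi / phi.sum(axis=1, keepdims=True)).T
    return s[0] / s, props

# Synthetic data (illustrative only): two near dependencies of similar strength.
rng = np.random.default_rng(1)
n = 200
x1, x2, x4, x6 = (rng.normal(size=n) for _ in range(4))
x3 = x1 + x2 + 0.001 * rng.normal(size=n)          # first near dependency
x5 = x4 + 0.001 * rng.normal(size=n)               # second near dependency
names = ["const", "x1", "x2", "x3", "x4", "x5", "x6"]
X = np.column_stack([np.ones(n), x1, x2, x3, x4, x5, x6])

eta, props = variance_proportions(X)
J = int((eta >= 30).sum())                         # number of large condition indexes
summed = props[-J:].sum(axis=0)                    # sum over the last J rows
involved = [nm for nm, p in zip(names, summed) if p > 0.5]
print("J =", J)
print("variables in the set of near dependencies:", involved)
```

The summed proportions say which variables belong to the set of near dependencies, not which variable belongs to which relation.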
Step 2 (if there are J ≥ 2 large condition indexes, with one
An extremely large condition index, arising from a very small eigenvalue, can "mask" the variables involved in other near exact linear dependencies. For example, if one condition index is 500 and another is 50, then there are two near exact linear dependencies among the columns of X. However, the variance decompositions associated with the condition index of 50 may not indicate that there are two or more variables involved in a relationship. Identify the variables involved in the set of near linear dependencies by summing the coefficient variance proportions in the last J rows of Table 12.1, and locating the sums greater than 50 percent.
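The masking effect can be reproduced on synthetic data. The sketch below is an illustration under assumptions, not a general result: it builds one very tight dependency and one weaker dependency that share a variable, using the standard decomposition with unit-length columns. The noise scales and the cutoff of 30 are choices made for the example.

```python
import numpy as np

def variance_proportions(X):
    """Return (condition indexes, proportions); rows follow ascending condition index."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)             # unit-length columns (assumed scaling)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    phi = Vt.T ** 2 / s ** 2                       # phi[k, j] = v_kj^2 / s_j^2
    props = (phi / phi.sum(axis=1, keepdims=True)).T
    return s[0] / s, props

# Synthetic data (illustrative only): x2 appears in both near dependencies.
rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + x2 + 1e-4 * rng.normal(size=n)           # very tight dependency
x4 = x2 + 1e-2 * rng.normal(size=n)                # weaker dependency
names = ["const", "x1", "x2", "x3", "x4"]
X = np.column_stack([np.ones(n), x1, x2, x3, x4])

eta, props = variance_proportions(X)
J = int((eta >= 30).sum())

weaker_row = props[-2]                             # row of the smaller large index
alone = [nm for nm, p in zip(names, weaker_row) if p > 0.5]
summed = props[-J:].sum(axis=0)
involved = [nm for nm, p in zip(names, summed) if p > 0.5]
print("flagged by the weaker row alone:", alone)
print("flagged by the summed rows:    ", involved)
```

In this construction the weaker row on its own flags a single variable, because the variance of the x2 coefficient is absorbed by the row of the much larger condition index. Summing the last J rows recovers the full set.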
Perhaps the most important step in the diagnostic process is determining which coefficients are not affected by collinearity. If there is a single large condition index, coefficients with variance proportions less than 50 percent in the last row of Table 12.1 are not adversely affected by the collinear relationship in the data. If there are J ≥ 2 large condition indexes, then sum the last J rows of variance proportions. Coefficients with summed variance proportions of less than 50 percent are not adversely affected by the collinear relationships. If the parameters of interest have coefficients unaffected by collinearity, then small eigenvalues and large condition indexes are not a problem.
If key parameter estimates are adversely affected by collinearity, further diagnostic steps may be taken. If there is a single large condition index, the variance proportions identify the variables involved in the near dependency. If there are multiple large condition indexes, auxiliary regressions may be used to further study the nature of the relationships between the columns of X. In these regressions, one variable in a near dependency is regressed upon the other variables in the identified set. The usual t-statistics may be used as diagnostic tools to determine which variables are involved in specific linear dependencies. See Belsley (1991, p. 144) for suggestions. Unfortunately, these auxiliary regressions may also be confounded by collinearity, and thus they may not be informative.
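An auxiliary regression of this kind might be sketched as follows. This is a minimal illustration with ordinary least squares written out in NumPy. The helper `ols_t`, the synthetic data, and the choice of which variable to place on the left-hand side are assumptions made for the example and are not taken from Belsley (1991).

```python
import numpy as np

def ols_t(y, Z):
    """OLS coefficients and the usual t-statistics for regressing y on the columns of Z."""
    n, k = Z.shape
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - k)               # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    return beta, beta / se

# Synthetic data (illustrative only): the identified set is {x1, ..., x5},
# with x3 nearly x1 + x2 and x5 nearly x4.
rng = np.random.default_rng(3)
n = 200
x1, x2, x4 = (rng.normal(size=n) for _ in range(3))
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
x5 = x4 + 0.01 * rng.normal(size=n)

# Regress x3 on the other variables in the identified set.
names = ["const", "x1", "x2", "x4", "x5"]
Z = np.column_stack([np.ones(n), x1, x2, x4, x5])
beta, t = ols_t(x3, Z)
for nm, b, tv in zip(names, beta, t):
    print(f"{nm:>5}  coef={b:8.4f}  t={tv:10.2f}")
```

Here the large t-statistics on x1 and x2 point to the relation among x1, x2, and x3. The regressors x4 and x5 are themselves nearly collinear, so their individual t-statistics in this regression are unreliable, which is the confounding the text warns about.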