# Collinearity-influential observations

One or two observations can make a world of difference in a data set, substantially improving, or worsening, the collinearity in the data. Can we find these "collinearity-influential" observations? If we do, what, if anything, do we do with them? The answer to the former question is "Maybe." The answer to the latter question is "It depends."

Influential-data diagnostics are designed to find "unusual" observations in a data set and to evaluate their impact on the regression analysis. Standard references include BKW, Cook and Weisberg (1982), and Chatterjee and Hadi (1988). Mason and Gunst (1985) illustrate the effect that individual observations can have on data collinearity. Belsley (1991, pp. 245-70) reviews and illustrates diagnostics that may be useful for detecting collinearity-inducing observations, whose inclusion worsens collinearity in the data, and collinearity-breaking observations, whose inclusion lessens it. If $\kappa = (\lambda_1/\lambda_K)^{1/2}$ is the condition number of the $X$ matrix (with $\lambda_1$ and $\lambda_K$ the largest and smallest eigenvalues of $X'X$), and if $\kappa_{(i)}$ denotes the condition number of $X$ with row $i$ (or a set of rows) deleted, then one measure of the effect of an observation upon collinearity is

$$\delta_{(i)} = \frac{\kappa_{(i)} - \kappa}{\kappa}. \tag{12.10}$$

A large negative value of $\delta_{(i)}$ indicates a collinearity-inducing observation, since deleting it lowers the condition number, while a large positive value indicates a collinearity-breaking observation, since deleting it raises the condition number. Chatterjee and Hadi (1988), Hadi and Wells (1990), and Sengupta and Bhimasankaram (1997) study this measure and variations of it. See Belsley (1991, p. 251) for examples.
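As a rough illustration of measure (12.10), the following Python sketch computes the relative change in the condition number when each row is deleted in turn. The function names, the optional `scale` argument (unit-length column scaling, which is not part of the formula as stated here), and the small data matrix are all assumptions made for the example; it is not taken from any of the cited references.

```python
import numpy as np


def condition_number(X, scale=False):
    """Return sqrt(lambda_1 / lambda_K), where the lambdas are the largest
    and smallest eigenvalues of X'X.

    The singular values of X are the square roots of those eigenvalues,
    so the ratio of the extreme singular values gives the same number.
    scale=True (an assumption, not in the text) scales columns to unit length.
    """
    X = np.asarray(X, dtype=float)
    if scale:
        X = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(X, compute_uv=False)  # sorted in descending order
    return s[0] / s[-1]


def collinearity_influence(X, scale=False):
    """Return delta_(i) = (kappa_(i) - kappa) / kappa for every row i."""
    X = np.asarray(X, dtype=float)
    kappa = condition_number(X, scale)
    delta = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        X_without_i = np.delete(X, i, axis=0)  # drop row i only
        delta[i] = (condition_number(X_without_i, scale) - kappa) / kappa
    return delta


# Hypothetical data: the two columns are nearly proportional in the first
# four rows, and the last row departs from that pattern.
X = np.array([
    [1.0, 1.1],
    [2.0, 1.9],
    [3.0, 3.1],
    [4.0, 3.9],
    [5.0, 0.0],
])
delta = collinearity_influence(X)
print(np.round(delta, 3))
```

In this made-up example the last row is the only one whose deletion sharply raises the condition number, so its $\delta_{(i)}$ is large and positive: it is collinearity-breaking. A row whose deletion sharply lowered the condition number would instead show a large negative value.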

What should be done when collinearity-influential observations are found? As with all influential, or unusual, observations, we must first determine whether they are correct. If they are incorrect, they should be corrected. If they are correct, the observations deserve close examination to determine why they are unusual and exactly what effect their inclusion, or exclusion, has upon the signs, magnitudes, and significance of the coefficient estimates.
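One simple way to examine the effect of inclusion or exclusion is to fit the regression twice, once on all rows and once with the flagged rows removed, and compare the estimates side by side. The sketch below does this with ordinary least squares in NumPy; the helper names `ols_summary` and `compare_with_and_without` are hypothetical, and the standard errors assume the usual homoskedastic error model.

```python
import numpy as np


def ols_summary(X, y):
    """Return OLS coefficients, standard errors and t-ratios."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical covariance of beta
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


def compare_with_and_without(X, y, rows):
    """Fit on all rows, then again with the given row indices excluded."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.setdiff1d(np.arange(len(y)), rows)
    return ols_summary(X, y), ols_summary(X[keep], y[keep])
```

Calling `compare_with_and_without(X, y, [i])` for a flagged row `i` returns two `(coefficients, standard errors, t-ratios)` triples, which can then be inspected for sign changes, shifts in magnitude, and changes in significance.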

A second consideration concerns estimator choice. When collinearity is present, and deemed harmful to the least squares estimator, alternative estimators designed to improve the precision of estimation are sometimes suggested. We will review some of these estimators in Section 5. If collinearity is induced by a few influential observations, then a robust estimator may be an alternative to consider.
