Estimation methods designed specifically for collinear data
A number of estimation methods have been developed to improve upon the least squares estimator when collinearity is present. We will briefly discuss two, ridge regression and principal components regression, if only to warn readers about their use.
The ridge family of estimators is
$b(k) = (X'X + kI)^{-1}X'y$, (12.14)
where $k$ is a suitably chosen constant. When $k = 0$ the ridge estimator is just the OLS estimator of $\beta$. For nonstochastic values of $k > 0$ the ridge estimator is biased, but has smaller variances than the least squares estimator. It achieves the variance reduction by "shrinking" the least squares estimates towards zero. That is, the (Euclidean) length of the ridge estimator is smaller than that of the least squares estimator. Choosing $k$ is important since some values result in reductions of overall mean square error and others do not. Unfortunately, picking a value of $k$ that assures reduction in overall MSE requires knowledge of $\beta$ and $\sigma^2$, the original object of the regression analysis. Numerous methods for selecting $k$ based on the data have been proposed, but choosing $k$ using data makes $k$ random, and completely alters the statistical properties of the resulting "adaptive" ridge estimator (Hoerl, Kennard, and Baldwin, 1975; Lawless and Wang, 1976). Finite sample inference using the ridge estimator is hindered by dependence of its sampling distribution on unknown parameters. There is a huge statistics literature on the ridge estimator, but the fundamental problems remain and we cannot recommend this estimator.
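As a minimal numerical sketch of equation (12.14), the following Python code computes the ridge estimator for several fixed values of k on a simulated collinear design. The function names ridge and ridge_var, the simulated data, the coefficient values, and the choice sigma2 = 1 are all illustrative assumptions, not part of the original discussion. The variance formula used is the standard one for a nonstochastic k.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator b(k) = (X'X + kI)^{-1} X'y for a fixed k."""
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(K), X.T @ y)

def ridge_var(X, k, sigma2=1.0):
    """Diagonal of cov[b(k)] = sigma2 * A X'X A, where A = (X'X + kI)^{-1}.
    Valid only when k is nonstochastic (not chosen from the data)."""
    K = X.shape[1]
    A = np.linalg.inv(X.T @ X + k * np.eye(K))
    return sigma2 * np.diag(A @ X.T @ X @ A)

# Hypothetical collinear design: x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
T = 50
x1 = rng.normal(size=T)
x2 = x1 + 0.05 * rng.normal(size=T)
X = np.column_stack([x1, x2])
beta = np.array([1.0, 1.0])          # assumed "true" coefficients
y = X @ beta + rng.normal(size=T)    # error variance assumed equal to 1

for k in (0.0, 0.1, 1.0, 10.0):
    b_k = ridge(X, y, k)
    # k = 0 reproduces OLS; larger k gives a shorter coefficient vector
    # and smaller variances, at the cost of bias.
    print(k, b_k, np.linalg.norm(b_k), ridge_var(X, k))
```

The values of k here are fixed in advance. Choosing k from the data, as the adaptive methods cited above do, would invalidate the variance formula used in ridge_var.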
Principal components regression (Fomby et al., 1984, pp. 298-300) is based upon eigenanalysis. Recall that the $(K \times K)$ matrix $C$, whose columns are the eigenvectors of $X'X$, is an orthogonal matrix, such that $C'C = CC' = I$. The $T \times K$ matrix $Z = XC$ is called the matrix of principal components of $X$. The $i$th column of $Z$, $z_i = Xc_i$, is called the $i$th principal component. From equation (12.5) $z_i$ has the property that $z_i'z_i = \lambda_i$.
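The construction of the principal components can be illustrated with a short Python sketch. The simulated design matrix and all variable names are assumptions made for illustration. The code forms the eigenvectors of X'X, builds Z = XC, and checks that C is orthogonal and that the squared length of each principal component equals the corresponding eigenvalue.

```python
import numpy as np

# Hypothetical design with one near-exact linear dependence among the columns.
rng = np.random.default_rng(1)
T, K = 40, 3
X = rng.normal(size=(T, K))
X[:, 2] = X[:, 0] + X[:, 1] + 0.01 * rng.normal(size=T)

# eigh handles symmetric matrices and returns eigenvalues in ascending order;
# reorder so that the eigenvalues are in decreasing magnitude.
lam, C = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]
lam, C = lam[order], C[:, order]

Z = X @ C                              # T x K matrix of principal components
sq_lengths = np.sum(Z**2, axis=0)      # z_i'z_i for each column of Z

print(np.allclose(C.T @ C, np.eye(K)))  # checks that C is orthogonal
print(np.allclose(sq_lengths, lam))     # checks z_i'z_i = lambda_i
print(lam)                              # the last eigenvalue is close to zero
```

The near-zero final eigenvalue is the numerical signature of the collinear relationship built into the third column.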
The "principal components" form of the linear regression model is
$y = X\beta + e = XCC'\beta + e = Z\theta + e$, (12.15)
where $Z = XC$ and $\theta = C'\beta$. The new set of explanatory variables $Z$ are linear transformations of the original variables, and have the property that $Z'Z = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_K)$, where the $\lambda_k$ are the ordered (in decreasing magnitude) eigenvalues of $X'X$. If we apply least squares to the transformed model we obtain $\hat\theta = (Z'Z)^{-1}Z'y$, which has covariance matrix $\mathrm{cov}(\hat\theta) = \sigma^2(Z'Z)^{-1} = \sigma^2\Lambda^{-1}$, so that $\mathrm{var}(\hat\theta_k) = \sigma^2/\lambda_k$. If the data are collinear then one or more of the eigenvalues will be near zero. If $\lambda_K \approx 0$ then the principal component $z_K \approx 0$ (since $z_K'z_K = \lambda_K$), and consequently it is difficult to estimate $\theta_K$ precisely, which is reflected in the large variance of its estimator, $\mathrm{var}(\hat\theta_K) = \sigma^2/\lambda_K$. Principal components regression deletes from equation (12.15) the $z_k$ associated with small eigenvalues (usually based upon tests of significance, or some other model selection criterion, such as AIC or BIC). Partition the transformed model as $y = Z\theta + e = Z_1\theta_1 + Z_2\theta_2 + e$. Dropping $Z_2$, which contains the $z_k$ to be deleted, and applying OLS yields $\hat\theta_1 = (Z_1'Z_1)^{-1}Z_1'y$. The principal components estimator of $\beta$ is obtained by applying the inverse transformation $\beta = C\theta$ with $\theta_2$ set to zero:
$b_{pc} = C_1\hat\theta_1$,
where $C = [C_1 \; C_2]$ is partitioned conformably with $Z = [Z_1 \; Z_2]$.
The properties of this estimator follow directly from the observation that it is equivalent to the RLS estimator of $\beta$ obtained by imposing the constraints $C_2'\beta = 0$, where $C_2$ contains the eigenvectors associated with the deleted components. Thus the principal components estimator $b_{pc}$ is biased, but has smaller variances than the OLS estimator. The data-based constraints $C_2'\beta = 0$ generally have no economic content, and are likely to induce substantial bias. One positive use of principal components regression is as a benchmark. The $J$ constraints $C_2'\beta = 0$ have the property that they provide the maximum variance reduction of any set of $J$ linear constraints (Fomby, Hill, and Johnson, 1978). Thus researchers can measure the potential for variance reduction using linear constraints.
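The equivalence between principal components regression and restricted least squares can be checked numerically. The Python sketch below is illustrative only: the simulated data, the coefficient values, and the choice to drop J = 1 component are assumptions, and the number of components is fixed in advance rather than chosen by a significance test or information criterion. Writing C1 for the retained eigenvectors and C2 for the deleted ones, it computes the principal components estimator as C1 times the OLS coefficients on the retained components, computes the RLS estimator subject to C2'beta = 0 from the textbook formula, and compares variances up to the common factor sigma^2.

```python
import numpy as np

# Hypothetical collinear design and assumed "true" coefficients.
rng = np.random.default_rng(2)
T, K, J = 60, 3, 1                     # J = number of components dropped
X = rng.normal(size=(T, K))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=T)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

# Eigenvectors of X'X, ordered by decreasing eigenvalue.
lam, C = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]
lam, C = lam[order], C[:, order]
C1, C2 = C[:, :K - J], C[:, K - J:]    # retained and deleted eigenvectors

# Principal components estimator: OLS on Z1 = X C1, then transform back.
Z1 = X @ C1
theta1 = np.linalg.solve(Z1.T @ Z1, Z1.T @ y)
b_pc = C1 @ theta1

# Restricted least squares subject to R beta = 0 with R = C2'.
XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
R = C2.T
b_rls = b_ols - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b_ols)

# Coefficient variances divided by sigma^2.
var_ols = np.diag(XtX_inv)
var_pc = np.diag(C1 @ np.diag(1.0 / lam[:K - J]) @ C1.T)

print(np.allclose(b_pc, b_rls))        # checks that the two estimators coincide
print(C2.T @ b_pc)                     # the constraints hold (numerically zero)
print(var_ols, var_pc)                 # variances with and without constraints
```

In this simulated design the assumed coefficients do not satisfy the constraint, so the variance reduction comes at the cost of bias, which is the trade-off the text warns about.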