Bounds on the parameters
Let us return to the bivariate regression model (no intercept, all variables having mean zero), written in scalar notation:
$$y_n = \beta \xi_n + \varepsilon_n \tag{8.7a}$$
$$x_n = \xi_n + v_n, \tag{8.7b}$$
where $n$ denotes a typical element and the other notation is obvious. Assume for simplicity that $\beta > 0$ (the case with $\beta < 0$ is similar). OLS yields
$$\operatorname{plim}\, (x'x)^{-1} x'y = \beta \left(1 - \frac{\sigma_v^2}{\sigma_x^2}\right) = \kappa < \beta, \tag{8.8}$$
the well known bias towards zero. But it also holds that
$$\operatorname{plim}\, (x'y)^{-1} y'y = \beta + \frac{\sigma_\varepsilon^2}{\beta \sigma_\xi^2} > \beta. \tag{8.9}$$
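Both limits follow from the second moments of the model. The short derivation below is a sketch that assumes $\xi_n$, $\varepsilon_n$, and $v_n$ are mutually uncorrelated, with $\sigma_x^2 = \sigma_\xi^2 + \sigma_v^2$; $N$ (the sample size) is a symbol introduced here for illustration.

```latex
\begin{align*}
\operatorname{plim} \tfrac{1}{N} x'x &= \sigma_\xi^2 + \sigma_v^2 = \sigma_x^2, &
\operatorname{plim} \tfrac{1}{N} x'y &= \beta \sigma_\xi^2, &
\operatorname{plim} \tfrac{1}{N} y'y &= \beta^2 \sigma_\xi^2 + \sigma_\varepsilon^2, \\[4pt]
\operatorname{plim}\, (x'x)^{-1} x'y
  &= \frac{\beta \sigma_\xi^2}{\sigma_x^2}
   = \beta \left(1 - \frac{\sigma_v^2}{\sigma_x^2}\right), \\[4pt]
\operatorname{plim}\, (x'y)^{-1} y'y
  &= \frac{\beta^2 \sigma_\xi^2 + \sigma_\varepsilon^2}{\beta \sigma_\xi^2}
   = \beta + \frac{\sigma_\varepsilon^2}{\beta \sigma_\xi^2}.
\end{align*}
```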
The left-hand side of (8.9) is the probability limit of the inverse of the coefficient of the regression of $x$ on $y$ (the "reverse regression"). This regression also gives an inconsistent estimator of $\beta$, but with a bias away from zero. Thus, (8.8) and (8.9) bound the true $\beta$ from below and above, respectively. Since these bounds can be estimated consistently, by the regression and the reverse regression, we can take $(x'x)^{-1} x'y$ and $(x'y)^{-1} y'y$ as bounds between which $\beta$ should lie in the limit. The bounds are obtained without making assumptions about the size of the measurement error.
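The bracketing of $\beta$ by the direct and reverse regressions can be checked by simulation. The following is a minimal sketch, assuming normally distributed, mutually independent $\xi_n$, $\varepsilon_n$, and $v_n$; the function names and all parameter values are illustrative choices, not taken from the text.

```python
import numpy as np


def simulate(n=200_000, beta=1.0, sd_xi=1.0, sd_v=0.5, sd_eps=0.5, seed=0):
    """Draw (x, y) from the mean-zero bivariate errors-in-variables model."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, sd_xi, n)              # true regressor (unobserved)
    x = xi + rng.normal(0.0, sd_v, n)           # observed with measurement error
    y = beta * xi + rng.normal(0.0, sd_eps, n)  # regression equation
    return x, y


def regression_bounds(x, y):
    """Return the direct estimate (x'x)^{-1}x'y and the reverse estimate (x'y)^{-1}y'y."""
    direct = (x @ y) / (x @ x)
    reverse = (y @ y) / (x @ y)
    return direct, reverse


if __name__ == "__main__":
    x, y = simulate()
    lower, upper = regression_bounds(x, y)
    print(f"direct (lower bound):  {lower:.3f}")
    print(f"reverse (upper bound): {upper:.3f}")
```

With these illustrative values the probability limits implied by (8.8) and (8.9) are 0.8 and 1.25, so the two estimates should fall close to those numbers and bracket the true value of 1.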
These results on bounds without additional information carry over to the multiple regression case to a certain limited extent only: $\beta$ lies anywhere in the convex hull of the elementary regression vectors if these are all positive, where the $g + 1$ elementary regression vectors are defined as the regression vectors of each of the $g + 1$ variables on the $g$ other variables (scaled properly). This condition can be formulated slightly more generally by saying that it suffices that all regression vectors are in the same orthant, since by changing signs of variables this can simply be translated into the previous condition (see Bekker, Wansbeek, and Kapteyn, 1985, for a discussion).
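The elementary regressions can be computed in one step from the inverse of the second-moment matrix of $(y, x_1, \ldots, x_g)$: row $j$ of the inverse is proportional to the coefficients of the regression of variable $j$ on the others, and rescaling so that the coefficient on $y$ is $-1$ expresses each one as a vector for $y$ on $x$. The sketch below works with population moments for a hypothetical example with $g = 2$, uncorrelated measurement errors, and made-up parameter values; the helper names are illustrative.

```python
import numpy as np


def elementary_regressions(S):
    """S is the (g+1)x(g+1) moment matrix of (y, x_1, ..., x_g), with y first.

    Row j of the result is the regression of variable j on the other g
    variables, rescaled so that it reads y = b_1 x_1 + ... + b_g x_g.
    Assumes no entry in the first column of inv(S) is exactly zero.
    """
    P = np.linalg.inv(S)
    return -P[:, 1:] / P[:, [0]]


def same_orthant(B):
    """True if every coefficient has the same strict sign across all rows."""
    return bool(np.all(np.all(B > 0, axis=0) | np.all(B < 0, axis=0)))


def hull_weights(B, beta):
    """Barycentric weights of beta with respect to the g+1 rows of B.

    beta is in the convex hull exactly when all weights are nonnegative
    (the rows are assumed to be affinely independent).
    """
    A = np.vstack([B.T, np.ones(B.shape[0])])
    return np.linalg.solve(A, np.append(beta, 1.0))


# Hypothetical population values (assumptions for illustration only).
beta = np.array([1.0, 0.5])
Sigma_xi = np.array([[1.0, 0.3], [0.3, 1.0]])  # moments of the true regressors
Omega = np.diag([0.2, 0.2])                    # measurement error variances
sigma_eps2 = 0.3                               # equation error variance

s_xy = Sigma_xi @ beta
s_yy = beta @ Sigma_xi @ beta + sigma_eps2
S_xx = Sigma_xi + Omega
S = np.block([[np.array([[s_yy]]), s_xy[None, :]],
              [s_xy[:, None], S_xx]])

B = elementary_regressions(S)
if __name__ == "__main__":
    print("elementary regression vectors:\n", B)
    print("same orthant:", same_orthant(B))
    print("hull weights of beta:", hull_weights(B, beta))
```

The first row of `B` is the ordinary regression of $y$ on $x$; the remaining rows are the reverse regressions. Nonnegative hull weights indicate that $\beta$ lies in the convex hull of the rows.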
Reverse regression has drawn much attention in the context of the analysis of discrimination; see, e.g., Goldberger (1984a, 1984b). In its simplest form, the model is an extension of (8.7):
$$y_n = \beta \xi_n + \alpha d_n + \varepsilon_n$$
$$x_n = \xi_n + v_n,$$
where $y_n$ is wage, $d_n$ is a dummy indicating race or gender, and $\xi_n$ is productivity. This variable can only be measured imperfectly through the indicator $x_n$. The last equation in this model reflects the different average level of productivity between race or gender groups. The crucial parameter is $\alpha$, since a nonzero value (i.e., a wage differential even after controlling for productivity) may be interpreted as a sign of discrimination. Regressing $y$ on $x$ and $d$ can be shown to give an overestimate of $\alpha$. Reverse regression, i.e., regressing $x$ on $y$ and $d$, is a useful technique here, since it can be shown to give an underestimate of $\alpha$. The primary research question then is whether the two estimates have the same sign.
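The opposite biases of the two regressions can be illustrated by simulation. The sketch below rests on several assumptions that are not spelled out above: productivity is generated as $\xi_n = \mu d_n + \zeta_n$ with $\mu > 0$ (the symbols $\mu$ and $\zeta_n$ are hypothetical), all disturbances are independent normal draws, a constant is added to both regressions because $d_n$ is not mean zero, and the reverse regression is solved for $y$ to obtain its implied estimate of $\alpha$. All parameter values are made up.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_discrimination(n=200_000, beta=1.0, alpha=0.1, mu=0.5,
                            sd_zeta=1.0, sd_v=0.7, sd_eps=0.7, seed=0):
    """Return the direct and reverse regression estimates of alpha."""
    rng = np.random.default_rng(seed)
    d = rng.integers(0, 2, size=n).astype(float)  # group dummy
    xi = mu * d + rng.normal(0.0, sd_zeta, n)     # assumed productivity equation
    x = xi + rng.normal(0.0, sd_v, n)             # imperfect productivity indicator
    y = beta * xi + alpha * d + rng.normal(0.0, sd_eps, n)  # wage
    const = np.ones(n)

    # Direct regression: y on (1, x, d); the coefficient on d estimates alpha.
    _, _, a_direct = ols(y, np.column_stack([const, x, d]))

    # Reverse regression: x on (1, y, d), i.e. x = c0 + c_y * y + c_d * d.
    # Solving for y gives y = ... + (1 / c_y) * x - (c_d / c_y) * d,
    # so the implied estimate of alpha is -c_d / c_y.
    _, c_y, c_d = ols(x, np.column_stack([const, y, d]))
    a_reverse = -c_d / c_y
    return a_direct, a_reverse


if __name__ == "__main__":
    a_direct, a_reverse = simulate_discrimination()
    print(f"true alpha:       {0.1:.3f}")
    print(f"direct estimate:  {a_direct:.3f}")
    print(f"reverse estimate: {a_reverse:.3f}")
```

Under these assumed values the direct estimate lies above the true $\alpha$ and the reverse estimate lies below it, and the two estimates have opposite signs, which is the situation in which the data alone do not settle the sign of $\alpha$.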