R-Squared Versus R-Bar-Squared
Since OLS minimizes the residual sum of squares, adding one or more variables to the regression cannot increase this residual sum of squares. After all, we are minimizing over a parameter set of larger dimension, and the minimum there is less than or equal to the minimum over a subset of the parameter space; see problem 4. Therefore, for the same dependent variable Y, adding more variables leaves $\sum_{i=1}^{n} e_i^2$ non-increasing and $R^2$ non-decreasing, since $R^2 = 1 - \sum_{i=1}^{n} e_i^2/\sum_{i=1}^{n} y_i^2$.
Hence, a criterion of selecting the regression that “maximizes $R^2$” does not make sense, since we can always add more variables to this regression and improve on this $R^2$ (or at worst leave it unchanged). In order to penalize the researcher for adding an extra variable, one computes
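The point that adding a regressor cannot increase the residual sum of squares (and hence cannot decrease $R^2$) can be checked numerically. The sketch below uses a simulated toy dataset in which the added regressor x2 is pure noise; the data and variable names are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                 # irrelevant regressor (pure noise)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def rss_r2(X, y):
    """OLS residual sum of squares and R^2 (X already contains the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    rss = e @ e
    tss = ((y - y.mean()) ** 2).sum()   # total sum of squares (deviations form)
    return rss, 1 - rss / tss

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])

rss_s, r2_s = rss_r2(X_small, y)
rss_b, r2_b = rss_r2(X_big, y)

# Minimizing over a larger parameter set: RSS cannot rise, so R^2 cannot fall.
print(rss_b <= rss_s + 1e-12, r2_b >= r2_s - 1e-12)
```

Even a completely irrelevant regressor typically nudges $R^2$ up a little, which is exactly why “maximize $R^2$” is not a sensible selection criterion.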
$\bar{R}^2 = 1 - \left[\sum_{i=1}^{n} e_i^2/(n - K)\right] \big/ \left[\sum_{i=1}^{n} y_i^2/(n - 1)\right]$ (4.15)
where $\sum_{i=1}^{n} e_i^2$ and $\sum_{i=1}^{n} y_i^2$ have been adjusted by their degrees of freedom. Note that the numerator is the $s^2$ of the regression and is equal to $\sum_{i=1}^{n} e_i^2/(n - K)$. This differs from the $s^2$ in Chapter 3 in the degrees of freedom. Here it is $n - K$, because we have estimated $K$ coefficients, or because (4.2) represents $K$ relationships among the residuals. Therefore, knowing $(n - K)$ of the residuals, we can deduce the other $K$ residuals from (4.2). $\sum_{i=1}^{n} e_i^2$ is non-increasing as we add more variables, but the degrees of freedom decrease by one with every added variable. Therefore, $s^2$ will decrease only if the reduction in $\sum_{i=1}^{n} e_i^2$ outweighs the effect of the one degree of freedom loss on $s^2$. This is exactly the idea behind $\bar{R}^2$, i.e., penalizing each added variable by decreasing the degrees of freedom by one. Hence, an added variable will increase $\bar{R}^2$ only if the reduction in $\sum_{i=1}^{n} e_i^2$ outweighs this loss, i.e., only if $s^2$ is decreased. Using the definition of $\bar{R}^2$, one can relate it to $R^2$ as follows:
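Since $\sum_{i=1}^{n} y_i^2/(n-1)$ is fixed for a given dependent variable, (4.15) implies that $\bar{R}^2$ rises exactly when $s^2$ falls. A minimal sketch of this equivalence, again on assumed simulated data (x1, x2 and the seed are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # candidate extra regressor (noise here)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)

def fit_stats(X, y):
    """Return (s^2, R^2, adjusted R^2) for OLS of y on X (X includes intercept)."""
    n, K = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = (e @ e) / (n - K)               # s^2 = RSS/(n - K)
    tss = ((y - y.mean()) ** 2).sum()
    r2 = 1 - (e @ e) / tss
    rbar2 = 1 - s2 / (tss / (n - 1))     # eq. (4.15)
    return s2, r2, rbar2

s2_a, r2_a, rbar2_a = fit_stats(np.column_stack([np.ones(n), x1]), y)
s2_b, r2_b, rbar2_b = fit_stats(np.column_stack([np.ones(n), x1, x2]), y)

# R-bar-squared rises if and only if s^2 falls, since tss/(n-1) does not change.
print((rbar2_b > rbar2_a) == (s2_b < s2_a))
```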
$(1 - \bar{R}^2) = (1 - R^2)\left[(n - 1)/(n - K)\right]$ (4.16)
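Identity (4.16) can be verified numerically by computing both sides from the same OLS fit; the simulated design below (n = 30, K = 3 with an intercept) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
tss = ((y - y.mean()) ** 2).sum()
r2 = 1 - (e @ e) / tss
rbar2 = 1 - ((e @ e) / (n - K)) / (tss / (n - 1))  # eq. (4.15)

lhs = 1 - rbar2                       # left-hand side of (4.16)
rhs = (1 - r2) * (n - 1) / (n - K)    # right-hand side of (4.16)
print(np.isclose(lhs, rhs))
```

Note that (4.16) also shows $\bar{R}^2 \leq R^2$, with the gap widening as $K$ grows relative to $n$.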