A Measure of Fit
We have obtained the least squares estimates of $\alpha$, $\beta$ and $\sigma^2$ and found their distributions under normality of the disturbances. We have also learned how to test hypotheses regarding these parameters. Now we turn to a measure of fit for this estimated regression line. Recall that $e_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ denotes the predicted $Y_i$ from the least squares regression line at the value $X_i$, i.e., $\hat{\alpha}_{OLS} + \hat{\beta}_{OLS} X_i$. Using the fact that $\sum_{i=1}^n e_i = 0$, we deduce that $\sum_{i=1}^n Y_i = \sum_{i=1}^n \hat{Y}_i$, and therefore $\bar{Y} = \bar{\hat{Y}}$. The actual and predicted values of $Y$ have the same sample mean; see numerical properties (i) and (iii) of the OLS estimates discussed in section 3.2. This is true as long as there is a constant in the regression. Adding and subtracting $\bar{Y}$ from $e_i = Y_i - \hat{Y}_i$, we get $e_i = y_i - \hat{y}_i$, or $y_i = e_i + \hat{y}_i$, where $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y}$ denote deviations from the sample mean. Squaring and summing both sides:
$$\sum_{i=1}^n y_i^2 = \sum_{i=1}^n e_i^2 + \sum_{i=1}^n \hat{y}_i^2 + 2\sum_{i=1}^n e_i \hat{y}_i = \sum_{i=1}^n e_i^2 + \sum_{i=1}^n \hat{y}_i^2 \qquad (3.10)$$
where the last equality follows from the fact that $\hat{y}_i = \hat{\beta}_{OLS} x_i$ and $\sum_{i=1}^n e_i x_i = 0$. In fact,
$$\sum_{i=1}^n e_i \hat{y}_i = \sum_{i=1}^n e_i \hat{Y}_i = 0$$
means that the OLS residuals are uncorrelated with the predicted values from the regression; see numerical properties (ii) and (iv) of the OLS estimates discussed in section 3.2. In other words, (3.10) says that the total variation in $Y_i$ around its sample mean $\bar{Y}$, i.e., $\sum_{i=1}^n y_i^2$, can be
decomposed into two parts: the first is the regression sums of squares $\sum_{i=1}^n \hat{y}_i^2 = \hat{\beta}_{OLS}^2 \sum_{i=1}^n x_i^2$, and the second is the residual sums of squares $\sum_{i=1}^n e_i^2$. In fact, regressing $Y$ on a constant yields $\hat{\alpha}_{OLS} = \bar{Y}$, see problem 2, and the unexplained residual sums of squares of this naive model is
$$\sum_{i=1}^n (Y_i - \hat{\alpha}_{OLS})^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n y_i^2$$
Therefore, $\sum_{i=1}^n \hat{y}_i^2$ in (3.10) gives the explanatory power of $X$ after the constant is fit.
Using this decomposition, one can define the explanatory power of the regression as the ratio of the regression sums of squares to the total sums of squares. In other words, define $R^2 = \sum_{i=1}^n \hat{y}_i^2 / \sum_{i=1}^n y_i^2$, and this value is clearly between 0 and 1. In fact, dividing (3.10) by $\sum_{i=1}^n y_i^2$ one gets $R^2 = 1 - \sum_{i=1}^n e_i^2 / \sum_{i=1}^n y_i^2$. The $\sum_{i=1}^n e_i^2$ is a measure of misfit which was minimized by least squares. If $\sum_{i=1}^n e_i^2$ is large, this means that the regression is not explaining a lot of the variation in $Y$ and hence the $R^2$ value would be small. Alternatively, if $\sum_{i=1}^n e_i^2$ is small, then the fit is good and $R^2$ is large. In fact, for a perfect fit, where all the observations lie on the fitted line, $Y_i = \hat{Y}_i$ and $e_i = 0$, which means that $\sum_{i=1}^n e_i^2 = 0$ and $R^2 = 1$. The other extreme case is where the regression sums of squares $\sum_{i=1}^n \hat{y}_i^2 = 0$. In other words, the linear regression explains nothing of the variation in $Y_i$. In this case, $\sum_{i=1}^n y_i^2 = \sum_{i=1}^n e_i^2$ and $R^2 = 0$. Note that $\sum_{i=1}^n \hat{y}_i^2 = 0$ implies $\hat{y}_i = 0$ for every $i$, which in turn means that $\hat{Y}_i = \bar{Y}$ for every $i$. The fitted regression line is a horizontal line drawn at $\hat{Y} = \bar{Y}$, and the independent variable $X$ does not have any explanatory power in a linear relationship with $Y$.
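As a quick numerical check, the decomposition in (3.10) and the two equivalent expressions for $R^2$ can be verified in a few lines of plain Python. The data below are made up for this sketch, not taken from the text:

```python
# Toy data (illustrative only) for the simple regression of Y on X.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n

# OLS slope and intercept for the simple regression of Y on X
beta = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / \
       sum((x - Xbar) ** 2 for x in X)
alpha = Ybar - beta * Xbar

Yhat = [alpha + beta * x for x in X]        # fitted (predicted) values
e = [y - yh for y, yh in zip(Y, Yhat)]      # residuals

TSS = sum((y - Ybar) ** 2 for y in Y)       # total sums of squares
RSS = sum((yh - Ybar) ** 2 for yh in Yhat)  # regression sums of squares
ESS = sum(ei ** 2 for ei in e)              # residual sums of squares

# Equation (3.10): the cross term vanishes, so TSS = RSS + ESS
assert abs(sum(ei * yh for ei, yh in zip(e, Yhat))) < 1e-9
assert abs(TSS - (RSS + ESS)) < 1e-9

# The two equivalent expressions for R^2 agree
R2_a = RSS / TSS
R2_b = 1 - ESS / TSS
assert abs(R2_a - R2_b) < 1e-9
print(round(R2_a, 4))
```

The assertions hold exactly (up to floating point) for any data set, since they restate the algebraic identities derived above; only the printed $R^2$ value depends on the particular numbers chosen.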
Note that $R^2$ has two alternative meanings: (i) it is the simple squared correlation coefficient between $Y_i$ and $\hat{Y}_i$, see problem 9. Also, for the simple regression case, (ii) it is the simple squared correlation between $X$ and $Y$. This means that before one runs the regression of $Y$ on $X$, one can compute $r_{xy}^2$, which in turn tells us the proportion of the variation in $Y$ that will be explained by $X$. If this number is pretty low, we have a weak linear relationship between $Y$ and $X$, and we know that a poor fit will result if $Y$ is regressed on $X$. It is worth emphasizing that $R^2$ is a measure of linear association between $Y$ and $X$. There could exist, for example, a perfect quadratic relationship between $X$ and $Y$, yet the estimated least squares line through the data is a flat line, implying that $R^2 = 0$, see problem 3 of Chapter 2. One should also be suspicious of least squares regressions with $R^2$ that are too close to 1. In some cases, we may not want to include a constant in the regression. In such cases, one should use an uncentered $R^2$ as a measure of fit. The appendix to this chapter defines both centered and uncentered $R^2$ and explains the difference between them.
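Both interpretations can likewise be checked numerically. The sketch below, again with made-up illustrative data, verifies that the squared correlation between $Y$ and $\hat{Y}$, and (for simple regression) between $X$ and $Y$, each reproduce $R^2$:

```python
from math import sqrt

# Toy data (illustrative only)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

def corr(u, v):
    """Simple (Pearson) correlation coefficient between two lists."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    return suv / sqrt(sum((a - ub) ** 2 for a in u) *
                      sum((b - vb) ** 2 for b in v))

# OLS fit of Y on X
Xbar, Ybar = sum(X) / n, sum(Y) / n
beta = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / \
       sum((x - Xbar) ** 2 for x in X)
alpha = Ybar - beta * Xbar
Yhat = [alpha + beta * x for x in X]

R2 = 1 - sum((y - yh) ** 2 for y, yh in zip(Y, Yhat)) / \
         sum((y - Ybar) ** 2 for y in Y)

# (i) R^2 equals the squared correlation between Y and Yhat;
# (ii) in simple regression it also equals the squared correlation of X and Y.
assert abs(R2 - corr(Y, Yhat) ** 2) < 1e-9
assert abs(R2 - corr(X, Y) ** 2) < 1e-9
```

Interpretation (ii) holds here because $\hat{Y}_i$ is an exact linear function of $X_i$, so $Y$ correlates with $\hat{Y}$ exactly as it correlates with $X$; in multiple regression only interpretation (i) survives.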