# Regression Standard Errors and Confidence Intervals

Our regression discussion has largely ignored the fact that our data come from samples. As we noted in the appendix to the first chapter, sample regression estimates, like sample means, are subject to sampling variance. Although we imagine the underlying relationship quantified by a regression to be fixed and nonrandom, we expect estimates of this relationship to change when computed in a new sample drawn from the same population. Suppose we’re after the relationship between the earnings of college graduates and the types of colleges they’ve attended. We’re unlikely to have data on the entire population of graduates. In practice, therefore, we work with samples drawn from the population of interest. (Even if we had a complete enumeration of the student population in one year, different students will have gone to school in other years.) The data set analyzed to produce the estimates in Tables 2.2-2.5 is one such sample. We would like to quantify the sampling variance associated with these estimates.

Just as with a sample mean, the sampling variance of a regression coefficient is measured by its standard error. In the appendix to Chapter 1, we explained that the standard error of a sample average is

$$SE(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}.$$

The standard error of the slope estimate in a bivariate regression ($\hat{\beta}$) looks similar and can be written as

$$SE(\hat{\beta}) = \frac{\sigma_e}{\sqrt{n}} \times \frac{1}{\sigma_X},$$

where $\sigma_e$ is the standard deviation of the regression residuals, and $\sigma_X$ is the standard deviation of the regressor, $X_i$.

Like the standard error of a sample average, regression standard errors decrease with sample size. Standard errors increase (that is, regression estimates are less precise) when the residual variance is large. This isn’t surprising, since a large residual variance means the regression line doesn’t fit very well. On the other hand, variability in regressors is good: as the regressor’s standard deviation, $\sigma_X$, increases, the slope estimate becomes more precise. This is illustrated in Figure 2.2, which shows how adding variability in $X_i$ (specifically, adding the observations plotted in gray) helps pin down the slope linking $Y_i$ and $X_i$.
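The following is a minimal simulation sketch of these claims, not part of the original analysis. The data-generating process (intercept 1, slope 2, normal regressor and residuals), the function names, and the sample sizes are all assumptions chosen for illustration. It compares the spread of slope estimates across many simulated samples with the formula $\sigma_e / (\sqrt{n}\,\sigma_X)$.

```python
import numpy as np

def slope_and_se(x, y):
    """Bivariate OLS slope and the standard error sigma_e / (sqrt(n) * sigma_x)."""
    n = len(x)
    x_dev = x - x.mean()
    slope = (x_dev @ (y - y.mean())) / (x_dev @ x_dev)
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    sigma_e = resid.std()  # standard deviation of residuals (divides by n)
    sigma_x = x.std()      # standard deviation of the regressor (divides by n)
    return slope, sigma_e / (np.sqrt(n) * sigma_x)

def simulate(n, sd_x, sd_e, reps=2000, seed=0):
    """Draw many samples from a hypothetical population; return the standard
    deviation of the slope estimates and the average formula-based SE."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    ses = np.empty(reps)
    for r in range(reps):
        x = rng.normal(0.0, sd_x, n)
        y = 1.0 + 2.0 * x + rng.normal(0.0, sd_e, n)  # assumed true slope: 2
        slopes[r], ses[r] = slope_and_se(x, y)
    return slopes.std(), ses.mean()

narrow = simulate(n=200, sd_x=1.0, sd_e=3.0)  # baseline
wide = simulate(n=200, sd_x=2.0, sd_e=3.0)    # more variability in the regressor
large = simulate(n=800, sd_x=1.0, sd_e=3.0)   # larger sample

for label, (sd_of_slopes, mean_se) in [("baseline", narrow), ("wider X", wide), ("larger n", large)]:
    print(f"{label}: SD of slope estimates = {sd_of_slopes:.3f}, mean formula SE = {mean_se:.3f}")
```

In each scenario the two printed numbers should be close, since the formula approximates the sampling variability that the simulation measures directly. Doubling the standard deviation of the regressor or quadrupling the sample size should each roughly halve the standard error.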


The regression anatomy formula for multiple regression carries over to standard errors. In a multivariate model like this,

$$Y_i = \alpha + \sum_{k=1}^{K} \beta_k X_{ki} + e_i,$$

the standard error for the $k$th sample slope, $\hat{\beta}_k$, is

$$SE(\hat{\beta}_k) = \frac{\sigma_e}{\sqrt{n}} \times \frac{1}{\sigma_{\tilde{X}_k}}, \tag{2.15}$$

where $\sigma_{\tilde{X}_k}$ is the standard deviation of $\tilde{X}_{ki}$, the residual from a regression of $X_{ki}$ on all other regressors. The addition of controls has two opposing effects on $SE(\hat{\beta}_k)$. The residual variance ($\sigma_e$ in the numerator of the standard error formula) falls when covariates that predict $Y_i$ are added to the regression. On the other hand, the standard deviation of $\tilde{X}_{ki}$ in the denominator of the standard error formula is less than the standard deviation of $X_{ki}$, increasing the standard error. Additional covariates explain some of the variation in other regressors, and this variation is removed by virtue of regression anatomy. The upshot of these changes to top and bottom can be either an increase or decrease in precision.
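As a rough numerical check on the regression anatomy version of the standard error, the sketch below uses simulated data. The variable names `x1` and `x2`, the coefficients, and the helper `ols` are hypothetical. It computes the standard error of one slope from the residualized regressor and compares it with the usual matrix formula $\sqrt{\sigma_e^2 \, [(X'X)^{-1}]_{kk}}$. Both use the same residual variance estimate with no degrees-of-freedom correction, an assumption made so that the two agree exactly.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)                          # regressor of interest (simulated)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)    # control, correlated with x1
y = 1.0 + 0.5 * x1 + 1.5 * x2 + rng.normal(size=n)
const = np.ones(n)

# Long regression: y on a constant, x1, and the control x2.
X_long = np.column_stack([const, x1, x2])
_, e_long = ols(X_long, y)
sigma_e_long = np.sqrt(np.mean(e_long**2))

# Regression anatomy: residual from regressing x1 on the other regressors.
_, x1_tilde = ols(np.column_stack([const, x2]), x1)
sigma_x1_tilde = np.sqrt(np.mean(x1_tilde**2))   # residual has mean zero

se_anatomy = sigma_e_long / (np.sqrt(n) * sigma_x1_tilde)
se_matrix = np.sqrt(sigma_e_long**2 * np.linalg.inv(X_long.T @ X_long)[1, 1])

# Short regression: y on a constant and x1 only, for comparison.
_, e_short = ols(np.column_stack([const, x1]), y)
sigma_e_short = np.sqrt(np.mean(e_short**2))
se_short = sigma_e_short / (np.sqrt(n) * x1.std())

print(f"SE from regression anatomy: {se_anatomy:.4f}, from matrix formula: {se_matrix:.4f}")
print(f"residual SD: long {sigma_e_long:.3f} vs short {sigma_e_short:.3f}")
print(f"regressor SD: residualized {sigma_x1_tilde:.3f} vs raw {x1.std():.3f}")
print(f"SE with control: {se_anatomy:.4f}, without control: {se_short:.4f}")
```

Adding the control always shrinks both the residual standard deviation and the standard deviation of the residualized regressor. Whether the standard error rises or falls depends on which effect dominates in the data at hand.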

Standard errors computed using equation (2.15) are nowadays considered old-fashioned and are not often seen in public. The old-fashioned formula is derived assuming the variance of residuals is unrelated to regressors, a scenario that masters call homoskedasticity. Homoskedastic residuals can make regression estimates a statistically efficient matchmaker. However, because the homoskedasticity assumption may not be satisfied, kids today rock a more complicated calculation known as robust standard errors.

The robust standard error formula can be written as

$$RSE(\hat{\beta}) = \frac{\sqrt{V(\tilde{X}_{ki} e_i)}}{\sqrt{n}\,\sigma^2_{\tilde{X}_k}}. \tag{2.16}$$

Robust standard errors allow for the possibility that the regression line fits more or less well for different values of $X_i$, a scenario known as heteroskedasticity. If the residuals turn out to be homoskedastic after all, the robust numerator simplifies:

$$\sqrt{V(\tilde{X}_{ki} e_i)} = \sqrt{V(\tilde{X}_{ki})\,V(e_i)} = \sigma_{\tilde{X}_k}\,\sigma_e.$$

In this case, estimates of $RSE(\hat{\beta})$ should be close to estimates of $SE(\hat{\beta})$, since the theoretical standard errors are then identical. But if residuals are indeed heteroskedastic, estimates of $RSE(\hat{\beta})$ usually provide a more accurate (and typically somewhat larger) measure of sampling variance.
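The comparison between old-fashioned and robust standard errors can be sketched on simulated data. Everything below is an illustrative assumption: the data-generating process, the form of heteroskedasticity (residual spread growing with the distance of `x` from 2), and the helper names. The sketch computes the old-fashioned and robust standard errors for one slope from the residualized regressor, and checks the robust one against the sandwich formula $(X'X)^{-1} X' \mathrm{diag}(e_i^2) X (X'X)^{-1}$, commonly labeled HC0. Statistical software often applies small-sample adjustments, so packaged output may differ slightly.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def standard_errors(x, control, y):
    """Old-fashioned SE, robust SE, and sandwich (HC0) SE for the slope on x."""
    n = len(y)
    const = np.ones(n)
    X = np.column_stack([const, x, control])
    _, e = ols(X, y)
    # Residual from regressing x on the other regressors (regression anatomy).
    _, x_tilde = ols(np.column_stack([const, control]), x)
    var_x_tilde = np.mean(x_tilde**2)
    se_old = np.sqrt(np.mean(e**2)) / (np.sqrt(n) * np.sqrt(var_x_tilde))
    # x_tilde * e has sample mean zero, so its variance is the mean of its square.
    rse = np.sqrt(np.mean((x_tilde * e) ** 2)) / (np.sqrt(n) * var_x_tilde)
    # Sandwich formula without small-sample correction.
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (e**2)[:, None])
    rse_sandwich = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])
    return se_old, rse, rse_sandwich

rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(0.0, 4.0, n)
control = 0.3 * x + rng.normal(size=n)
signal = 1.0 + 2.0 * x + 0.5 * control
noise = rng.normal(size=n)

y_homo = signal + 1.3 * noise                          # constant residual spread
y_hetero = signal + (0.2 + np.abs(x - 2.0)) * noise    # spread depends on x

homo = standard_errors(x, control, y_homo)
hetero = standard_errors(x, control, y_hetero)

print("homoskedastic:   SE = %.4f, RSE = %.4f, sandwich = %.4f" % homo)
print("heteroskedastic: SE = %.4f, RSE = %.4f, sandwich = %.4f" % hetero)
```

With homoskedastic residuals the two standard errors should nearly coincide. With this particular form of heteroskedasticity the robust standard error should come out noticeably larger, because residuals are most variable where the regressor is far from its mean.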

1 SAT scores here are from the pre-2005 SAT. Pre-2005 total scores add math and verbal scores, each of which range from 0 to 800, so the combined maximum is 1,600.

– Stacy Berg Dale and Alan B. Krueger, “Estimating the Payoff to Attending a More Selective College: An Application of Selection on Observables and Unobservables,” Quarterly Journal of Economics, vol. 117, no. 4, November 2002, pages 1491-1527.

– Which isn’t to say they are never fooled. Adam Wheeler faked his way into Harvard with doctored transcripts and board scores in 2007. His fakery notwithstanding, Adam managed to earn mostly As and Bs at Harvard before his scheme was uncovered (John R. Ellement and Tracy Jan, “Ex-Harvard Student Accused of Living a Lie,” The Boston Globe, May 18, 2010).

– When data fall into one of J groups, we need J – 1 dummies for a full description of the groups. The category for which no dummy is coded is called the reference group.

– “Ordinary-ness” here refers to the fact that OLS weights each observation in this sum of squares equally. We discuss weighted least squares estimation in Chapter 5.

– Our book, Mostly Harmless Econometrics (Princeton University Press, 2009), discusses regression-weighting schemes in more detail.

Competitive, and Noncompetitive, according to the class rank of enrolled students and the proportion of applicants admitted.

– Other controls in the empirical model include dummies for female students, student race, athletes, and a dummy for those who graduated in the top 10% of their high school class. These variables are not written out in equation (2.2).

– Dale and Krueger, “Estimating the Payoff to Attending a More Selective College,” Quarterly Journal of Economics, 2002.

– The group dummies in (2.4), $\theta_j$, are read “theta-j.”

– This coefficient is read “lambda.”

– Joseph Altonji, Todd Elder, and Christopher Taber formalize the notion that the OVB associated with the regressors you have at hand provides a guide to the OVB generated by those you don’t. For details, see their study “Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools,” Journal of Political Economy, vol. 113, no. 1, February 2005, pages 151-184.

– Francis Galton, “Regression towards Mediocrity in Hereditary Stature,” Journal of the Anthropological Institute of Great Britain and Ireland, vol. 15, 1886, pages 246-263.

– George Udny Yule, “An Investigation into the Causes of Changes in Pauperism in England, Chiefly during the Last Two Intercensal Decades,” Journal of the Royal Statistical Society, vol. 62, no. 2, June 1899, pages 249-295.

– For a more detailed explanation, see Chapter 3 of Angrist and Pischke, Mostly Harmless Econometrics, 2009.

– The thing inside braces here, $E[Y_i \mid X_i] - E[Y_i \mid X_i - 1]$, is a function of $X_i$, and so, like the variable $X_i$, it has an expectation.

– The term “bivariate” comes from the fact that two variables are involved, one dependent, on the left-hand side, and one regressor, on the right. Multivariate regression models add regressors to this basic setup.

– The regression anatomy formula is derived similarly, hence we show the steps only for OVB.

– The percentage change interpretation of regression models built with logs does not require a link with potential outcomes, but it’s easier to explain in the context of models with such a link.

– The distinction between robust and old-fashioned standard errors for regression estimates parallels the distinction (noted in the appendix to Chapter 1) between standard error estimators for the difference in two means that use separate or common estimates of $\sigma_Y^2$ for the variance of data from treatment and control groups.