# Pairing Off

One sample average is the loneliest number that you’ll ever do. Luckily, we’re usually concerned with two. We’re especially keen to compare averages for subjects in experimental treatment and control groups. We reference these averages with a compact notation, writing $\bar{Y}^1$ for $\mathrm{Avg}_n[Y_i \mid D_i = 1]$ and $\bar{Y}^0$ for $\mathrm{Avg}_n[Y_i \mid D_i = 0]$. The treatment group mean, $\bar{Y}^1$, is the average for the $n_1$ observations belonging to the treatment group, with $\bar{Y}^0$ defined similarly. The total sample size is $n = n_0 + n_1$.
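As a concrete illustration, the two group means and sample sizes can be computed from an outcome array and a treatment dummy. The numbers below are made up purely for illustration, not taken from any study in the text.

```python
import numpy as np

# Illustrative data (hypothetical): outcomes Y_i and treatment dummies D_i.
Y = np.array([3.0, 5.0, 4.0, 7.0, 6.0, 8.0])
D = np.array([0, 0, 0, 1, 1, 1])

n1 = int(D.sum())          # treatment-group size n_1
n0 = len(Y) - n1           # control-group size n_0
ybar1 = Y[D == 1].mean()   # Avg_n[Y_i | D_i = 1]
ybar0 = Y[D == 0].mean()   # Avg_n[Y_i | D_i = 0]

print(n0 + n1, ybar0, ybar1)  # → 6 4.0 7.0
```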

We decide whether the evidence favors the hypothesis that $\mu^1 = \mu^0$ by looking for statistically significant differences in the corresponding sample averages. Statistically significant results provide strong evidence of a treatment effect, while results that fall short of statistical significance are consistent with the notion that the observed difference in treatment and control means is a chance finding. The expression “chance finding” in this context means that in a hypothetical experiment involving very large samples—so large that any sampling variance is effectively eliminated—we’d find treatment and control means to be the same.

Statistical significance is determined by the appropriate t-statistic. A key ingredient in any t recipe is the standard error that lives downstairs in the t ratio. The standard error for a comparison of means is the square root of the sampling variance of $\bar{Y}^1 - \bar{Y}^0$. Using the fact that the variance of a difference between two statistically independent variables is the sum of their variances, we have

$$V(\bar{Y}^1 - \bar{Y}^0) = V(\bar{Y}^1) + V(\bar{Y}^0) = \frac{\sigma_Y^2}{n_1} + \frac{\sigma_Y^2}{n_0} = \sigma_Y^2 \left( \frac{1}{n_1} + \frac{1}{n_0} \right).$$

The second equality here uses equation (1.5), which gives the sampling variance of a single average. The standard error we need is therefore

$$SE(\bar{Y}^1 - \bar{Y}^0) = \sigma_Y \sqrt{\frac{1}{n_1} + \frac{1}{n_0}}.$$

In deriving this expression, we’ve assumed that the variances of individual observations are the same in treatment and control groups. This assumption allows us to use one symbol, $\sigma_Y^2$, for the common variance. A slightly more complicated formula allows variances to differ across groups even if the means are the same (an idea taken up again in the discussion of robust regression standard errors in the appendix to Chapter 2).
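The sampling-variance formula can be checked by simulation. This sketch, with made-up parameters, draws many treatment and control samples sharing a common $\sigma_Y$ and compares the simulated variance of $\bar{Y}^1 - \bar{Y}^0$ with $\sigma_Y^2(1/n_1 + 1/n_0)$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n1, n0, reps = 2.0, 40, 60, 20_000  # made-up simulation parameters

# Simulate many experiments with equal means; record Ybar1 - Ybar0 each time.
diffs = (rng.normal(0.0, sigma, (reps, n1)).mean(axis=1)
         - rng.normal(0.0, sigma, (reps, n0)).mean(axis=1))

analytic = sigma**2 * (1 / n1 + 1 / n0)  # sigma_Y^2 (1/n1 + 1/n0)
print(diffs.var(), analytic)             # the two should be close
```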

Recognizing that $\sigma_Y$ must be estimated, in practice we work with the estimated standard error

$$\widehat{SE}(\bar{Y}^1 - \bar{Y}^0) = S(Y_i) \sqrt{\frac{1}{n_1} + \frac{1}{n_0}},$$

where $S(Y_i)$ is the pooled sample standard deviation. This is the sample standard deviation calculated using data from both treatment and control groups combined.
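A sketch of the estimated standard error follows, using made-up data. Note that this version uses the conventional pooled formula—squared deviations taken from each group’s own mean, divided by $n - 2$—which is one common way to implement the pooling described above.

```python
import numpy as np

# Made-up data for illustration.
Y = np.array([3.0, 5.0, 4.0, 7.0, 6.0, 8.0])
D = np.array([0, 0, 0, 1, 1, 1])

y1, y0 = Y[D == 1], Y[D == 0]
n1, n0 = len(y1), len(y0)

# Conventional pooled standard deviation: pool squared deviations from
# each group's own mean, then divide by n - 2 degrees of freedom.
pooled_var = (((y1 - y1.mean())**2).sum()
              + ((y0 - y0.mean())**2).sum()) / (n1 + n0 - 2)
S = np.sqrt(pooled_var)

se = S * np.sqrt(1 / n1 + 1 / n0)  # estimated SE of Ybar1 - Ybar0
print(S, se)
```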

Under the null hypothesis that $\mu^1 - \mu^0$ is equal to the value $\mu$, the t-statistic for a difference in means is

$$t(\mu) = \frac{\bar{Y}^1 - \bar{Y}^0 - \mu}{\widehat{SE}(\bar{Y}^1 - \bar{Y}^0)}.$$

We use this t-statistic to test working hypotheses about $\mu^1 - \mu^0$ and to construct confidence intervals for this difference. When the null hypothesis is one of equal means ($\mu = 0$), the statistic $t(0)$ equals the difference in sample means divided by the estimated standard error of this difference. When the t-statistic is large enough to reject a difference of zero, we say the estimated difference is statistically significant. The confidence interval for a difference in means is the difference in sample means plus or minus two standard errors.
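Putting the pieces together, a minimal sketch of the t-statistic for the null of equal means and the two-standard-error confidence interval, on made-up data:

```python
import numpy as np

# Made-up data for illustration.
Y = np.array([3.0, 5.0, 4.0, 7.0, 6.0, 8.0])
D = np.array([0, 0, 0, 1, 1, 1])
y1, y0 = Y[D == 1], Y[D == 0]
n1, n0 = len(y1), len(y0)

diff = y1.mean() - y0.mean()  # Ybar1 - Ybar0
pooled_var = (((y1 - y1.mean())**2).sum()
              + ((y0 - y0.mean())**2).sum()) / (n1 + n0 - 2)
se = np.sqrt(pooled_var) * np.sqrt(1 / n1 + 1 / n0)

t0 = diff / se                        # t-statistic for the null mu = 0
ci = (diff - 2 * se, diff + 2 * se)   # difference plus or minus two SEs
print(t0, ci)
```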

1 For more on this surprising fact, see Jonathan Gruber, “Covering the Uninsured in the United States,” Journal of Economic Literature, vol. 46, no. 3, September 2008, pages 571–606.

2 Our sample is aged 26-59 and therefore does not yet qualify for Medicare.

4 Robert Frost’s insights notwithstanding, econometrics isn’t poetry. A modicum of mathematical notation allows us to describe and discuss subtle relationships precisely. We also use italics to introduce repeatedly used terms, such as potential outcomes, that have special meaning for masters of ’metrics.

5 Order the $n$ observations on $Y_i$ so that the $n_0$ observations from the group indicated by $D_i = 0$ precede the $n_1$ observations from the $D_i = 1$ group. The conditional average

$$\mathrm{Avg}_n[Y_i \mid D_i = 0] = \frac{1}{n_0} \sum_{i=1}^{n_0} Y_i$$

is the sample average for the $n_0$ observations in the $D_i = 0$ group. The term $\mathrm{Avg}_n[Y_i \mid D_i = 1]$ is calculated analogously from the remaining $n_1$ observations.

7 Our description of the HIE follows Robert H. Brook et al., “Does Free Care Improve Adults’ Health? Results from a Randomized Controlled Trial,” New England Journal of Medicine, vol. 309, no. 23, December 8, 1983, pages 1426–1434. See also Aviva Aron-Dine, Liran Einav, and Amy Finkelstein, “The RAND Health Insurance Experiment, Three Decades Later,” Journal of Economic Perspectives, vol. 27, Winter 2013, pages 197–222, for a recent assessment.

8 Other HIE complications include the fact that instead of simply tossing a coin (or the computer equivalent), RAND investigators implemented a complex assignment scheme that potentially affects the statistical properties of the resulting analyses (for details, see Carl Morris, “A Finite Selection Model for Experimental Design of the Health Insurance Study,” Journal of Econometrics, vol. 11, no. 1, September 1979, pages 43–61). Intentions here were good, in that the experimenters hoped to insure themselves against chance deviation from perfect balance across treatment groups. Most HIE analysts ignore the resulting statistical complications, though many probably join us in regretting this attempt to gild the random assignment lily. A more serious problem arises from the large number of HIE subjects who dropped out of the experiment and the large differences in attrition rates across treatment groups (fewer left the free plan, for example). As noted by Aron-Dine, Einav, and Finkelstein, “The RAND Experiment,” Journal of Economic Perspectives, 2013, differential attrition may have compromised the experiment’s validity. Today’s “randomistas” do better on such nuts-and-bolts design issues (see, for example, the experiments described in Abhijit Banerjee and Esther Duflo, Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, Public Affairs, 2011).

9 The RAND results reported here are based on our own tabulations from the HIE public use file, as described in the Empirical Notes section at the end of the book. The original RAND results are summarized in Joseph P. Newhouse et al., Free for All? Lessons from the RAND Health Insurance Experiment, Harvard University Press, 1994.

10 Participants in the free plan had slightly better corrected vision than those in the other plans; see Brook et al., “Does Free Care Improve Health?” New England Journal of Medicine, 1983, for details.

11 See Amy Finkelstein et al., “The Oregon Health Insurance Experiment: Evidence from the First Year,” Quarterly Journal of Economics, vol. 127, no. 3, August 2012, pages 1057-1106; Katherine Baicker et al., “The Oregon Experiment—Effects of Medicaid on Clinical Outcomes,” New England Journal of Medicine, vol. 368, no. 18, May 2, 2013, pages 1713-1722; and Sarah Taubman et al., “Medicaid Increases Emergency Department Use: Evidence from Oregon’s Health Insurance Experiment,” Science, vol. 343, no. 6168, January 17, 2014, pages 263-268.

13 Lind’s experiment is described in Duncan P. Thomas, “Sailors, Scurvy, and Science,” Journal of the Royal Society of Medicine, vol. 90, no. 1, January 1997, pages 50-54.

14 Charles S. Peirce and Joseph Jastrow, “On Small Differences in Sensation,” Memoirs of the National Academy of Sciences, vol. 3, 1885, pages 75-83.

15 Ronald A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, 1925, and Ronald A. Fisher, The Design of Experiments, Oliver and Boyd, 1935.

The sample standard deviation is computed as

$$S(Y_i) = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2},$$

that is, dividing by $n - 1$ instead of by $n$. This modified formula provides an unbiased estimate of the corresponding population variance.
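The effect of the $n - 1$ divisor can be seen in a short simulation with made-up settings: averaging each estimator over many samples drawn from a distribution with variance 1 shows the $n$ divisor biased downward and the $n - 1$ divisor centered on the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000  # made-up simulation settings; true variance is 1.0

draws = rng.normal(0.0, 1.0, (reps, n))
mean_var_n   = draws.var(axis=1, ddof=0).mean()  # divide by n: biased down
mean_var_nm1 = draws.var(axis=1, ddof=1).mean()  # divide by n - 1: unbiased
print(mean_var_n, mean_var_nm1)  # roughly 0.8 and 1.0
```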

– Using separate variances for treatment and control observations, we have

$$\widehat{SE}(\bar{Y}^1 - \bar{Y}^0) = \sqrt{\frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}},$$

where $V^1(Y_i)$ is the variance of treated observations, and $V^0(Y_i)$ is the variance of control observations.
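A sketch of this unequal-variance calculation, again on made-up data:

```python
import numpy as np

# Made-up outcomes for treated and control groups.
y1 = np.array([7.0, 6.0, 8.0, 9.0])
y0 = np.array([3.0, 5.0, 4.0])
n1, n0 = len(y1), len(y0)

v1 = y1.var(ddof=1)  # V^1(Y_i): variance of treated observations
v0 = y0.var(ddof=1)  # V^0(Y_i): variance of control observations

se_unequal = np.sqrt(v1 / n1 + v0 / n0)  # SE allowing unequal variances
print(se_unequal)
```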

Chapter 2


Kung Fu, Season 1, Episode 8

When the path to random assignment is blocked, we look for alternate routes to causal
