# Appendix: Mastering Inference

YOUNG CAINE: I am puzzled.

MASTER PO: That is the beginning of wisdom.

Kung Fu, Season 2, Episode 25

This is the first of a number of appendices that fill in key econometric and statistical details. You can spend your life studying statistical inference; many masters do. Here we offer a brief sketch of essential ideas and basic statistical tools, enough to understand tables like those in this chapter.

The HIE is based on a sample of participants drawn (more or less) at random from the population eligible for the experiment. Drawing another sample from the same population, we’d get somewhat different results, but the general picture should be similar if the sample is large enough for the LLN to kick in. How can we decide whether statistical results constitute strong evidence or merely a lucky draw, unlikely to be replicated in repeated samples? How much sampling variance should we expect? The tools of formal statistical inference answer these questions. These tools work for all of the econometric strategies of concern to us. Quantifying sampling uncertainty is a necessary step in any empirical project and on the road to understanding statistical claims made by others. We explain the basic inference idea here in the context of HIE treatment effects.

The task at hand is the quantification of the uncertainty associated with a particular sample average and, especially, groups of averages and the differences among them. For example, we’d like to know if the large differences in health-care expenditure across HIE treatment groups can be discounted as a chance finding. The HIE samples were drawn from a much larger data set that we think of as covering the population of interest. The HIE population consists of all families eligible for the experiment (too young for Medicare and so on). Instead of studying the many millions of such families, a much smaller group of about 2,000 families (containing about 4,000 people) was selected at random and then randomly allocated to one of 14 plans or treatment groups. Note that there are two sorts of randomness at work here: the first pertains to the construction of the study sample and the second to how treatment was allocated to those who were sampled. Random sampling and random assignment are closely related but distinct ideas.
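The two layers of randomness can be sketched in a few lines. This is a minimal illustration, not the HIE's actual procedure. It assumes a hypothetical population of one million family IDs, and it assigns each sampled family to one of 14 plans with equal probability. That equal-odds assumption is ours, made for simplicity. The function name `sample_then_assign` is also hypothetical.

```python
import random

def sample_then_assign(population_ids, sample_size, n_plans, seed=0):
    """Hypothetical two-step design: random sampling, then random assignment."""
    rng = random.Random(seed)
    # Step 1, random sampling: which families enter the study at all
    study_sample = rng.sample(population_ids, sample_size)
    # Step 2, random assignment: which plan each sampled family gets
    # (equal odds across plans is an illustrative assumption)
    assignment = {family: rng.randrange(n_plans) for family in study_sample}
    return study_sample, assignment

population = range(1_000_000)  # hypothetical eligible-family IDs
sample, plans = sample_then_assign(population, sample_size=2_000, n_plans=14)
print(len(sample), "families sampled; plans used:", sorted(set(plans.values())))
```

Keeping the two steps in separate lines of code mirrors the distinction in the text. Step 1 determines whom the results generalize to, and step 2 determines whether comparisons across plans are apples to apples.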

## A World without Bias

We first quantify the uncertainty induced by random sampling, beginning with a single sample average, say, the average health of everyone in the sample at hand, as measured by a health index. Our target is the corresponding population average health index, that is, the mean over everyone in the population of interest. As we noted on p. 14, the population mean of a variable is called its mathematical expectation, or just expectation for short. For the expectation of a variable, $Y_i$, we write $E[Y_i]$. Expectation is intimately related to formal notions of probability. Expectations can be written as a weighted average of all possible values that the variable $Y_i$ can take on, with weights given by the probability these values appear in the population. In our dice-throwing example, these weights are equal and given by 1/6 (see Section 1.1).
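To make the weighted-average definition concrete, here is a small sketch. It represents a discrete distribution as a dictionary mapping each value to its probability. The helper name `expectation` is illustrative. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

def expectation(dist):
    """E[Y]: sum of each possible value times its probability."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in dist.items())

# A fair die: six faces, each with weight 1/6
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(fair_die))  # 7/2
```

The expectation of a fair die is 3.5, a value the die itself never shows. This is a reminder that an expectation is a feature of the distribution, not a typical draw.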

Unlike our notation for averages, the symbol for expectation does not reference the sample size. That’s because expectations are population quantities, defined without reference to a particular sample of individuals. For a given population, there is only one $E[Y_i]$, while there are many $\text{Avg}_n[Y_i]$, depending on how we choose $n$ and just who ends up in our sample. Because $E[Y_i]$ is a fixed feature of a particular population, we call it a parameter. Quantities that vary from one sample to another, such as the sample average, are called sample statistics.

At this point, it’s helpful to switch from $\text{Avg}_n[Y_i]$ to a more compact notation for averages, $\bar{Y}$. Note that we’re dispensing with the subscript $n$ to avoid clutter; henceforth, it’s on you to remember that sample averages are computed in a sample of a particular size. The sample average, $\bar{Y}$, is a good estimator of $E[Y_i]$ (in statistics, an estimator is any function of sample data used to estimate parameters). For one thing, the LLN tells us that in large samples, the sample average is likely to be very close to the corresponding population mean. A related property is that the expectation of $\bar{Y}$ is also $E[Y_i]$. In other words, if we were to draw infinitely many random samples, the average of the resulting $\bar{Y}$ across draws would be the underlying population mean. When a sample statistic has expectation equal to the corresponding population parameter, it’s said to be an unbiased estimator of that parameter. Here’s the sample mean’s unbiasedness property stated formally:

UNBIASEDNESS OF THE SAMPLE MEAN

$$E[\bar{Y}] = E[Y_i]$$

The sample mean should not be expected to be bang on the corresponding population mean: the sample average in one sample might be too big, while in other samples it will be too small. Unbiasedness tells us that these deviations are not systematically up or down; rather, in repeated samples they average out to zero. This unbiasedness property is distinct from the LLN, which says that the sample mean gets closer and closer to the population mean as the sample size grows. Unbiasedness of the sample mean holds for samples of any size.
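Unbiasedness can be checked exactly on a toy example. The sketch below uses a hypothetical three-person population with values 1, 2, and 6. It enumerates every equally likely ordered sample of size $n$ drawn with replacement, so that draws are independent, and averages the sample means across all of them. This average is exactly $E[\bar{Y}]$, and it matches the population mean even for $n = 1$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Exact arithmetic mean of a sequence of numbers."""
    return Fraction(sum(values), len(values))

def average_of_sample_means(population, n):
    """Exact E[Ybar]: average the sample mean over every equally likely
    ordered sample of size n drawn with replacement from `population`."""
    samples = product(population, repeat=n)
    return mean([mean(s) for s in samples])

population = [1, 2, 6]  # hypothetical population; its mean is 3
for n in (1, 2, 3):
    # each line prints n, then E[Ybar], then the population mean
    print(n, average_of_sample_means(population, n), mean(population))
```

Individual sample means range from 1 to 6, so any single sample can miss badly. Only their average across all possible samples is pinned to the population mean.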

## Measuring Variability

In addition to averages, we’re interested in variability. To gauge variability, it’s customary to look at average squared deviations from the mean, in which positive and negative gaps get equal weight. The resulting summary of variability is called variance.

The sample variance of $Y_i$ in a sample of size $n$ is defined as

$$S(Y_i)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2.$$

The corresponding population variance replaces averages with expectations, giving:

$$V(Y_i) = E\left[\left(Y_i - E[Y_i]\right)^2\right] = \sigma_Y^2.$$
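The sample and population versions of variance can be computed side by side. In this sketch the population variance $V(Y_i) = E[(Y_i - E[Y_i])^2]$ is taken over a fair die, treated as a known distribution. The sample variance averages squared deviations over a hypothetical sample of six rolls. The function names are illustrative. Note that Python's `statistics.variance` divides by $n - 1$, whereas `statistics.pvariance` and the averaging used here divide by $n$.

```python
from fractions import Fraction

def sample_variance(sample):
    """S(Y_i)^2: average squared deviation from the sample mean (divides by n)."""
    n = len(sample)
    ybar = Fraction(sum(sample), n)
    return sum((y - ybar) ** 2 for y in sample) / n

def population_variance(dist):
    """V(Y_i) = E[(Y_i - E[Y_i])^2] for a discrete distribution {value: probability}."""
    mu = sum(y * p for y, p in dist.items())
    return sum(p * (y - mu) ** 2 for y, p in dist.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
rolls = [1, 3, 3, 6, 2, 5]  # hypothetical sample of six rolls

print(population_variance(fair_die))  # 35/12
print(sample_variance(rolls))         # 26/9
```

The sample statistic (26/9, about 2.89) lands near, but not on, the parameter (35/12, about 2.92). A different set of rolls would give a different sample variance, while the population variance stays put.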

Because variances square the data, they can be very large. Multiply a variable by 10 and its variance goes up by a factor of 100. Therefore, we often describe variability using the square root of the variance: this is called the standard deviation, written $\sigma_Y$. Multiply a variable by 10 and its standard deviation goes up by a factor of 10. As always, the population standard deviation, $\sigma_Y$, has a sample counterpart $S(Y_i)$, the square root of $S(Y_i)^2$.
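The scaling rule is easy to verify numerically. This sketch rescales a hypothetical list of health-index values by 10 and compares the ratios of variances and of standard deviations. It uses the standard-library `statistics.pvariance` and `statistics.pstdev`, which divide by $n$.

```python
import math
import statistics

health = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical health-index values
scaled = [10 * y for y in health]  # same data in units ten times smaller

var_ratio = statistics.pvariance(scaled) / statistics.pvariance(health)
sd_ratio = statistics.pstdev(scaled) / statistics.pstdev(health)

print("variance ratio:", var_ratio)  # squared scale factor
print("std. dev. ratio:", sd_ratio)  # scale factor itself
print(math.isclose(sd_ratio ** 2, var_ratio))
```

Because the standard deviation moves one-for-one with the scale of the data, it is usually the easier of the two to interpret.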

Variance is a descriptive fact about the distribution of $Y_i$. (Reminder: the distribution of a variable is the set of values the variable takes on and the relative frequency that each value is observed in the population or generated by a random process.) Some variables take on a narrow set of values (like a dummy variable indicating families with health insurance), while others (like income) tend to be spread out with some very high values mixed in with many smaller ones.

It’s important to document the variability of the variables you’re working with. Our goal here, however, goes beyond this. We’re interested in quantifying the variance of the sample mean in repeated samples. Since the expectation of the sample mean is $E[Y_i]$ (from the unbiasedness property), the population variance of the sample mean can be written as

$$V(\bar{Y}) = E\left[\left(\bar{Y} - E[\bar{Y}]\right)^2\right] = E\left[\left(\bar{Y} - E[Y_i]\right)^2\right].$$

The variance of a statistic like the sample mean is distinct from the variance used for descriptive purposes. We write $V(\bar{Y})$ for the variance of the sample mean, while $V(Y_i)$ (or $\sigma_Y^2$) denotes the variance of the underlying data. Because the quantity $V(\bar{Y})$ measures the variability of a sample statistic in repeated samples, as opposed to the dispersion of raw data, $V(\bar{Y})$ has a special name: sampling variance.

Sampling variance is related to descriptive variance, but, unlike descriptive variance, sampling variance is also determined by sample size. We show this by simplifying the formula for $V(\bar{Y})$. Start by substituting the formula for $\bar{Y}$ inside the notation for variance:

$$V(\bar{Y}) = V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right).$$

To simplify this expression, we first note that random sampling ensures the individual observations in a sample are not systematically related to one another; in other words, they are statistically independent. This important property allows us to take advantage of the fact that the variance of a sum of statistically independent observations, each drawn randomly from the same population, is the sum of their variances. Moreover, because each $Y_i$ is sampled from the same population, each draw has the same variance, $\sigma_Y^2$. Finally, we use the property that the variance of a constant (like $1/n$) times $Y_i$ is the square of this constant times the variance of $Y_i$. From these considerations, we get

$$V(\bar{Y}) = V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_Y^2.$$

Simplifying further, we have

$$V(\bar{Y}) = \frac{n\,\sigma_Y^2}{n^2} = \frac{\sigma_Y^2}{n}.$$

We’ve shown that the sampling variance of a sample average depends on the variance of the underlying observations, $\sigma_Y^2$, and the sample size, $n$. As you might have guessed, more data means less dispersion of sample averages in repeated samples. In fact, when the sample size is very large, there’s almost no dispersion at all, because when $n$ is large, $\sigma_Y^2/n$ is small. This is the LLN at work: as $n$ approaches infinity, the sample average approaches the population mean, and sampling variance disappears.
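A quick simulation shows the formula $V(\bar{Y}) = \sigma_Y^2/n$ at work. The sketch draws many independent samples, with replacement, from a hypothetical ten-person population. It records each sample mean and compares the variance of those means across draws with $\sigma_Y^2/n$ for three sample sizes. The population values, the number of replications, and the helper name `sampling_variance_sim` are all illustrative choices.

```python
import random
import statistics

def sampling_variance_sim(population, n, reps, seed=0):
    """Variance of the sample mean across `reps` independent random samples
    of size n, each drawn with replacement from `population`."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]
    return statistics.pvariance(means)

population = [1, 2, 6, 3, 8, 2, 5, 1, 9, 3]  # hypothetical health-index values
sigma2 = statistics.pvariance(population)    # population variance, sigma_Y^2

for n in (4, 16, 64):
    simulated = sampling_variance_sim(population, n, reps=10_000)
    print(f"n = {n:2d}: simulated {simulated:.4f} vs sigma^2/n = {sigma2 / n:.4f}")
```

Each fourfold increase in $n$ cuts the simulated sampling variance by roughly a factor of four, as the formula predicts. The simulated values wobble slightly around $\sigma_Y^2/n$ because 10,000 replications is itself a finite number of repeated samples.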

In practice, we often work with the standard deviation of the sample mean rather than its variance. The standard deviation of a statistic like the sample average is called its standard error. The standard error of the sample mean can be written as

$$SE(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}. \tag{1.6}$$

Every estimate discussed in this book has an associated standard error. This includes sample means (for which the standard error formula appears in equation (1.6)), differences in sample means (discussed later in this appendix), regression coefficients (discussed in Chapter 2), and instrumental variables and other more sophisticated estimates. Formulas for standard errors can get complicated, but the idea remains simple. The standard error summarizes the variability in an estimate due to random sampling. Again, it’s important to avoid confusing standard errors with the standard deviations of the underlying variables; the two quantities are intimately related yet measure different things.

One last step on the road to standard errors: most population quantities, including the standard deviation in the numerator of (1.6), are unknown and must be estimated. In practice, therefore, when quantifying the sampling variance of a sample mean, we work with an estimated standard error. This is obtained by replacing $\sigma_Y$ with $S(Y_i)$ in the formula for $SE(\bar{Y})$. Specifically, the estimated standard error of the sample mean can be written as

$$\hat{SE}(\bar{Y}) = \frac{S(Y_i)}{\sqrt{n}}.$$
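Computing an estimated standard error from a single sample takes only a couple of lines. This sketch takes $S(Y_i)$ to be the square root of the average squared deviation (Python's `statistics.pstdev`, which divides by $n$). It then divides by $\sqrt{n}$. The data are a hypothetical sample of eight health-index values, and `estimated_se` is an illustrative name. Many software packages use `statistics.stdev`, which divides by $n - 1$. The two choices differ noticeably only in small samples.

```python
import math
import statistics

def estimated_se(sample):
    """Estimated standard error of the sample mean: S(Y_i) / sqrt(n),
    with S(Y_i) computed by statistics.pstdev (divides by n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.pstdev(sample) / math.sqrt(n)

health_index = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
ybar = statistics.fmean(health_index)
se = estimated_se(health_index)
print(f"mean = {ybar:.3f}, estimated SE = {se:.3f}")  # mean = 5.000, estimated SE = 0.707
```

Here $S(Y_i) = 2$ and $n = 8$, so the estimated standard error is $2/\sqrt{8} \approx 0.71$. It describes how much the sample mean of 5 would be expected to vary across repeated samples of eight, not how spread out the eight observations are.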

We often forget the qualifier “estimated” when discussing statistics and their standard errors, but that’s still what we have in mind. For example, the numbers in parentheses in Table 1.4 are estimated standard errors for the relevant differences in means.