# Appendix: Mastering Inference

YOUNG CAINE: I am puzzled.

MASTER PO: That is the beginning of wisdom.

Kung Fu, Season 2, Episode 25

This is the first of a number of appendices that fill in key econometric and statistical details. You can spend your life studying statistical inference; many masters do. Here we offer a brief sketch of essential ideas and basic statistical tools, enough to understand tables like those in this chapter.

The HIE is based on a sample of participants drawn (more or less) at random from the population eligible for the experiment. Drawing another sample from the same population, we’d get somewhat different results, but the general picture should be similar if the sample is large enough for the LLN to kick in. How can we decide whether statistical results constitute strong evidence or merely a lucky draw, unlikely to be replicated in repeated samples? How much sampling variance should we expect? The tools of formal statistical inference answer these questions. These tools work for all of the econometric strategies of concern to us. Quantifying sampling uncertainty is a necessary step in any empirical project and on the road to understanding statistical claims made by others. We explain the basic inference idea here in the context of HIE treatment effects.

We first quantify the uncertainty induced by random sampling, beginning with a single sample average, say, the average health of everyone in the sample at hand, as measured by a health index. Our target is the corresponding population average health index, that is, the mean over everyone in the population of interest. As we noted on p. 14, the population mean of a variable is called its mathematical expectation, or just expectation for short. For the expectation of a variable, $Y_i$, we write $E[Y_i]$. Expectation is intimately related to formal notions of probability. Expectations can be written as a weighted average of all possible values that the variable $Y_i$ can take on, with weights given by the probability these values appear in the population. In our dice-throwing example, these weights are equal and given by 1/6 (see Section 1.1).
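The dice example can be made concrete with a few lines of code (an illustrative sketch, not from the text): the expectation of one roll of a fair die is the probability-weighted average of its six faces.

```python
# Expectation as a probability-weighted average: a fair die takes the
# values 1-6, each with probability 1/6, so E[Y] = sum of value x weight.
values = [1, 2, 3, 4, 5, 6]
weights = [1 / 6] * 6  # equal weights for a fair die

expectation = sum(v * w for v, w in zip(values, weights))
print(expectation)  # 3.5
```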

At this point, it’s helpful to switch from $\text{Avg}_n[Y_i]$ to a more compact notation for averages, $\bar{Y}$. Note that we’re dispensing with the subscript $n$ to avoid clutter—henceforth, it’s on you to remember that sample averages are computed in a sample of a particular size. The sample average, $\bar{Y}$, is a good estimator of $E[Y_i]$ (in statistics, an estimator is any function of sample data used to estimate parameters). For one thing, the LLN tells us that in large samples, the sample average is likely to be very close to the corresponding population mean. A related property is that the expectation of $\bar{Y}$ is also $E[Y_i]$. In other words, if we were to draw infinitely many random samples, the average of the resulting $\bar{Y}$'s across draws would be the underlying population mean. When a sample statistic has expectation equal to the corresponding population parameter, it’s said to be an unbiased estimator of that parameter. Here’s the sample mean’s unbiasedness property stated formally:

UNBIASEDNESS OF THE SAMPLE MEAN

$$E[\bar{Y}] = E[Y_i].$$

The sample mean should not be expected to be bang on the corresponding population mean: the sample average in one sample might be too big, while in other samples it will be too small. Unbiasedness tells us that these deviations are not systematically up or down; rather, in repeated samples they average out to zero. This unbiasedness property is distinct from the LLN, which says that the sample mean gets closer and closer to the population mean as the sample size grows. Unbiasedness of the sample mean holds for samples of any size.
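Unbiasedness is easy to see in a simulation (a hypothetical sketch, not from the text): draw many random samples from a fixed population and average the resulting sample means. Individual sample means miss high and low, but their average across draws sits close to the population mean, even with a small sample size.

```python
import random

random.seed(0)

# A fixed "population" of 100,000 health-index values.
population = [random.gauss(50, 10) for _ in range(100_000)]
pop_mean = sum(population) / len(population)

# Draw many small random samples and record each sample mean.
n, draws = 25, 2_000
sample_means = []
for _ in range(draws):
    sample = random.sample(population, n)
    sample_means.append(sum(sample) / n)

# Unbiasedness: the average of sample means is close to the population
# mean, even though n = 25 is small. Individual means scatter around it.
avg_of_means = sum(sample_means) / draws
print(round(pop_mean, 2), round(avg_of_means, 2))
```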

The sample variance of $Y_i$ in a sample of size $n$ is defined as

$$S(Y_i)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2.$$

The corresponding population variance replaces averages with expectations, giving:

$$V(Y_i) = E\left[\left(Y_i - E[Y_i]\right)^2\right].$$
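The sample variance formula translates directly into code (an illustrative sketch; the data values are made up). Note the $1/n$ convention used here, matching the definition above, rather than the $1/(n-1)$ divisor some software defaults to.

```python
def sample_variance(y):
    """Sample variance with the 1/n convention used in the text:
    S(Y)^2 = (1/n) * sum of squared deviations from the sample mean."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
print(sample_variance(data))  # 4.0
```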

Like $E[Y_i]$, the quantity $V(Y_i)$ is a fixed feature of a population—a parameter. It’s therefore customary to christen it in Greek: $\sigma_Y^2$, which is read as “sigma-squared-y.”

Because variances square the data, they can be very large. Multiply a variable by 10 and its variance goes up by 100. Therefore, we often describe variability using the square root of the variance: this is called the standard deviation, written $\sigma_Y$. Multiply a variable by 10 and its standard deviation increases by 10. As always, the population standard deviation, $\sigma_Y$, has a sample counterpart, $S(Y_i)$, the square root of $S(Y_i)^2$.

Variance is a descriptive fact about the distribution of $Y_i$. (Reminder: the distribution of a variable is the set of values the variable takes on and the relative frequency that each value is observed in the population or generated by a random process.) Some variables take on a narrow set of values (like a dummy variable indicating families with health insurance), while others (like income) tend to be spread out with some very high values mixed in with many smaller ones.

The variance of a statistic like the sample mean is distinct from the variance used for descriptive purposes. We write $V(\bar{Y})$ for the variance of the sample mean, while $V(Y_i)$ (or $\sigma_Y^2$) denotes the variance of the underlying data. Because the quantity $V(\bar{Y})$ measures the variability of a sample statistic in repeated samples, as opposed to the dispersion of raw data, $V(\bar{Y})$ has a special name: sampling variance.

Sampling variance is related to descriptive variance, but, unlike descriptive variance, sampling variance is also determined by sample size. We show this by simplifying the formula for $V(\bar{Y})$. Start by substituting the formula for $\bar{Y}$ inside the notation for variance:

$$V(\bar{Y}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right).$$

To simplify this expression, we first note that random sampling ensures the individual observations in a sample are not systematically related to one another; in other words, they are statistically independent. This important property allows us to take advantage of the fact that the variance of a sum of statistically independent observations, each drawn randomly from the same population, is the sum of their variances. Moreover, because each $Y_i$ is sampled from the same population, each draw has the same variance, $\sigma_Y^2$. Finally, we use the property that the variance of a constant (like $1/n$) times $Y_i$ is the square of this constant times the variance of $Y_i$. From these considerations, we get

$$V(\bar{Y}) = \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) = \frac{1}{n^2} \times n\sigma_Y^2 = \frac{\sigma_Y^2}{n}.$$

We’ve shown that the sampling variance of a sample average depends on the variance of the underlying observations, $\sigma_Y^2$, and the sample size, $n$. As you might have guessed, more data means less dispersion of sample averages in repeated samples. In fact, when the sample size is very large, there’s almost no dispersion at all, because when $n$ is large, $\sigma_Y^2/n$ is small. This is the LLN at work: as $n$ approaches infinity, the sample average approaches the population mean, and sampling variance disappears.
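The $\sigma_Y^2/n$ formula can be checked by simulation (an illustrative sketch with made-up parameters): compute sample means over many repeated draws and compare their variance to the formula’s prediction.

```python
import random

random.seed(1)

sigma2 = 9.0          # population variance (standard deviation 3)
n, draws = 30, 5_000  # sample size and number of repeated samples

# Collect the sample mean from each of many repeated random samples.
means = []
for _ in range(draws):
    sample = [random.gauss(0, sigma2 ** 0.5) for _ in range(n)]
    means.append(sum(sample) / n)

# Variance of the sample means across draws vs. the formula sigma^2 / n.
grand = sum(means) / draws
sampling_var = sum((m - grand) ** 2 for m in means) / draws
print(round(sampling_var, 3), sigma2 / n)  # both should be near 0.3
```

Increasing `n` shrinks both numbers toward zero, which is the LLN at work.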

The standard deviation of a statistic like the sample mean is called its standard error. The standard error of the sample mean is

$$SE(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}. \tag{1.6}$$

Every estimate discussed in this book has an associated standard error. This includes sample means (for which the standard error formula appears in equation (1.6)), differences in sample means (discussed later in this appendix), regression coefficients (discussed in Chapter 2), and instrumental variables and other more sophisticated estimates. Formulas for standard errors can get complicated, but the idea remains simple. The standard error summarizes the variability in an estimate due to random sampling. Again, it’s important to avoid confusing standard errors with the standard deviations of the underlying variables; the two quantities are intimately related yet measure different things.

One last step on the road to standard errors: most population quantities, including the standard deviation in the numerator of (1.6), are unknown and must be estimated. In practice, therefore, when quantifying the sampling variance of a sample mean, we work with an estimated standard error. This is obtained by replacing $\sigma_Y$ with $S(Y_i)$ in the formula for $SE(\bar{Y})$. Specifically, the estimated standard error of the sample mean can be written as

$$SE(\bar{Y}) = \frac{S(Y_i)}{\sqrt{n}}.$$

We often forget the qualifier “estimated” when discussing statistics and their standard errors, but that’s still what we have in mind. For example, the numbers in parentheses in Table 1.4 are estimated standard errors for the relevant differences in means.
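Putting the pieces together, the estimated standard error is a one-step computation from raw data (a sketch using hypothetical data, not the HIE sample):

```python
def estimated_se(y):
    """Estimated standard error of the sample mean: S(Y) / sqrt(n),
    using the 1/n sample variance convention from the text."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / n  # sample variance
    return (s2 / n) ** 0.5

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
print(round(estimated_se(data), 4))  # sqrt(4/8) = 0.7071
```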
