IV and causality
We like to tell the IV story in two iterations, first in a restricted model with constant effects, then in a framework with unrestricted heterogeneous potential outcomes, in which case causal effects must also be heterogeneous. The introduction of heterogeneous effects enriches the interpretation of IV estimands without changing the mechanics of the core statistical methods we are most likely to use in practice (typically, two-stage least squares). An initial focus on constant effects allows us to explain the mechanics of IV with a minimum of fuss.
To motivate the constant-effects setup as a framework for the causal link between schooling and wages, suppose, as before, that potential outcomes can be written
$$Y_{si} \equiv f_i(s), \qquad f_i(s) = \alpha + \rho s + \eta_i, \tag{4.1.1}$$
as in the introduction to regression in Chapter 3. Also, as in the earlier discussion, imagine that there is a vector of control variables, Ai, called “ability”, that gives a selection-on-observables story:
$$\eta_i = A_i'\gamma + v_i,$$
where $\gamma$ is again a vector of population regression coefficients, so that $v_i$ and $A_i$ are uncorrelated by construction. For now, the variables $A_i$ are assumed to be the only reason why $\eta_i$ and $S_i$ are correlated, so that
$$E[S_i v_i] = 0.$$
In other words, if $A_i$ were observed, we would be happy to include it in the regression of wages on schooling, thereby producing a long regression that can be written
$$Y_i = \alpha + \rho S_i + A_i'\gamma + v_i. \tag{4.1.2}$$
Equation (4.1.2) is a version of the linear causal model, (3.2.9). The error term in this equation is the random part of potential outcomes, $v_i$, left over after controlling for $A_i$. This error term is uncorrelated with schooling by assumption. If this assumption turns out to be correct, the population regression of $Y_i$ on $S_i$ and $A_i$ produces the coefficients in (4.1.2).
The problem we initially want to tackle is how to estimate the long-regression coefficient, $\rho$, when $A_i$ is unobserved. Instrumental variables methods can be used to accomplish this when the researcher has access to a variable (the instrument, which we’ll call $Z_i$) that is correlated with the causal variable of interest, $S_i$, but uncorrelated with any other determinants of the dependent variable. Here, the phrase "uncorrelated with any other determinants of the dependent variable" is like saying $\operatorname{Cov}(\eta_i, Z_i) = 0$, or, equivalently, $Z_i$ is uncorrelated with both $A_i$ and $v_i$. This statement is called an exclusion restriction since $Z_i$ can be said to be excluded from the causal model of interest. The exclusion restriction is a version of the conditional independence assumption of the previous chapter, except that now it is the instrument that is independent of potential outcomes, instead of schooling itself (the "conditional" in conditional independence enters into the discussion when we consider IV models with covariates).
Given the exclusion restriction, it follows from equation (4.1.2) that
$$\rho = \frac{\operatorname{Cov}(Y_i, Z_i)}{\operatorname{Cov}(S_i, Z_i)} = \frac{\operatorname{Cov}(Y_i, Z_i)/V(Z_i)}{\operatorname{Cov}(S_i, Z_i)/V(Z_i)}. \tag{4.1.3}$$
The second equality in (4.1.3) is useful because it’s usually easier to think in terms of regression coefficients than in terms of covariances. The coefficient of interest, $\rho$, is the ratio of the population regression of $Y_i$ on $Z_i$ (the reduced form) to the population regression of $S_i$ on $Z_i$ (the first stage). The IV estimator is the sample analog of expression (4.1.3). Note that the IV estimand is predicated on the notion that the first stage is not zero, but this is something you can check in the data. As a rule, if the first stage is only marginally significantly different from zero, the resulting IV estimates are unlikely to be informative, a point we return to later.
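To make the ratio in (4.1.3) concrete, the following is a minimal simulation sketch, not anything taken from the text. All numbers are made-up assumptions: a constant effect of 0.10, a binary instrument, and an unobserved "ability" term that raises both schooling and wages. Under those assumptions, the reduced form divided by the first stage should land near the assumed effect, while the short regression of wages on schooling should not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.10  # assumed (illustrative) constant causal effect of schooling

# Hypothetical data-generating process, chosen only for illustration.
ability = rng.normal(size=n)                   # A_i, unobserved by the researcher
z = rng.integers(0, 2, size=n).astype(float)   # Z_i, drawn independently of ability
v = rng.normal(size=n)                         # v_i, remaining noise in wages
s = 12 + 0.5 * z + 1.0 * ability + rng.normal(size=n)  # schooling
y = 1.0 + rho * s + 0.5 * ability + v                    # wages

def cov(a, b):
    """Sample covariance (population-style normalization)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

ols = cov(y, s) / cov(s, s)            # short regression, omits ability
first_stage = cov(s, z) / cov(z, z)    # regression of S on Z
reduced_form = cov(y, z) / cov(z, z)   # regression of Y on Z
iv = reduced_form / first_stage        # sample analog of (4.1.3)

print(f"OLS: {ols:.3f}  first stage: {first_stage:.3f}  IV: {iv:.3f}")
```

Because the instrument is generated independently of ability here, the exclusion restriction holds by construction; in real data it is an assumption that cannot be verified this way.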
It’s worth recapping the assumptions needed for the ratio of covariances in (4.1.3) to equal the causal effect, $\rho$. First, the instrument must have a clear effect on $S_i$. This is the first stage. Second, the only reason for the relationship between $Y_i$ and $Z_i$ is the first stage. For the moment, we’re calling this second assumption the exclusion restriction, though as we’ll see in the discussion of models with heterogeneous effects, this assumption really has two parts: the first is the statement that the instrument is as good as randomly assigned (i.e., independent of potential outcomes, conditional on covariates), while the second is that the instrument has no effect on outcomes other than through the first-stage channel.
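The two assumptions in this recap map directly onto a two-line derivation. The sketch below uses the constant-effects model in (4.1.1) evaluated at observed schooling, with $\eta_i$ denoting the part of potential outcomes not explained by schooling; it is offered as a reading aid rather than a new result.

```latex
\begin{align*}
Y_i &= \alpha + \rho S_i + \eta_i \\
\operatorname{Cov}(Y_i, Z_i)
    &= \rho \operatorname{Cov}(S_i, Z_i) + \operatorname{Cov}(\eta_i, Z_i) \\
    &= \rho \operatorname{Cov}(S_i, Z_i)
       && \text{(exclusion restriction: } \operatorname{Cov}(\eta_i, Z_i) = 0\text{)} \\
\rho &= \frac{\operatorname{Cov}(Y_i, Z_i)}{\operatorname{Cov}(S_i, Z_i)}
       && \text{(first stage: } \operatorname{Cov}(S_i, Z_i) \neq 0\text{)}
\end{align*}
```

The exclusion restriction removes the second covariance term, and a nonzero first stage is what allows the division in the last line.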
So where can you find an instrumental variable? Good instruments come from institutional knowledge and your ideas about the processes determining the variable of interest. For example, the economic model of education suggests that educational attainment is determined by comparing the costs and benefits of alternative choices. Thus, one possible source of instruments for schooling is differences in costs due, say, to loan policies or other subsidies that vary independently of ability or earnings potential. A second source of variation in schooling is institutional constraints. One set of institutional constraints relevant for schooling is compulsory schooling laws. Angrist and Krueger (1991) exploit the variation induced by compulsory schooling in a paper that typifies the use of “natural experiments” to try to eliminate omitted variables bias.
The starting point for the Angrist and Krueger (1991) quarter-of-birth strategy is the observation that most states required students to enter school in the calendar year in which they turn 6. School start age is therefore a function of date of birth. Specifically, those born late in the year are young for their grade. In states with a December 31st birthday cutoff, children born in the fourth quarter enter school shortly before they turn 6, while those born in the first quarter enter school at around age 6½. Furthermore, because compulsory schooling laws typically require students to remain in school only until their 16th birthday, these groups of students will be in different grades, or through a given grade to a different degree, when they reach the legal dropout age. In essence, the combination of school start age policies and compulsory schooling laws creates a natural experiment in which children are compelled to attend school for different lengths of time depending on their birthdays.
Angrist and Krueger looked at the relationship between educational attainment and quarter of birth using US census data. Panel A of Figure 4.1.1 (adapted from Angrist and Krueger, 2001) displays the education-quarter-of-birth pattern for men in the 1980 Census who were born in the 1930s. The figure clearly shows that men born earlier in the calendar year tend to have lower average schooling levels. Panel A of Figure 4.1.1 is a graphical representation of the first stage. The first stage in a general IV framework is the regression of the causal variable of interest on covariates and the instrument(s). The plot summarizes this regression because average schooling by year and quarter of birth is what you get for fitted values from a regression of schooling on a full set of year-of-birth and quarter-of-birth dummies.
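The link between cell averages and fitted values can be checked numerically. The sketch below uses synthetic data (not the Census sample) and reads "a full set of dummies" as one dummy per year-by-quarter cell, which is an interpretation on our part. Under that saturated specification, the least-squares fitted value for each observation is the average schooling in its cell.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Synthetic birth cohorts and schooling; all values are illustrative assumptions.
yob = rng.integers(1930, 1940, size=n)   # year of birth, 1930..1939
qob = rng.integers(1, 5, size=n)         # quarter of birth, 1..4
schooling = 12 + 0.05 * (yob - 1930) + 0.03 * qob + rng.normal(scale=3, size=n)

# One dummy per year-by-quarter cell (40 cells): a saturated design matrix.
cell = (yob - 1930) * 4 + (qob - 1)
n_cells = 40
D = np.zeros((n, n_cells))
D[np.arange(n), cell] = 1.0

# Least-squares coefficients and fitted values from the dummy regression.
coef, *_ = np.linalg.lstsq(D, schooling, rcond=None)
fitted = D @ coef

# Plain averages of schooling within each cell, for comparison.
cell_means = np.array([schooling[cell == c].mean() for c in range(n_cells)])

print(np.allclose(fitted, cell_means[cell]))
```

A specification with additive year and quarter dummies only (no interactions) would smooth across cells, so its fitted values would generally differ from the raw cell averages.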
Panel B of Figure 4.1.1 displays average earnings by quarter of birth for the same sample used to construct panel A. This panel illustrates what econometricians call the “reduced form” relationship between the instruments and the dependent variable. The reduced form is the regression of the dependent variable on any covariates in the model and the instrument(s). Panel B shows that older cohorts tend to have higher earnings, because earnings rise with work experience. The figure also shows that men born in early quarters almost always earned less, on average, than those born later in the year, even after adjusting for year of birth, which plays the role of an exogenous covariate in the Angrist and Krueger (1991) setup. Importantly, this reduced-form relation parallels the quarter-of-birth pattern in schooling, suggesting the two patterns are closely related. Because an individual’s date of birth is probably unrelated to his or her innate ability, motivation, or family connections, it seems credible to assert that the only reason for the up-and-down quarter-of-birth pattern in earnings is indeed the up-and-down quarter-of-birth pattern in schooling. This is the critical assumption that drives the quarter-of-birth IV story.
A mathematical representation of the story told by Figure 4.1.1 comes from the first-stage and reduced-form regression equations, spelled out below:
$$S_i = X_i'\pi_{10} + \pi_{11} z_i + \xi_{1i} \tag{4.1.4a}$$
$$Y_i = X_i'\pi_{20} + \pi_{21} z_i + \xi_{2i} \tag{4.1.4b}$$
The parameter $\pi_{11}$ in equation (4.1.4a) captures the first-stage effect of $z_i$ on $S_i$, adjusting for covariates, $X_i$. The parameter $\pi_{21}$ in equation (4.1.4b) captures the reduced-form effect of $z_i$ on $Y_i$, adjusting for these same covariates. In the language of the SEM, the dependent variables in these two equations are said to be the endogenous variables (determined jointly within the system) while the variables on the right-hand side are said to be the exogenous variables (determined outside the system). The instruments, $z_i$, are a subset of the exogenous variables. The exogenous variables that are not instruments are said to be exogenous covariates. Although we’re not estimating a traditional supply and demand system in this case, these SEM variable labels are still widely used in empirical practice.
[Figure 4.1.1, Panel A: Average Education by Quarter of Birth (first stage)]
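The first-stage and reduced-form coefficients on $z_i$ share the same denominator, so their ratio is
$$\frac{\pi_{21}}{\pi_{11}} = \frac{\operatorname{Cov}(Y_i, \tilde{z}_i)}{\operatorname{Cov}(S_i, \tilde{z}_i)}, \tag{4.1.5}$$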
where $\tilde{z}_i$ is the residual from a regression of $z_i$ on the exogenous covariates, $X_i$. The right-hand side of (4.1.5) therefore swaps $\tilde{z}_i$ for $z_i$ in the general IV formula, (4.1.3). Econometricians call the sample analog of the left-hand side of equation (4.1.5) an Indirect Least Squares (ILS) estimator of $\rho$ in the causal model with covariates,
$$Y_i = X_i'\alpha + \rho S_i + \eta_i, \tag{4.1.6}$$
where $\eta_i$ is the compound error term, $A_i'\gamma + v_i$. It’s easy to use equation (4.1.6) to confirm directly that $\operatorname{Cov}(Y_i, \tilde{z}_i) = \rho \operatorname{Cov}(S_i, \tilde{z}_i)$, since $\tilde{z}_i$ is uncorrelated with $X_i$ by construction and with $\eta_i$ by assumption. In Angrist and Krueger (1991), the instrument, $z_i$, is quarter of birth (or dummies indicating quarters of birth) and the covariates are dummies for year of birth, state of birth, and race.
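The equality in (4.1.5) can be illustrated with a small numerical sketch. Everything below is synthetic and assumed for illustration: a single continuous covariate stands in for $X_i$, the instrument is correlated with that covariate, and the constant effect is set to 0.10. The code computes the ILS ratio from the two regressions in (4.1.4a) and (4.1.4b), then compares it with the covariance ratio built from the residualized instrument.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
rho = 0.10  # assumed (illustrative) causal effect of schooling

# Hypothetical data: X holds an intercept and one covariate.
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
z = 0.5 * x + rng.normal(size=n)      # instrument, correlated with the covariate
ability = rng.normal(size=n)          # unobserved, independent of z and x
s = 12 + 0.3 * x + 0.4 * z + ability + rng.normal(size=n)
y = 1 + 0.2 * x + rho * s + 0.5 * ability + rng.normal(size=n)

def ols(W, t):
    """Least-squares coefficients from regressing t on the columns of W."""
    return np.linalg.lstsq(W, t, rcond=None)[0]

# First stage and reduced form: regress S and Y on covariates plus z.
XZ = np.column_stack([X, z])
pi_11 = ols(XZ, s)[-1]   # coefficient on z in the first stage
pi_21 = ols(XZ, y)[-1]   # coefficient on z in the reduced form
ils = pi_21 / pi_11      # Indirect Least Squares estimate

# Residualize z on the covariates, then form the covariance ratio.
z_tilde = z - X @ ols(X, z)
cov_ratio = np.cov(y, z_tilde)[0, 1] / np.cov(s, z_tilde)[0, 1]

print(f"ILS: {ils:.4f}  covariance ratio: {cov_ratio:.4f}")
```

The two numbers agree up to floating-point error because both regression coefficients on $z_i$ divide by the same quantity, the variance of $\tilde{z}_i$, which cancels in the ratio.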