# Covariates in the Heterogeneous-effects Model

You might be wondering where the covariates have gone. After all, covariates played a starring role in our earlier discussion of regression and matching. Yet the LATE theorem does not involve covariates. This stems from the fact that when we see instrumental variables as a type of (natural or man-made) randomized trial, covariates take a back seat. If, after all, the instrument is randomly assigned, it is likely to be independent of covariates. Not all instruments have this property, however. As with covariates in the regression models in the previous chapter, the main reason why covariates are included in causal analyses using instrumental variables is that the conditional independence and exclusion restrictions underlying IV estimation may be more likely to be valid after conditioning on covariates. Even randomly assigned instruments, like draft-eligibility status, may be valid only after conditioning on covariates. In the case of draft eligibility, older cohorts were more likely to be draft-eligible because the cutoffs were higher. Because there are year-of-birth (or age) differences in earnings, draft-eligibility status is a valid instrument only after conditioning on year of birth.

More formally, IV estimation with covariates may be justified by a conditional independence assumption

$$\{Y_{1i}, Y_{0i}, D_{1i}, D_{0i}\} \perp\!\!\!\perp Z_i \mid X_i. \tag{4.5.1}$$

In other words, we think of the instrumental variables as being “as good as randomly assigned,” conditional on covariates, $X_i$ (here we are implicitly maintaining the exclusion restriction as well). A second reason for incorporating covariates is that conditioning on covariates may reduce some of the variability in the dependent variable. This leads to more precise 2SLS estimates under constant conditional effects.
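The draft-eligibility logic can be illustrated with a small simulation. The sketch below does not use the draft-lottery data: the `cohort` variable, the eligibility rates, and the effect size of 2 are made-up assumptions, chosen only to show how a Wald estimate pooled across cohorts picks up cohort differences in the outcome, while Wald estimates computed within cohort do not.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical design: three birth cohorts with different eligibility rates.
cohort = rng.integers(0, 3, size=n)
p_elig = np.array([0.2, 0.5, 0.8])[cohort]      # P(Z=1 | cohort), assumed
z = (rng.random(n) < p_elig).astype(float)

# Treatment take-up is monotone in Z (no defiers).
u = rng.random(n)
d = np.where(z == 1, u < 0.4, u < 0.1).astype(float)

# Cohort shifts the outcome directly; the assumed causal effect of D is 2.
y = 1.0 * cohort + 2.0 * d + rng.normal(size=n)

def wald(y, d, z):
    """Ratio of the reduced-form to the first-stage difference in means."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

pooled = wald(y, d, z)                           # ignores cohort
by_cohort = np.array(
    [wald(y[cohort == k], d[cohort == k], z[cohort == k]) for k in range(3)]
)
print("pooled Wald:", pooled)
print("within-cohort Wald:", by_cohort)
```

In this design the instrument is as good as randomly assigned only within cohort, so the within-cohort estimates should sit near the assumed effect while the pooled estimate should not.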

The simplest causal model with covariates is the constant-effects model, with functional form restrictions as follows:

$$E[Y_{0i} \mid X_i] = X_i'\alpha^* \quad \text{for a } K \times 1 \text{ vector of coefficients, } \alpha^*;$$

$$Y_{1i} - Y_{0i} = \rho.$$

In combination with (4.5.1), this motivates 2SLS estimation of an equation like (4.1.6), as discussed in Section 4.1.

A straightforward generalization of the constant-effects model allows

$$Y_{1i} - Y_{0i} = \rho(X_i),$$

where $\rho(X_i)$ is a deterministic function of $X_i$. This model can be estimated by adding interactions between $Z_i$ and $X_i$ to the first stage and (the same) interactions between $D_i$ and $X_i$ to the second stage. There are now multiple endogenous variables and hence multiple first-stage equations. These can be written

$$D_i = X_i'\pi_{00} + \pi_{01}Z_i + Z_iX_i'\pi_{02} + \xi_{0i}$$

$$D_iX_i = X_i'\pi_{10} + \pi_{11}Z_i + Z_iX_i'\pi_{12} + \xi_{1i}.$$

The second stage equation in this case is

$$Y_i = \alpha'X_i + \rho_0D_i + D_iX_i'\rho_1 + \eta_i,$$

so $\rho(X_i) = \rho_0 + X_i'\rho_1$. Alternatively, a nonparametric version of $\rho(X_i)$ can be estimated by 2SLS in subsamples stratified on $X_i$.
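As a rough illustration of the interacted specification, the sketch below simulates a single scalar covariate and estimates the model by 2SLS with two endogenous regressors and two instruments. The data-generating process, the helper `tsls`, and the parameter values `rho0` and `rho1` are assumptions made for this example only.

```python
import numpy as np

def tsls(y, endog, exog, instr):
    """2SLS coefficients, ordered as [endog columns, exog columns]."""
    W = np.column_stack([instr, exog])                        # full instrument list
    endog_hat = W @ np.linalg.lstsq(W, endog, rcond=None)[0]  # first-stage fitted values
    R = np.column_stack([endog_hat, exog])                    # second-stage regressors
    return np.linalg.lstsq(R, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 200_000
x = rng.normal(size=n)
z = (rng.random(n) < 0.5).astype(float)
v = rng.normal(size=n)                        # unobserved confounder

# Treatment depends on the instrument, the covariate, and the confounder.
d = (-0.5 + 1.0 * z + 0.3 * x + v > 0).astype(float)

rho0, rho1 = 1.0, 0.5                         # assumed rho(x) = rho0 + rho1 * x
y = 1.0 + 0.5 * x + (rho0 + rho1 * x) * d + 0.8 * v + rng.normal(size=n)

exog = np.column_stack([np.ones(n), x])       # covariates (with constant)
endog = np.column_stack([d, d * x])           # D and its interaction with X
instr = np.column_stack([z, z * x])           # Z and its interaction with X

coef = tsls(y, endog, exog, instr)
rho0_hat, rho1_hat = coef[0], coef[1]
print("rho0_hat:", rho0_hat, "rho1_hat:", rho1_hat)
```

Computing the second stage by regressing on first-stage fitted values gives the 2SLS point estimates but not valid standard errors; a dedicated IV routine would be needed for inference.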

The heterogeneous-effects model underlying the LATE theorem also allows for identification based on conditional independence as in (4.5.1), though the estimand is a little more complicated. For each value of $X_i$, we define the covariate-specific LATE,

$$\lambda(X_i) \equiv E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}, X_i].$$

The “saturate and weight” approach to estimation with covariates is spelled out in the following theorem (from Angrist and Imbens, 1995).

Theorem 4.5.1 SATURATE AND WEIGHT. Suppose the assumptions of the LATE theorem hold conditional on $X_i$. That is,

(CA1, Independence) $\{Y_i(D_{1i}, 1), Y_i(D_{0i}, 0), D_{1i}, D_{0i}\} \perp\!\!\!\perp Z_i \mid X_i$;

(CA2, Exclusion) $P[Y_i(d, 0) = Y_i(d, 1) \mid X_i] = 1$ for $d = 0, 1$;

(CA3, First stage) $E[D_{1i} - D_{0i} \mid X_i] \neq 0$.

We also assume monotonicity (A4) holds as before. Consider the 2SLS estimand based on the first stage equation

$$D_i = \pi_X + \pi_{1X}Z_i + \xi_{1i} \tag{4.5.3}$$

and the second stage equation

$$Y_i = \alpha_X + \rho_cD_i + \eta_i,$$

where $\pi_X$ and $\alpha_X$ denote saturated models for covariates (a full set of dummies for all values of $X_i$) and $\pi_{1X}$ denotes a separate first-stage effect of $Z_i$ for every value of $X_i$. Then $\rho_c = E[\omega(X_i)\lambda(X_i)]$, where

$$\omega(X_i) = \frac{V\{E[D_i \mid X_i, Z_i] \mid X_i\}}{E[V\{E[D_i \mid X_i, Z_i] \mid X_i\}]}.$$

This theorem says that 2SLS with a fully saturated first stage and a saturated model for covariates in the second stage produces a weighted average of covariate-specific LATEs. The weights are proportional to the average conditional variance of the population first-stage fitted value, $E[D_i \mid X_i, Z_i]$, at each value of $X_i$. The theorem comes from the fact that the first stage coincides with $E[D_i \mid X_i, Z_i]$ when (4.5.3) is saturated (i.e., the first-stage regression recovers the CEF).
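The weighting result can be checked numerically. The sketch below uses a simulated design with one discrete covariate taking three values. All parameter values are assumptions for illustration. It computes 2SLS with saturated first and second stages and compares it with the variance-weighted average of cell-specific Wald estimates. With a binary instrument and a saturated first stage, the two numbers should agree up to floating-point error, because the identity holds cell by cell in the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

x = rng.integers(0, 3, size=n)                    # covariate with three cells
p_z = np.array([0.3, 0.5, 0.7])[x]                # assumed P(Z=1 | X)
z = (rng.random(n) < p_z).astype(float)

# Compliance types by cell; monotonicity holds because d1 >= d0.
u = rng.random(n)
p_always = np.array([0.1, 0.2, 0.1])[x]
p_complier = np.array([0.6, 0.3, 0.5])[x]
d0 = (u < p_always).astype(float)
d1 = (u < p_always + p_complier).astype(float)
d = z * d1 + (1 - z) * d0

effect = np.array([1.0, 2.0, 3.0])[x]             # assumed covariate-specific effects
y = 0.5 * x + effect * d + rng.normal(size=n)

# Saturated 2SLS: dummies for X, plus Z interacted with every X dummy.
X = np.eye(3)[x]
W = np.hstack([X, X * z[:, None]])
d_hat = W @ np.linalg.lstsq(W, d, rcond=None)[0]  # first-stage fitted values
second = np.hstack([X, d_hat[:, None]])
rho_2sls = np.linalg.lstsq(second, y, rcond=None)[0][-1]

# Cell-specific Wald estimates, weighted by the variance of d_hat within each cell.
wald = np.empty(3)
weight = np.empty(3)
for k in range(3):
    cell = x == k
    z1, z0 = cell & (z == 1), cell & (z == 0)
    wald[k] = (y[z1].mean() - y[z0].mean()) / (d[z1].mean() - d[z0].mean())
    weight[k] = cell.sum() * d_hat[cell].var()
weight /= weight.sum()
rho_weighted = weight @ wald

print("saturated 2SLS:", rho_2sls)
print("weighted Wald: ", rho_weighted)
```

Cells with a stronger first stage or a more evenly split instrument receive more weight, so the 2SLS estimand need not equal the simple population average of the covariate-specific effects.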

In practice, we may not want to work with a model with a first-stage parameter for each value of the covariates. First, there is the risk of bias, as we discuss at the end of this chapter, and second, a big pile of individually-imprecise first-stage estimates is not pretty to look at. It seems reasonable to imagine that models with fewer parameters, say a restricted first stage imposing a constant $\pi_{1X}$, nevertheless approximate some kind of covariate-averaged LATE. This turns out to be true, but the argument is surprisingly indirect. The vision of 2SLS as providing an MMSE approximation to an underlying causal relation was developed by Abadie (2003). The Abadie approach begins by defining the object of interest to be $E[Y_i \mid D_i, X_i, D_{1i} > D_{0i}]$, the CEF for $Y_i$ given treatment status and covariates, for compliers. An important feature of this CEF is that when the conditions of the LATE theorem hold conditional on $X_i$, it has a causal interpretation. In other words, for compliers, treatment-control contrasts conditional on $X_i$ are equal to conditional-on-$X_i$ LATEs:

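$$E[Y_i \mid D_i = 1, X_i, D_{1i} > D_{0i}] - E[Y_i \mid D_i = 0, X_i, D_{1i} > D_{0i}] = E[Y_{1i} - Y_{0i} \mid X_i, D_{1i} > D_{0i}].$$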

This follows immediately from the fact that, given (4.5.1), potential outcomes are independent of $D_i$ given $X_i$ and $D_{1i} > D_{0i}$. The upshot is that we can imagine running a regression of $Y_i$ on $D_i$ and $X_i$ in the complier population. Although this regression might not give us the CEF of interest (unless it is linear or the model is saturated), it will, as always, provide the MMSE approximation to it. So a regression of $Y_i$ on $D_i$ and $X_i$ in the complier population approximates $E[Y_i \mid D_i, X_i, D_{1i} > D_{0i}]$ just like OLS approximates $E[Y_i \mid D_i, X_i]$. Alas, we do not know who the compliers are, so we cannot sample them. Nevertheless, they can be found, in the following sense:

Theorem 4.5.2 ABADIE KAPPA. Suppose the assumptions of the LATE theorem hold conditional on covariates, $X_i$. Let $g(Y_i, D_i, X_i)$ be any measurable function of $(Y_i, D_i, X_i)$ with finite expectation. Define

$$\kappa_i = 1 - \frac{D_i(1 - Z_i)}{1 - P(Z_i = 1 \mid X_i)} - \frac{(1 - D_i)Z_i}{P(Z_i = 1 \mid X_i)}.$$

Then

$$E[g(Y_i, D_i, X_i) \mid D_{1i} > D_{0i}] = \frac{E[\kappa_i\, g(Y_i, D_i, X_i)]}{E[\kappa_i]}.$$

This can be proved by direct calculation using the fact that, given the assumptions of the LATE theorem, any expectation is a weighted average of means for always-takers, never-takers, and compliers. By monotonicity, those with $D_i(1 - Z_i) = 1$ are always-takers because they have $D_{0i} = 1$, while those with $(1 - D_i)Z_i = 1$ are never-takers because they have $D_{1i} = 0$. Hence, the compliers are the left-out group.
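A simulation makes the "left-out group" argument concrete. In the sketch below, compliance types are known by construction, which is never the case in real data. The kappa-weighted mean of the outcome can therefore be compared directly with the true complier mean. The weight used is $1 - D(1-Z)/(1-p(X)) - (1-D)Z/p(X)$ with $p(X) = P(Z = 1 \mid X)$, and the design (type shares, $p(X)$, outcome equation) is an assumption for illustration, with $p(X)$ treated as known.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

x = rng.integers(0, 2, size=n)
p = np.where(x == 1, 0.7, 0.3)                 # P(Z=1 | X), treated as known here
z = (rng.random(n) < p).astype(float)

# Types: always-takers (u < 0.1), compliers (0.1 <= u < 0.5), never-takers (u >= 0.5).
u = rng.random(n)
d0 = (u < 0.1).astype(float)
d1 = (u < 0.5).astype(float)
complier = d1 > d0
d = z * d1 + (1 - z) * d0

# The outcome depends on the type-determining variable u, so compliers differ
# from the population as a whole.
y = x + 4.0 * u + 1.5 * d + rng.normal(size=n)

# Kappa subtracts reweighted always-takers (D=1, Z=0) and never-takers (D=0, Z=1).
kappa = 1 - d * (1 - z) / (1 - p) - (1 - d) * z / p

kappa_mean = (kappa * y).mean() / kappa.mean()  # E[kappa * g] / E[kappa] with g = Y
complier_mean = y[complier].mean()              # infeasible in real data
population_mean = y.mean()

print("kappa-weighted mean:", kappa_mean)
print("true complier mean: ", complier_mean)
print("population mean:    ", population_mean)
print("E[kappa] vs complier share:", kappa.mean(), complier.mean())
```

Individual values of `kappa` can be negative, so it is not a conventional sampling weight; it only behaves like one on average.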

The Abadie theorem has a number of important implications; for example, it crops up again in the discussion of quantile treatment effects. Here, we use it to approximate $E[Y_i \mid D_i, X_i, D_{1i} > D_{0i}]$ by linear regression. Specifically, let $\alpha_a$ and $\beta_a$ solve

$$(\alpha_a, \beta_a) = \arg\min_{a,\,b}\; E\{(E[Y_i \mid D_i, X_i, D_{1i} > D_{0i}] - aD_i - X_i'b)^2 \mid D_{1i} > D_{0i}\}.$$

In other words, $\alpha_aD_i + X_i'\beta_a$ gives the MMSE approximation to $E[Y_i \mid D_i, X_i, D_{1i} > D_{0i}]$, or fits it exactly if it’s linear. A consequence of Abadie’s theorem is that this approximating function can be obtained by solving

$$(\alpha_a, \beta_a) = \arg\min_{a,\,b}\; E\{\kappa_i(Y_i - aD_i - X_i'b)^2\}, \tag{4.5.5}$$

the kappa-weighted least-squares minimand.

Abadie proposes an estimation strategy (and develops distribution theory) for a procedure that involves first-step estimation of $\kappa_i$ using parametric or semiparametric models for the function $p(X_i) = P(Z_i = 1 \mid X_i)$. The estimates from the first step are then plugged into the sample analog of (4.5.5) in the second step. Not surprisingly, when the only covariate is a constant, Abadie’s procedure simplifies to the Wald estimator. More surprisingly, minimization of (4.5.5) produces the traditional 2SLS estimator as long as a linear model is used for $p(X_i)$ in the construction of $\kappa_i$. In other words, if $P(Z_i = 1 \mid X_i) = X_i'\pi$ is used when constructing an estimate of $\kappa_i$, the Abadie estimand is 2SLS. Thus, we can conclude that whenever $p(X_i)$ can be fit or closely approximated by a linear model, it makes sense to view 2SLS as an approximation to the complier causal response function, $E[Y_i \mid D_i, X_i, D_{1i} > D_{0i}]$. On the other hand, $\alpha_a$ is not, in general, the 2SLS estimand and $\beta_a$ is not, in general, the vector of covariate effects produced by 2SLS. Still, the equivalence to 2SLS for linear $P(Z_i = 1 \mid X_i)$ leads us to think that Abadie’s method and 2SLS are likely to produce similar estimates in most applications, with the further implication that we can think of 2SLS as approximating $E[Y_i \mid D_i, X_i, D_{1i} > D_{0i}]$.
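The two-step procedure can be sketched on simulated data. This is a minimal illustration, not Abadie's implementation: the design is assumed, with a single binary covariate so that a linear model for $P(Z = 1 \mid X)$ fits exactly, and with a complier CEF that is linear with a constant effect of 1.5. The kappa-weighted minimand is solved through its normal equations because the weights can be negative. No standard errors are computed, and the comparison with 2SLS is only a numerical check in this one design.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.integers(0, 2, size=n).astype(float)
z = (rng.random(n) < 0.3 + 0.4 * x).astype(float)   # P(Z=1 | X) is linear in X
u = rng.random(n)
d0 = (u < 0.1).astype(float)                        # always-takers
d1 = (u < 0.5).astype(float)                        # always-takers and compliers
d = z * d1 + (1 - z) * d0
y = 1.0 + x + 2.0 * u + 1.5 * d + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])                # covariates with constant

# Step 1: linear model for p(X), then plug into kappa.
p_hat = X @ np.linalg.lstsq(X, z, rcond=None)[0]
kappa = 1 - d * (1 - z) / (1 - p_hat) - (1 - d) * z / p_hat

# Step 2: kappa-weighted least squares of Y on (D, X) via the normal equations.
R = np.column_stack([d, X])
theta = np.linalg.solve(R.T @ (kappa[:, None] * R), R.T @ (kappa * y))
alpha_kappa = theta[0]

# Conventional 2SLS with Z as the excluded instrument, for comparison.
W = np.column_stack([z, X])
d_hat = W @ np.linalg.lstsq(W, d, rcond=None)[0]
alpha_2sls = np.linalg.lstsq(np.column_stack([d_hat, X]), y, rcond=None)[0][0]

print("kappa-weighted estimate:", alpha_kappa)
print("2SLS estimate:          ", alpha_2sls)
```

With richer covariates, the first step would typically need a model that keeps the fitted probabilities strictly between 0 and 1, since `kappa` divides by both `p_hat` and `1 - p_hat`.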

The Angrist (2001) re-analysis of Angrist and Evans (1998) is an example where estimates based on (4.5.5) are indistinguishable from 2SLS estimates. Using twins instruments to estimate the effect of a third child on female labor supply generates a 2SLS estimate of -.088 (s.e. = .017), while the corresponding Abadie estimate is -.089 (s.e. = .017). Similarly, 2SLS and Abadie estimates of the effect on hours worked are identical at -3.55 (s.e. = .617). This is not a strike against Abadie’s procedure. Rather, it supports the notion, which we hold dear, that 2SLS approximates the causal relation of interest.