# Average Causal Response with Variable Treatment Intensity*

An important difference between the causal effects of a dummy variable and a variable that takes on the values {0, 1, 2, . . .} is that in the first case, there is only one causal effect for any one person, while in the latter there are many: the effect of going from 0 to 1, the effect of going from 1 to 2, and so on. The potential-outcomes notation we used for schooling recognizes this. Here it is again: let

$$Y_{si} = f_i(s),$$

denote the potential (or latent) earnings that person $i$ would receive after obtaining $s$ years of education. Note that the function $f_i(s)$ has an “$i$” subscript on it while $s$ does not. The function $f_i(s)$ tells us what $i$ would earn for any value of schooling, $s$, and not just for the realized value, $S_i$. In other words, $f_i(s)$ answers causal “what if” questions for multinomial $S_i$.

Suppose that $S_i$ takes on values in the set $\{0, 1, \ldots, \bar{s}\}$. Then there are $\bar{s}$ unit causal effects, $Y_{si} - Y_{s-1,i}$. A linear causal model assumes these are the same for all $s$ and for all $i$, which are obviously unrealistic assumptions. But we need not take these assumptions literally. Rather, 2SLS provides a computational device that generates a weighted average of unit causal effects. The weighting function can be estimated and studied, so we can learn where the action is coming from with a particular instrument. This weighting function tells us how the compliers are distributed over the range of $S_i$. It tells us, for example, that the returns to schooling estimated using quarter of birth or compulsory schooling laws come from shifts in the schooling distribution at high school grade levels. Other instruments, like the distance instruments used by Card (1995), act elsewhere on the schooling distribution and therefore capture a different sort of return.

To flesh this out, assume that a single binary instrument, $Z_i$, is to be used to estimate the returns to schooling (as in Acemoglu and Angrist, 2000). $Z_i$ is a dummy for having been born in a state with restrictive compulsory school laws. Also, let $s_{1i}$ denote the schooling $i$ would get if $Z_i = 1$, and let $s_{0i}$ denote the schooling $i$ would get if $Z_i = 0$. The theorem below, from Angrist and Imbens (1995), offers an interpretation of the Wald estimand with variable treatment intensity in this case. Note that here we combine the independence and exclusion restrictions by simply stating that potential outcomes indexed by $s$ are independent of the instruments.

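Theorem 4.5.3 AVERAGE CAUSAL RESPONSE. Suppose

(ACR1, Independence and Exclusion) $\{Y_{0i}, Y_{1i}, \ldots, Y_{\bar{s}i}; s_{0i}, s_{1i}\} \perp\!\!\!\perp Z_i$;

(ACR2, First stage) $E[s_{1i} - s_{0i}] \neq 0$;

(ACR3, Monotonicity) $s_{1i} - s_{0i} \geq 0 \;\forall i$, or vice versa; assume the first.

Then

$$\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[S_i \mid Z_i = 1] - E[S_i \mid Z_i = 0]} = \sum_{s=1}^{\bar{s}} \omega_s \, E[Y_{si} - Y_{s-1,i} \mid s_{1i} \geq s > s_{0i}],$$

where

$$\omega_s = \frac{P[s_{1i} \geq s > s_{0i}]}{\sum_{j=1}^{\bar{s}} P[s_{1i} \geq j > s_{0i}]}.$$

The weights $\omega_s$ are non-negative and sum to one.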

The average causal response (ACR) theorem says that the Wald estimator with variable treatment intensity is a weighted average of the unit causal responses along the length of the potentially nonlinear causal relation described by $f_i(s)$. The unit causal response, $E[Y_{si} - Y_{s-1,i} \mid s_{1i} \geq s > s_{0i}]$, is the average difference in potential outcomes for compliers at point $s$. These are individuals driven by the instrument from a treatment intensity less than $s$ to at least $s$. For example, the quarter of birth instruments used by Angrist and Krueger (1991) push some people from 11th grade to finishing 12th or higher, and others from 10th grade to finishing 11th or higher. The Wald estimator using quarter of birth instruments combines all of these effects into a single average causal response.
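Once potential outcomes are known, the ACR theorem is an exact accounting identity, so it is easy to check by simulation. The sketch below is a hypothetical Python/NumPy illustration, not the authors' code. It draws made-up potential schooling levels that satisfy monotonicity (ACR3) and gives each person a heterogeneous, concave profile $f_i(s)$. It then compares the Wald ratio, computed from potential outcomes as it would be with a randomly assigned $Z_i$, with the weighted average $\sum_s \omega_s E[Y_{si} - Y_{s-1,i} \mid s_{1i} \geq s > s_{0i}]$. The function name `acr_decomposition` and all distributions are assumptions made for illustration.

```python
import numpy as np


def acr_decomposition(s0, s1, y_pot):
    """Compare the Wald ratio with the ACR weighted average (Theorem 4.5.3).

    s0, s1 : integer arrays of potential schooling, assumed s1 >= s0 (ACR3)
    y_pot  : array of shape (n, s_bar + 1); y_pot[i, s] plays the role of Y_si
    Returns (wald, weighted_average, weights).
    """
    n, n_levels = y_pot.shape
    s_bar = n_levels - 1
    idx = np.arange(n)
    # With Z_i randomly assigned, the reduced form is E[Y_{s1i,i} - Y_{s0i,i}]
    # and the first stage is E[s1i - s0i].
    wald = (y_pot[idx, s1] - y_pot[idx, s0]).mean() / (s1 - s0).mean()

    weights = np.zeros(s_bar)
    unit_effects = np.zeros(s_bar)
    for s in range(1, s_bar + 1):
        compliers = (s1 >= s) & (s > s0)          # compliers at point s
        weights[s - 1] = compliers.mean()          # P[s1i >= s > s0i]
        if compliers.any():
            unit_effects[s - 1] = (y_pot[compliers, s] - y_pot[compliers, s - 1]).mean()
    weights /= weights.sum()                       # omega_s: non-negative, sum to one
    return wald, float(weights @ unit_effects), weights


# Hypothetical population: schooling between 6 and 12 years
rng = np.random.default_rng(0)
n, s_bar = 10_000, 12
s0 = rng.integers(6, 12, size=n)                           # schooling if Z_i = 0
s1 = np.minimum(s0 + rng.integers(0, 3, size=n), s_bar)   # schooling if Z_i = 1
beta = rng.normal(0.08, 0.02, size=n)                      # person-specific returns
grid = np.arange(s_bar + 1)
# Concave, heterogeneous f_i(s) plus a person-level intercept
y_pot = beta[:, None] * grid - 0.002 * grid**2 + rng.normal(size=(n, 1))

wald, acr, w = acr_decomposition(s0, s1, y_pot)
print(wald, acr)   # equal up to floating-point rounding
```

$Y_{s_{1i}i} - Y_{s_{0i}i}$ telescopes into the sum of unit effects for $s_{0i} < s \leq s_{1i}$, so the two numbers agree in any sample, not just on average. The weights on grades 1 through 6 are zero here because nobody in this made-up population is moved across those grades.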

The relative size of the group of compliers at point $s$ is $P[s_{1i} \geq s > s_{0i}]$. By monotonicity, this must be non-negative, and it is given by the difference in the CDF of $S_i$ at point $s$. To see this, note that

$$P[s_{1i} \geq s > s_{0i}] = P[s_{1i} \geq s] - P[s_{0i} \geq s] = P[s_{0i} < s] - P[s_{1i} < s],$$

which is non-negative since monotonicity requires $s_{1i} \geq s_{0i}$. Moreover,

$$P[s_{0i} < s] - P[s_{1i} < s] = P[S_i < s \mid Z_i = 0] - P[S_i < s \mid Z_i = 1]$$

by independence. Finally, the mean of a non-negative integer-valued random variable is the sum of one minus its CDF, that is, $E[S_i] = \sum_{j=1}^{\bar{s}} P[S_i \geq j]$. Therefore,

$$E[S_i \mid Z_i = 1] - E[S_i \mid Z_i = 0] = \sum_{j=1}^{\bar{s}} \left( P[S_i < j \mid Z_i = 0] - P[S_i < j \mid Z_i = 1] \right).$$

Thus, the ACR weighting function can be consistently estimated by comparing the CDFs of the endogenous variable (treatment intensity) with the instrument switched off and on. The weighting function is normalized by the first stage.
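Every quantity in the last two displays is a feature of the observed distribution of $(S_i, Z_i)$, so the weights can be computed directly from data. Below is a minimal Python/NumPy sketch. It assumes a binary instrument coded 0/1 and schooling taking integer values in $\{0, 1, \ldots, \bar{s}\}$. The helper `acr_weights` and the simulated data are hypothetical.

```python
import numpy as np


def acr_weights(s, z, s_bar):
    """Estimate ACR weights from observed treatment intensity s and instrument z.

    The unnormalized weight at point j is P[S < j | Z = 0] - P[S < j | Z = 1],
    for j = 1, ..., s_bar.  These terms sum to the first stage, so dividing by
    the first stage yields weights that sum to one.
    """
    s, z = np.asarray(s), np.asarray(z)
    levels = np.arange(1, s_bar + 1)
    cdf0 = np.array([(s[z == 0] < j).mean() for j in levels])
    cdf1 = np.array([(s[z == 1] < j).mean() for j in levels])
    raw = cdf0 - cdf1
    first_stage = s[z == 1].mean() - s[z == 0].mean()
    return raw / first_stage, raw, first_stage


# Hypothetical data: monotone instrument effect on schooling
rng = np.random.default_rng(1)
n, s_bar = 20_000, 12
s0 = rng.integers(6, 12, size=n)
s1 = np.minimum(s0 + rng.integers(0, 3, size=n), s_bar)
z = rng.integers(0, 2, size=n)
s = np.where(z == 1, s1, s0)       # observed schooling

weights, raw, first_stage = acr_weights(s, z, s_bar)
print(np.round(weights, 3))
```

In a finite sample, the estimated CDF differences can dip slightly below zero at grades where few people are moved. With monotonicity in force, such values reflect sampling error. By contrast, the identity `raw.sum() == first_stage` holds (up to rounding) in any sample, as the last display implies.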

The ACR theorem helps us understand what we are learning from a 2SLS estimate. For example, instrumental variables derived from compulsory attendance and child labor laws capture the causal effect of increases in schooling in the 6-12 grade range, but not of post-secondary schooling. This is illustrated in Figure 4.5.1, taken from Acemoglu and Angrist (2000).

The figure plots differences in the probability that educational attainment is at or exceeds the grade level on the x-axis (i.e., one minus the CDF). The differences are between men exposed to different child labor laws and compulsory schooling laws in a sample of white men aged 40-49 drawn from the 1960, 1970, and 1980 censuses. The instruments are coded as the number of years of schooling required either to work (Panel A) or leave school (Panel B) in the year the respondent was aged 14. Men exposed to the least restrictive laws are the reference group. Each instrument (e.g., a dummy for 7 years of schooling required before work is allowed) can be used to construct a Wald estimator by making comparisons with the reference group.

Panel A of Figure 4.5.1 shows that men exposed to more restrictive child labor laws were 1-6 percentage points more likely to complete grades 8-12. The intensity of the shift depends on whether the laws required 7, 8, or 9-plus years of schooling before work was allowed. But in all cases, the CDF differences decline at lower grades, and drop off sharply after grade 12. Panel B shows a similar pattern for compulsory attendance laws, though the effects are a little smaller and the action here is at somewhat higher grades, consistent with the fact that compulsory attendance laws are typically binding in higher grades than child labor laws.

Before wrapping up our discussion of LATE generalizations, it’s worth noting that most of the elements we have covered work in combination. For example, models with multiple instruments and variable treatment intensity generate a weighted average of the ACR for each instrument. Likewise, the saturate and weight theorem applies to models with variable treatment intensity. On the other hand, we do not yet have an extension of Abadie’s Kappa for models with variable treatment intensity. A final important extension is to the scenario where the causal variable of interest is continuous and we can therefore think of the causal response function as having derivatives.

## So Long and Thanks for All the Fish

Suppose that, as with the schooling problem, we imagine counterfactuals as being generated by an underlying functional relation. In this case, however, the causal variable of interest can take on any non-negative value, and the functional relation is assumed to have a derivative. An example where this makes sense is a demand curve, the quantity demanded as a function of price. In particular, let $q_i(p)$ denote the quantity demanded in market $i$ at hypothetical price $p$. This is a potential outcome, like $f_i(s)$, except that instead of individuals, the unit of observation is a time or a location or both. For example, Angrist, Graddy, and Imbens (2000) estimate the elasticity of quantity demanded at the Fulton wholesale fish market in New York City. The slope of this demand curve is $q_i'(p)$; if quantity and price are measured in logs, this is an elasticity.

[Figure 4.5.1 (image not reproduced). Panel A legend, schooling required to work: 7 years, 8 years, 9 years. Panel B legend, required years of attendance: 9 years, 10 years, 11 years.]

Figure 4.5.1: The effect of compulsory schooling instruments on the probability of schooling (from Acemoglu and Angrist 2000). The figures show the difference in the probability of schooling at or exceeding the grade level on the x-axis. The reference group is 6 or fewer years of required schooling in the top panel, and 8 or fewer years in the bottom panel. The top panel shows the CDF difference by severity of child labor laws. The bottom panel shows the CDF difference by severity of compulsory attendance laws.

The instruments in Angrist, Graddy, and Imbens (2000) are derived from data on weather conditions off the coast of Long Island, not too far from major commercial fishing grounds. Stormy weather makes it hard to catch fish, driving up the price and reducing the quantity demanded. To estimate the demand for fish, Angrist, Graddy, and Imbens use dummy variables such as $\text{stormy}_i$, a dummy indicating periods with high wind and waves. The data consist of daily observations on wholesale purchases of whiting, a cheap fish used for fish cakes and things like that.

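The Wald estimator using the $\text{stormy}_i$ instrument can be represented as

$$\frac{E[q_i \mid \text{stormy}_i = 1] - E[q_i \mid \text{stormy}_i = 0]}{E[p_i \mid \text{stormy}_i = 1] - E[p_i \mid \text{stormy}_i = 0]} = \frac{\int E[q_i'(t) \mid p_{1i} \geq t > p_{0i}] \, P[p_{1i} \geq t > p_{0i}] \, dt}{\int P[p_{1i} \geq t > p_{0i}] \, dt}, \tag{4.5.6}$$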

where $p_i$ is the price in market (day) $i$, and $p_{1i}$ and $p_{0i}$ are potential prices indexed by $\text{stormy}_i$. This is a weighted average derivative with weighting function

$$P[p_{1i} \geq t > p_{0i}] = P[p_i < t \mid \text{stormy}_i = 0] - P[p_i < t \mid \text{stormy}_i = 1]$$

at price $t$. In other words, IV estimation using $\text{stormy}_i$ produces an average of the derivative $q_i'(t)$. Each possible price (indexed by $t$) gets weight in proportion to the instrument-induced change in the cumulative distribution function (CDF) of prices at that point. This is the same sort of averaging as in the ACR theorem, except that now the underlying causal response is a derivative instead of a one-unit difference.
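The continuous version can be checked the same way, by replacing the sum over grades with an integral over a price grid. The Python/NumPy sketch below is purely illustrative. The potential log prices, the heterogeneous quadratic demand curves $q_i(p)$, and the grid resolution are all made-up assumptions, not features of the Fulton market data. The sketch computes the Wald ratio from potential outcomes and compares it with a Riemann-sum approximation to the right-hand side of (4.5.6).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
# Hypothetical potential log prices: stormy weather raises the price on every day.
p0 = rng.normal(0.0, 0.3, size=n)              # price if stormy_i = 0
p1 = p0 + rng.uniform(0.0, 0.4, size=n)        # price if stormy_i = 1 (p1 >= p0)
# Hypothetical demand q_i(p) = a_i - b_i * p - c * p**2, so slopes vary by day and price
a = rng.normal(5.0, 1.0, size=n)
b = rng.uniform(0.5, 1.5, size=n)
c = 0.8


def q(p):
    return a - b * p - c * p**2


def q_prime(t):
    # q_i'(t) for every market i (rows) and grid price t (columns)
    return -b[:, None] - 2 * c * t[None, :]


# Wald ratio when stormy_i is as good as randomly assigned
wald = (q(p1) - q(p0)).mean() / (p1 - p0).mean()

# Riemann-sum version of the weighted average derivative
t = np.linspace(p0.min(), p1.max(), 2_001)
dt = t[1] - t[0]
inside = (p1[:, None] >= t) & (t > p0[:, None])   # 1{p1i >= t > p0i}
weight = inside.mean(axis=0)                      # P[p1i >= t > p0i]
numer = (q_prime(t) * inside).mean(axis=0)        # E[q_i'(t) | ...] * P[...]
avg_derivative = (numer.sum() * dt) / (weight.sum() * dt)

print(wald, avg_derivative)   # close, up to grid approximation error
```

The denominator check `weight.sum() * dt` should also come out close to the mean price change $E[p_{1i} - p_{0i}]$, the first stage.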

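The average causal response formula, (4.5.6), comes from the fact that

$$E[q_i \mid \text{stormy}_i = 1] - E[q_i \mid \text{stormy}_i = 0] = E\left[\int_{p_{0i}}^{p_{1i}} q_i'(t)\,dt\right] \tag{4.5.8}$$

by the fundamental theorem of calculus. Two interesting special cases fall neatly out of equation (4.5.8). The first is when the causal response function is linear, i.e., $q_i(p) = \alpha_{0i} + \alpha_{1i} p$, for some random coefficients, $\alpha_{0i}$ and $\alpha_{1i}$. Then, we have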

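$$\frac{E[q_i \mid \text{stormy}_i = 1] - E[q_i \mid \text{stormy}_i = 0]}{E[p_i \mid \text{stormy}_i = 1] - E[p_i \mid \text{stormy}_i = 0]} = \frac{E[\alpha_{1i}(p_{1i} - p_{0i})]}{E[p_{1i} - p_{0i}]},$$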

a weighted average of the random coefficient, $\alpha_{1i}$. The weights are proportional to the price change induced by the weather in market $i$.
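Here is a quick numerical illustration of this special case, under assumptions chosen only to make the point visible. In the hypothetical simulation below, days on which the weather moves prices more also have steeper demand. As a result, the IV estimand differs from the simple mean of $\alpha_{1i}$. All names and distributions are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
p0 = rng.normal(0.0, 0.3, size=n)          # log price if stormy_i = 0
dp = rng.uniform(0.0, 1.0, size=n)         # weather-induced price change p1i - p0i
p1 = p0 + dp
alpha0 = rng.normal(5.0, 1.0, size=n)
alpha1 = -0.5 - dp                         # hypothetical: steeper demand where prices move more


def q(p):
    return alpha0 + alpha1 * p             # linear, random-coefficient demand


wald = (q(p1) - q(p0)).mean() / (p1 - p0).mean()
weighted = (alpha1 * dp).mean() / dp.mean()   # E[alpha1i (p1i - p0i)] / E[p1i - p0i]
print(wald, weighted, alpha1.mean())
```

In this design, the simple average of $\alpha_{1i}$ is about -1, while the IV estimand is about -1.17. The gap arises because days with larger weather-induced price changes get more weight.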

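The second special case is when we can write quantity demanded as

$$q_i(p) = Q(p) + \nu_i,$$

where $Q(p)$ is a non-stochastic function and $\nu_i$ is an additive random error. By this we mean $q_i'(p) = Q'(p)$ every day or in every market. In this case, the average causal response function becomes

$$\int Q'(t)\,\omega(t)\,dt, \quad \text{where} \quad \omega(t) = \frac{P[p_{1i} \geq t > p_{0i}]}{\int P[p_{1i} \geq r > p_{0i}]\,dr}.$$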
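In this case, the weights depend only on the price distribution. That means $\omega(t)$ can be estimated from the observed CDFs of prices on stormy and calm days, using the expression for the weighting function given earlier. Below is a hypothetical Python/NumPy sketch with a made-up common demand curve $Q(p) = -p - 0.5p^2$. In the population, $\nu_i$ has the same mean on stormy and calm days. The check therefore compares against a Wald ratio built from $Q(p_i)$, which keeps sampling noise from $\nu_i$ out of the comparison.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000


def Q(p):
    # Hypothetical common demand curve in logs
    return -p - 0.5 * p**2


def Q_prime(t):
    return -1.0 - t


stormy = rng.integers(0, 2, size=n)
p0 = rng.normal(0.0, 0.3, size=n)
p1 = p0 + rng.uniform(0.0, 0.4, size=n)      # stormy days have higher prices
p = np.where(stormy == 1, p1, p0)            # observed log price
nu = rng.normal(0.0, 1.0, size=n)
q = Q(p) + nu                                # observed log quantity

# Estimated weighting function: P[p < t | stormy = 0] - P[p < t | stormy = 1]
t = np.linspace(p.min(), p.max(), 2_001)
dt = t[1] - t[0]
F0 = (p[stormy == 0][:, None] < t).mean(axis=0)
F1 = (p[stormy == 1][:, None] < t).mean(axis=0)
raw = F0 - F1
omega = raw / (raw.sum() * dt)               # normalized to integrate to one
acr = (Q_prime(t) * omega).sum() * dt        # approximates the integral of Q'(t) omega(t)

first_stage = p[stormy == 1].mean() - p[stormy == 0].mean()
# Wald ratio with nu netted out, and the full sample Wald ratio (noisier because of nu)
wald_Q = (Q(p[stormy == 1]).mean() - Q(p[stormy == 0]).mean()) / first_stage
wald_full = (q[stormy == 1].mean() - q[stormy == 0].mean()) / first_stage
print(acr, wald_Q, wald_full)
```

The full sample Wald ratio `wald_full` estimates the same quantity. However, it differs from `acr` by sampling noise coming from $\nu_i$, which shrinks as the sample grows.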

These special cases highlight the two types of averaging wrapped up in the ACR theorem and its continuous corollary, (4.5.6). First, there is averaging across markets, with weights proportional to the first-stage impact on prices in each market. Markets where prices are highly sensitive to the weather contribute the most. Second, there is averaging along the length of the causal response function in a given market. IV recovers an average derivative that puts the most weight on the range of prices where the instrument shifts the CDF of prices most sharply.
