Mostly Harmless Econometrics: An Empiricist’s Companion

IV with Heterogeneous Potential Outcomes

The discussion of IV up to this point postulates a constant causal effect. In the case of a dummy variable like veteran status, this means $Y_{1i} - Y_{0i} = \rho$ for all $i$, while with a multi-valued treatment like schooling, this means $Y_{si} - Y_{s-1,i} = \rho$ for all $s$ and all $i$. Both are highly stylized views of the world, especially the multi-valued case, which imposes linearity as well as homogeneity. To focus on one thing at a time in a heterogeneous-effects model, we start with a zero-one causal variable. In this context, we’d like to allow for treatment-effect heterogeneity, in other words, a distribution of causal effects across individuals.
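To fix ideas, here is a minimal simulation of what a distribution of causal effects looks like in the dummy-treatment case: each person has their own $Y_{1i} - Y_{0i}$ rather than a common $\rho$. The variable names and distributions below are illustrative assumptions, not anything from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes with heterogeneous effects:
# y0 is the no-treatment outcome; the individual causal effect
# rho_i = y1 - y0 varies across people instead of being a constant.
y0 = rng.normal(loc=10.0, scale=2.0, size=n)
rho_i = rng.normal(loc=1.0, scale=0.5, size=n)   # a distribution of effects
y1 = y0 + rho_i

# Under constant effects, y1 - y0 would be the same number for everyone;
# here it has a whole distribution, summarized by its mean and spread.
print("average causal effect:     ", (y1 - y0).mean())
print("std. dev. of causal effects:", (y1 - y0).std())
```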

Why is treatment-effect heterogeneity important? The answer lies in the distinction between the two types of validity that characterize a research design...


Clustering and Serial Correlation in Panels

8.2.1 Clustering and the Moulton Factor

Bias problems aside, heteroskedasticity rarely leads to dramatic changes in inference. In large samples where bias is not likely to be a problem, we might see standard errors increase by about 25 percent when moving from the conventional to the HC1 estimator. In contrast, clustering can make all the difference.

The clustering problem can be illustrated using a simple bivariate regression estimated in data with a group structure. Suppose we’re interested in the bivariate regression,

$$Y_{ig} = \beta_0 + \beta_1 x_g + e_{ig}, \qquad (8.2.1)$$

where $Y_{ig}$ is the dependent variable for individual $i$ in cluster or group $g$, with $G$ groups. Importantly, the regressor of interest, $x_g$, varies only at the group level...
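To see why clustering can matter so much when the regressor varies only at the group level, the following sketch simulates data with a group structure like (8.2.1) and compares conventional, HC1, and cluster-robust standard errors. The data-generating process and the use of the statsmodels package are assumptions made for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
G, n_per = 50, 40                      # 50 groups, 40 individuals per group
g = np.repeat(np.arange(G), n_per)     # group identifier for each observation

x_g = rng.normal(size=G)[g]            # regressor varies only at the group level
u_g = rng.normal(size=G)[g]            # group-level error -> within-group correlation
e = u_g + rng.normal(size=G * n_per)   # composite error term
y = 1.0 + 2.0 * x_g + e

X = sm.add_constant(x_g)
ols = sm.OLS(y, X).fit()                                             # conventional SEs
hc1 = sm.OLS(y, X).fit(cov_type="HC1")                               # heteroskedasticity-robust
clu = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": g})   # cluster-robust

print("conventional SE:", ols.bse[1])
print("HC1 SE:         ", hc1.bse[1])
print("clustered SE:   ", clu.bse[1])   # typically far larger with a group-level regressor
```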


Limited Dependent Variables Reprise

In Section 3.4.2, we discussed the consequences of limited dependent variables for regression models. When the dependent variable is binary or non-negative, say, employment status or hours worked, the CEF is typically nonlinear. Most nonlinear LDV models are built around a nonlinear transformation of a linear latent index. Examples include Probit, Logit, and Tobit. These models capture features of the associated CEFs (e.g., Probit fitted values are guaranteed to be between zero and one, while Tobit fitted values are non-negative). Yet we saw that the added complexity and extra work required to interpret the results from latent-index models may not be worth the trouble.
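One way to gauge whether the extra work is worth the trouble is to compare the OLS (linear probability model) slope with the Probit average marginal effect on the same binary outcome; the two are typically close. The sketch below is an illustrative simulation under an assumed latent-index data-generating process, using statsmodels; none of the numbers come from the text.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)
# Binary outcome generated from a latent-index model
y = (0.5 * x + rng.normal(size=n) > 0).astype(float)

X = sm.add_constant(x)

lpm = sm.OLS(y, X).fit(cov_type="HC1")   # linear probability model
probit = sm.Probit(y, X).fit(disp=0)     # nonlinear latent-index model

print("LPM slope:                     ", lpm.params[1])
print("Probit average marginal effect:", probit.get_margeff().margeff[0])
```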

An important consideration in favor of OLS is a conceptual robustness that structural models often lack...


Why is Regression Called Regression and What Does Regression-to-the-Mean Mean?

The term regression originates with Francis Galton’s (1886) study of height. Galton, who worked with samples of roughly-normally-distributed data on parents and children, noted that the CEF of a child’s height given his parents’ height is linear, with parameters given by the bivariate regression slope and intercept. Since height is stationary (its distribution is not changing [much] over time), the bivariate regression slope is also the correlation coefficient, i.e., between zero and one.

The single regressor in Galton’s set-up, $X_i$, is average parent height and the dependent variable, $Y_i$, is the height of the adult children. The regression slope coefficient, as always, is $\beta_1 = \frac{Cov(Y_i, X_i)}{V(X_i)}$, and the intercept is $\alpha = E[Y_i] - \beta_1 E[X_i]$...
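The link between the slope and the correlation coefficient follows directly from these formulas. Writing $\rho_{XY}$ for the parent-child correlation and using the stationarity of the height distribution ($\sigma_Y = \sigma_X$):

$$\beta_1 = \frac{Cov(Y_i, X_i)}{V(X_i)} = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X} = \rho_{XY},$$

which lies between zero and one, so predicted child height is always closer to the mean than parent height.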


Fuzzy RD is IV

Fuzzy RD exploits discontinuities in the probability or expected value of treatment conditional on a covariate. The result is a research design where the discontinuity becomes an instrumental variable for treatment status instead of deterministically switching treatment on or off. To see how this works, let $D_i$ denote the treatment as before, though here $D_i$ is no longer deterministically related to the threshold-crossing rule, $x_i \geq x_0$. Rather, there is a jump in the probability of treatment at $x_0$, so that

$$P[D_i = 1 \mid x_i] = \begin{cases} g_1(x_i) & \text{if } x_i \geq x_0 \\ g_0(x_i) & \text{if } x_i < x_0 \end{cases}, \qquad \text{where } g_1(x_0) \neq g_0(x_0).$$

The functions $g_0(x_i)$ and $g_1(x_i)$ can be anything as long as they differ (and the more the better) at $x_0$. We’ll assume $g_1(x_0) > g_0(x_0)$, so $x_i \geq x_0$ makes treatment more likely...
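As a concrete illustration of how the discontinuity serves as an instrument, the sketch below simulates a fuzzy RD with a jump in the treatment probability at the threshold and estimates the treatment effect by two-stage least squares, instrumenting treatment with the above-threshold dummy while controlling for the running variable. The functional forms, numbers, and hand-rolled 2SLS are assumptions made for illustration, not the text’s own example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
x = rng.uniform(-1, 1, size=n)
above = (x >= 0).astype(float)            # crossing the threshold x0 = 0

# Probability of treatment jumps from 0.2 to 0.7 at the threshold (fuzzy RD)
p_treat = 0.2 + 0.5 * above
d = (rng.uniform(size=n) < p_treat).astype(float)

# Outcome: smooth in x, with a treatment effect of 2
y = 1.0 + 1.5 * x + 2.0 * d + rng.normal(size=n)

# 2SLS by hand: instrument d with the above-threshold dummy, controlling for x
Z = np.column_stack([np.ones(n), x, above])   # instruments (incl. exogenous control)
X = np.column_stack([np.ones(n), x, d])       # regressors in the structural equation

# First stage: fitted treatment from the instruments
d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
X2 = np.column_stack([np.ones(n), x, d_hat])

# Second stage: coefficient on fitted treatment is the fuzzy-RD / IV estimate
beta = np.linalg.lstsq(X2, y, rcond=None)[0]
print("IV estimate of the treatment effect:", beta[2])   # close to 2
```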


Local Average Treatment Effects

In an IV framework, the engine that drives causal inference is the instrument, $z_i$, but the variable of interest is still $D_i$. This feature of the IV setup leads us to adopt a generalized potential-outcomes concept, indexed against both instruments and treatment status. Let $Y_i(d, z)$ denote the potential outcome of individual $i$ were this person to have treatment status $D_i = d$ and instrument value $z_i = z$. This tells us, for example, what the earnings of $i$ would be given alternative combinations of veteran status and draft-eligibility status. The causal effect of veteran status given $i$’s realized draft-eligibility status is $Y_i(1, z_i) - Y_i(0, z_i)$, while the causal effect of draft-eligibility status given $i$’s veteran status is $Y_i(D_i, 1) - Y_i(D_i, 0)$.
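To make the generalized potential-outcomes notation concrete, the following sketch simulates a randomly assigned instrument, compliance types, and heterogeneous causal effects, then computes the Wald/IV estimate. It recovers the average effect among compliers rather than the population average effect, which is the local average treatment effect idea this section develops. The data-generating process is an illustrative assumption, not from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Instrument (e.g., draft eligibility), randomly assigned
z = rng.integers(0, 2, size=n)

# Compliance types: always-takers, never-takers, compliers (no defiers)
types = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.5, 0.3])
d = np.where(types == "always", 1, np.where(types == "never", 0, z))

# Heterogeneous causal effects; the outcome depends on treatment only
effect = np.where(types == "complier", 2.0, 0.5)
y0 = rng.normal(size=n)
y = y0 + effect * d

# Wald / IV estimate: reduced form divided by first stage
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print("IV (Wald) estimate:         ", wald)                            # approx. 2.0
print("avg. effect among compliers:", effect[types == "complier"].mean())
print("population average effect:  ", effect.mean())                   # differs from LATE
```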

We can think of instrumental variables as ini...
