# Mostly Harmless Econometrics: An Empiricist’s Companion

## Sharp RD

Sharp RD is used when treatment status is a deterministic and discontinuous function of a covariate, $x_i$. Suppose, for example, that

$$
D_i = \begin{cases} 1 & \text{if } x_i \geq x_0 \\ 0 & \text{if } x_i < x_0 \end{cases} \tag{6.1.1}
$$

where $x_0$ is a known threshold or cutoff. This assignment mechanism is a deterministic function of $x_i$ because once we know $x_i$ we know $D_i$. It’s a discontinuous function because no matter how close $x_i$ gets to $x_0$, treatment is unchanged until $x_i = x_0$.

This may seem a little abstract, so here is an example. American high school students are awarded National Merit Scholarship Awards on the basis of PSAT scores, a test taken by most college-bound high school juniors, especially those who will later take the SAT...
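The threshold rule can also be made concrete with a small simulation. The sketch below is illustrative only: the outcome equation, the true effect of 2.0, the cutoff of 0.5, and the bandwidth are all assumptions invented for the example, and the helper names (`assign_treatment`, `ols_line`, `sharp_rd_jump`) are hypothetical. It assigns treatment deterministically from the covariate and then compares linear fits on either side of the cutoff.

```python
import random


def assign_treatment(x, cutoff):
    """Sharp RD rule: treated (1) exactly when x is at or above the cutoff."""
    return 1 if x >= cutoff else 0


def ols_line(xs, ys):
    """Return (intercept, slope) of a bivariate least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_jump(xs, ys, cutoff, bandwidth):
    """Gap between the right and left linear fits, evaluated at the cutoff."""
    left = [(x, y) for x, y in zip(xs, ys) if cutoff - bandwidth <= x < cutoff]
    right = [(x, y) for x, y in zip(xs, ys) if cutoff <= x <= cutoff + bandwidth]
    a_left, b_left = ols_line(*zip(*left))
    a_right, b_right = ols_line(*zip(*right))
    return (a_right + b_right * cutoff) - (a_left + b_left * cutoff)


# Assumed data-generating process (not from the text).
rng = random.Random(0)
CUTOFF, TRUE_EFFECT = 0.5, 2.0
xs = [rng.random() for _ in range(5000)]
ds = [assign_treatment(x, CUTOFF) for x in xs]
ys = [1.0 + 3.0 * x + TRUE_EFFECT * d + rng.gauss(0, 1) for x, d in zip(xs, ds)]

jump = sharp_rd_jump(xs, ys, CUTOFF, bandwidth=0.25)
print(f"estimated jump at the cutoff: {jump:.2f}")
```

Because treatment switches on exactly at the cutoff, the estimated jump should land close to the assumed effect, up to sampling noise.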

## IV with Heterogeneous Potential Outcomes

The discussion of IV up to this point postulates a constant causal effect. In the case of a dummy variable like veteran status, this means $Y_{1i} - Y_{0i} = \rho$ for all $i$, while with a multi-valued treatment like schooling, this means $Y_{si} - Y_{s-1,i} = \rho$ for all $s$ and all $i$. Both are highly stylized views of the world, especially the multi-valued case, which imposes linearity as well as homogeneity. To focus on one thing at a time in a heterogeneous-effects model, we start with a zero-one causal variable. In this context, we’d like to allow for treatment-effect heterogeneity, in other words, a distribution of causal effects across individuals.
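The contrast between a constant effect and a distribution of effects can be sketched with simulated potential outcomes. Everything numeric here is an assumption made for illustration (the value of `RHO`, the normal distributions, the sample size); the point is only that under constant effects every individual difference equals the same number, while under heterogeneity the differences vary across people.

```python
import random
import statistics

rng = random.Random(0)
NUM_PEOPLE = 1000
RHO = 2.0  # hypothetical constant causal effect

# Untreated potential outcomes, one per individual.
y0 = [rng.gauss(10.0, 1.0) for _ in range(NUM_PEOPLE)]

# Constant effects: the treated outcome exceeds the untreated one by RHO for everyone.
y1_constant = [y + RHO for y in y0]

# Heterogeneous effects: each individual has an own effect, centered on RHO.
individual_effects = [rng.gauss(RHO, 1.5) for _ in range(NUM_PEOPLE)]
y1_heterogeneous = [y + effect for y, effect in zip(y0, individual_effects)]

# Spread of the individual-level differences under each model.
constant_spread = statistics.pstdev([b - a for a, b in zip(y0, y1_constant)])
heterogeneous_spread = statistics.pstdev([b - a for a, b in zip(y0, y1_heterogeneous)])
average_effect = statistics.fmean(individual_effects)

print(f"spread of effects, constant model:      {constant_spread:.3f}")
print(f"spread of effects, heterogeneous model: {heterogeneous_spread:.3f}")
print(f"average effect, heterogeneous model:    {average_effect:.3f}")
```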

Why is treatment-effect heterogeneity important? The answer lies in the distinction between the two types of validity that characterize a research design...

## Clustering and Serial Correlation in Panels

### 8.2.1 Clustering and the Moulton Factor

Bias problems aside, heteroskedasticity rarely leads to dramatic changes in inference. In large samples where bias is not likely to be a problem, we might see standard errors increase by about 25 percent when moving from the conventional to the HC1 estimator. In contrast, clustering can make all the difference.

The clustering problem can be illustrated using a simple bivariate regression estimated in data with a group structure. Suppose we’re interested in the bivariate regression,

$$
Y_{ig} = \beta_0 + \beta_1 x_g + e_{ig}, \tag{8.2.1}
$$

where $Y_{ig}$ is the dependent variable for individual $i$ in cluster or group $g$, with $G$ groups. Importantly, the regressor of interest, $x_g$, varies only at the group level...
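A small simulation can illustrate why group structure matters for inference. The sketch below assumes an error made of a shared group component plus individual noise, 40 groups of 50 observations, and a true slope of 0.5; all of these are invented for the example, as are the function names. It compares the conventional standard error of the slope with a cluster-robust (sandwich) standard error computed without any finite-sample correction.

```python
import math
import random


def simulate_grouped_data(num_groups, group_size, rng):
    """Rows of (group, x, y) where x and part of the error are shared within a group."""
    rows = []
    for g in range(num_groups):
        x_g = rng.gauss(0, 1)  # regressor varies only at the group level
        v_g = rng.gauss(0, 1)  # error component common to the whole group
        for _ in range(group_size):
            e_ig = v_g + rng.gauss(0, 1)
            rows.append((g, x_g, 1.0 + 0.5 * x_g + e_ig))
    return rows


def slope_and_standard_errors(rows):
    """OLS slope with conventional and cluster-robust standard errors."""
    n = len(rows)
    mean_x = sum(x for _, x, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for _, x, _ in rows)
    slope = sum((x - mean_x) * (y - mean_y) for _, x, y in rows) / sxx
    intercept = mean_y - slope * mean_x

    sum_squared_residuals = 0.0
    group_scores = {}  # sum of (demeaned x) * residual within each group
    for g, x, y in rows:
        residual = y - intercept - slope * x
        sum_squared_residuals += residual ** 2
        group_scores[g] = group_scores.get(g, 0.0) + (x - mean_x) * residual

    conventional = math.sqrt(sum_squared_residuals / (n - 2) / sxx)
    clustered = math.sqrt(sum(s ** 2 for s in group_scores.values())) / sxx
    return slope, conventional, clustered


rng = random.Random(0)
rows = simulate_grouped_data(num_groups=40, group_size=50, rng=rng)
slope, conventional_se, clustered_se = slope_and_standard_errors(rows)
print(f"slope={slope:.3f}  conventional SE={conventional_se:.3f}  clustered SE={clustered_se:.3f}")
```

With errors this strongly correlated within groups, the clustered standard error should come out several times larger than the conventional one, which is the kind of inflation the Moulton factor describes.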

## Limited Dependent Variables Reprise

In Section 3.4.2, we discussed the consequences of limited dependent variables for regression models. When the dependent variable is binary or non-negative, say, employment status or hours worked, the CEF is typically nonlinear. Most nonlinear LDV models are built around a nonlinear transformation of a linear latent index. Examples include Probit, Logit, and Tobit. These models capture features of the associated CEFs (e.g., Probit fitted values are guaranteed to be between zero and one, while Tobit fitted values are non-negative). Yet we saw that the added complexity and extra work required to interpret the results from latent-index models may not be worth the trouble.
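The bounds on fitted values can be checked directly. The sketch below takes a few hypothetical values of the linear index and compares three fitted values: the index itself (as in a linear model), the standard normal CDF of the index (Probit), and the conditional mean of a Tobit censored at zero, written as $\Phi(z)\,\text{index} + \sigma\phi(z)$ with $z = \text{index}/\sigma$. The index values and $\sigma = 1$ are assumptions chosen for illustration.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def standard_normal_pdf(z):
    """phi(z), the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def linear_fitted(index):
    """A linear model's fitted value is the index itself, so it is unbounded."""
    return index


def probit_fitted(index):
    """Probit fitted probability: always strictly between zero and one."""
    return standard_normal_cdf(index)


def tobit_fitted(index, sigma=1.0):
    """Mean of a Tobit outcome censored at zero: never negative."""
    z = index / sigma
    return standard_normal_cdf(z) * index + sigma * standard_normal_pdf(z)


index_values = [-3.0, -1.0, 0.0, 1.0, 3.0]  # hypothetical latent-index values
for index in index_values:
    print(
        f"index={index:+.1f}  linear={linear_fitted(index):+.3f}  "
        f"probit={probit_fitted(index):.3f}  tobit={tobit_fitted(index):.3f}"
    )
```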

An important consideration in favor of OLS is a conceptual robustness that structural models often lack...

## Why is Regression Called Regression and What Does Regression-to-the-mean Mean?

The term regression originates with Francis Galton’s (1886) study of height. Galton, who worked with samples of roughly normally distributed data on parents and children, noted that the CEF of a child’s height given his parents’ height is linear, with parameters given by the bivariate regression slope and intercept. Since height is stationary (its distribution is not changing [much] over time), the bivariate regression slope is also the correlation coefficient, i.e., between zero and one.

The single regressor in Galton’s set-up, $X_i$, is average parent height and the dependent variable, $Y_i$, is the height of adult children. The regression slope coefficient, as always, is $\beta_1 = \frac{\operatorname{Cov}(X_i, Y_i)}{V(X_i)}$, and the intercept is $\alpha = E[Y_i] - \beta_1 E[X_i]$...
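The slope and intercept formulas, and the link between the slope and the correlation coefficient when the two variances match, can be checked on simulated data. The numbers below (a mean of 68, a standard deviation of 2.5, a correlation of 0.5) are illustrative assumptions, not Galton's figures.

```python
import math
import random

rng = random.Random(1)
MEAN_HEIGHT, SD_HEIGHT, CORRELATION = 68.0, 2.5, 0.5  # assumed, not Galton's values

# Parent and child heights share the same mean and variance (stationarity).
parents, children = [], []
for _ in range(20000):
    z_parent = rng.gauss(0, 1)
    z_child = CORRELATION * z_parent + math.sqrt(1 - CORRELATION ** 2) * rng.gauss(0, 1)
    parents.append(MEAN_HEIGHT + SD_HEIGHT * z_parent)
    children.append(MEAN_HEIGHT + SD_HEIGHT * z_child)


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


# Slope = Cov(X, Y) / V(X); intercept = E[Y] - slope * E[X].
slope = covariance(parents, children) / covariance(parents, parents)
intercept = mean(children) - slope * mean(parents)
correlation = covariance(parents, children) / math.sqrt(
    covariance(parents, parents) * covariance(children, children)
)

# Regression to the mean: children of tall parents are predicted to be
# taller than average, but not as tall as their parents.
tall_parent = MEAN_HEIGHT + 2 * SD_HEIGHT
predicted_child = intercept + slope * tall_parent
print(f"slope={slope:.3f}  correlation={correlation:.3f}")
print(f"parent={tall_parent:.1f}  predicted child={predicted_child:.1f}")
```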

## Fuzzy RD is IV

Fuzzy RD exploits discontinuities in the probability or expected value of treatment conditional on a covariate. The result is a research design where the discontinuity becomes an instrumental variable for treatment status instead of deterministically switching treatment on or off. To see how this works, let $D_i$ denote the treatment as before, though here $D_i$ is no longer deterministically related to the threshold-crossing rule, $x_i \geq x_0$. Rather, there is a jump in the probability of treatment at $x_0$, so that

$$
P[D_i = 1 \mid x_i] = \begin{cases} g_1(x_i) & \text{if } x_i \geq x_0 \\ g_0(x_i) & \text{if } x_i < x_0 \end{cases}, \quad \text{where } g_1(x_0) \neq g_0(x_0).
$$

The functions $g_0(x_i)$ and $g_1(x_i)$ can be anything as long as they differ (and the more the better) at $x_0$. We’ll assume $g_1(x_0) > g_0(x_0)$, so $x_i \geq x_0$ makes treatment more likely...
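One way to see the discontinuity acting as an instrument is to simulate a jump in the treatment probability and divide the jump in the outcome by the jump in treatment at the cutoff. The sketch below is a simplified illustration under stated assumptions: the two probability functions, the constant effect of 2.0, the outcome equation, and the bandwidth are all invented for the example, and the ratio is only one simple way to use the discontinuity.

```python
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of a bivariate least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jump_at_cutoff(xs, values, cutoff, bandwidth):
    """Gap between the right and left linear fits, evaluated at the cutoff."""
    left = [(x, v) for x, v in zip(xs, values) if cutoff - bandwidth <= x < cutoff]
    right = [(x, v) for x, v in zip(xs, values) if cutoff <= x <= cutoff + bandwidth]
    a_left, b_left = ols_line(*zip(*left))
    a_right, b_right = ols_line(*zip(*right))
    return (a_right + b_right * cutoff) - (a_left + b_left * cutoff)


def treatment_probability(x, cutoff):
    """Hypothetical g1 at or above the cutoff and g0 below it; they differ by 0.4."""
    return 0.6 + 0.1 * x if x >= cutoff else 0.2 + 0.1 * x


# Assumed data-generating process (not from the text).
rng = random.Random(2)
CUTOFF, TRUE_EFFECT = 0.5, 2.0
xs = [rng.random() for _ in range(40000)]
ds = [1 if rng.random() < treatment_probability(x, CUTOFF) else 0 for x in xs]
ys = [1.0 + 3.0 * x + TRUE_EFFECT * d + rng.gauss(0, 1) for x, d in zip(xs, ds)]

jump_in_treatment = jump_at_cutoff(xs, ds, CUTOFF, bandwidth=0.25)
jump_in_outcome = jump_at_cutoff(xs, ys, CUTOFF, bandwidth=0.25)
ratio_estimate = jump_in_outcome / jump_in_treatment
print(f"jump in P[D=1|x]: {jump_in_treatment:.3f}")
print(f"jump in E[Y|x]:   {jump_in_outcome:.3f}")
print(f"ratio estimate:   {ratio_estimate:.3f}")
```

The larger the gap between the two probability functions at the cutoff, the less noisy the ratio, which is the sense in which "the more the better" applies.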