Propensity-score methods shift attention from the estimation of E[Y_i | X_i, D_i] to the estimation of the propensity score, p(X_i) = E[D_i | X_i]. This is attractive in applications where the latter is easier to model or motivate. For example, Ashenfelter (1978) showed that participants in government-funded training programs often have suffered a marked pre-program dip in earnings, a pattern found in many later studies. If this dip is the only thing that makes trainees special, then we can estimate the causal effect of training on earnings by controlling for past earnings dynamics. In practice, however, it's hard to match on earnings dynamics since earnings histories are both continuous and multi-dimensional...
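The propensity-score idea can be seen in a minimal sketch (our construction, not an example from the text): fit a logistic model for E[D|X] on simulated data, then reweight outcomes by the inverse of the estimated score. The crude gradient-ascent fit stands in for any logit routine.

```python
# Minimal propensity-score sketch (our construction): estimate p(X) = E[D|X]
# with a logistic model, then reweight outcomes by the inverse propensity score.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)                      # a single covariate
p_true = 1 / (1 + np.exp(-x))               # true propensity score
d = rng.binomial(1, p_true)                 # treatment depends on x only
y = 2.0 * d + x + rng.normal(size=n)        # constant treatment effect of 2

# Fit the logistic model for p(X) by gradient ascent (a stand-in for any
# logit routine; in practice you would use statsmodels or Stata's logit).
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(200):
    p_hat = 1 / (1 + np.exp(-X @ b))
    b += 0.5 * X.T @ (d - p_hat) / n

p_hat = 1 / (1 + np.exp(-X @ b))
# Inverse-propensity-weighting estimate of the average treatment effect:
ate = np.mean(d * y / p_hat) - np.mean((1 - d) * y / (1 - p_hat))
print(round(ate, 2))
```

With the score correctly specified, the reweighted contrast recovers the constant effect built into the simulation.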
To simplify, we ignore covariates and year effects and assume there are only two periods, with treatment equal to zero for everyone in the first period (the punch line is the same in a more general setup). The causal effect of interest, ρ, is positive. Suppose first that treatment is correlated with an unobserved individual effect, α_i, and that outcomes can be described by
Y_{it} = α_i + ρD_{it} + ε_{it},
where ε_{it} is serially uncorrelated, and uncorrelated with α_i and D_{it}. We also have
Y_{it-1} = α_i + ε_{it-1},
where α_i and ε_{it-1} are uncorrelated. You mistakenly estimate the effect of D_{it} in a model that controls for Y_{it-1} but ignores fixed effects. The resulting estimator has probability limit

Cov(Y_{it}, D̃_{it}) / V(D̃_{it}) = ρ + Cov(α_i, D̃_{it}) / V(D̃_{it}),

where D̃_{it} = D_{it} − γY_{it-1} is the residual from a regression of D_{it} on Y_{it-1}.
Now substitute α_i = Y...
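The mistake is easy to see in a small simulation (our construction, with made-up parameter values): treatment is driven by the unobserved individual effect, which also moves the lagged outcome, so controlling for the lagged outcome instead of the fixed effect leaves the estimate of the treatment effect inconsistent.

```python
# Lagged-dependent-variable mistake (our construction): treatment depends on
# the fixed effect a_i, which also enters Y_{it-1}, so a regression of Y_it
# on D_it and Y_{it-1} does not recover rho.
import numpy as np

rng = np.random.default_rng(1)
n, rho = 200_000, 1.0
a = rng.normal(size=n)                            # unobserved fixed effect
d = (a + rng.normal(size=n) > 0).astype(float)    # treatment correlated with a
y_lag = a + rng.normal(size=n)                    # lagged outcome: a + noise
y = a + rho * d + rng.normal(size=n)              # current outcome

X = np.column_stack([np.ones(n), d, y_lag])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(coef[1], 2))   # estimate of rho; inconsistent in this design
```

In this particular design the estimate is biased upward, well away from the true effect of 1; the direction of the bias depends on how treatment and the fixed effect are related.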
Constant-effects models with more instruments than endogenous regressors are said to be over-identified. Because there are more instruments than needed to identify the parameters of interest, these models impose a set of restrictions that can be evaluated as part of a process of specification testing. This process amounts to asking whether the line plotted in a VIV-type picture fits the relevant conditional means tightly enough given the precision with which the means are estimated. The details behind this useful idea are easiest to spell out using matrix notation and a traditional linear model.
denote the vector formed by concatenating the covariates and the single endogenous variable of interest...
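Skipping the matrix details, the mechanics of a Sargan-style version of this specification test can be sketched as follows (our construction, with simulated data and valid instruments): regress the 2SLS residuals on the full instrument set, and compare n times the R-squared to a chi-squared distribution with degrees of freedom equal to the number of overidentifying restrictions.

```python
# Overidentification test sketch (Sargan-style; our construction): two valid
# instruments for one endogenous regressor leave one testable restriction.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n)
s = z1 + z2 + v                          # endogenous regressor
y = 2.0 * s + v + rng.normal(size=n)     # v makes s endogenous

Z = np.column_stack([np.ones(n), z1, z2])
X = np.column_stack([np.ones(n), s])
# 2SLS: project X on the instruments, then run OLS on the projection.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]

u = y - X @ beta                         # 2SLS residuals (actual s, not s_hat)
g = np.linalg.lstsq(Z, u, rcond=None)[0]
r2 = 1 - np.sum((u - Z @ g) ** 2) / np.sum((u - u.mean()) ** 2)
sargan = n * r2                          # compare to chi2(1); 5% cutoff ~ 3.84
print(round(beta[1], 2), round(sargan, 2))
```

With both instruments valid, as here, the statistic is a draw from (approximately) a chi-squared with one degree of freedom, so it rarely exceeds the critical value.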
We have normality. I repeat, we have normality.
Anything you still can’t cope with is therefore your own problem.
Douglas Adams, The Hitchhiker’s Guide to the Galaxy (1979)
Today, software packages routinely compute asymptotic standard errors derived under weak assumptions about the sampling process or underlying model. For example, you get regression standard errors based on formula (3.1.7) using the Stata option "robust". Robust standard errors improve on old-fashioned standard errors because the resulting inferences are asymptotically valid when the regression residuals are heteroskedastic, as they almost certainly are when regression approximates a nonlinear CEF. In contrast, old-fashioned standard errors are derived assuming homoskedasticity...
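The contrast between the two is easy to compute by hand (a sketch of the textbook formulas, not Stata's exact small-sample adjustments): the conventional variance is s²(X′X)⁻¹, while the robust (HC0) variance is the sandwich (X′X)⁻¹X′diag(e²)X(X′X)⁻¹.

```python
# Conventional vs. robust (HC0) standard errors by hand (a sketch): with
# heteroskedastic residuals the two formulas give different answers.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.normal(size=n)
y = 1 + 2 * x + np.abs(x) * rng.normal(size=n)   # residual variance grows with |x|

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta

# Old-fashioned: s^2 (X'X)^{-1}, valid only under homoskedasticity.
v_conv = (e @ e / (n - 2)) * XtX_inv
# Robust (HC0) sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}.
v_rob = XtX_inv @ ((X.T * e**2) @ X) @ XtX_inv

se_conv, se_rob = np.sqrt(np.diag(v_conv)), np.sqrt(np.diag(v_rob))
print(se_conv[1], se_rob[1])   # robust SE on x is noticeably larger here
```

In this design the conventional formula understates the sampling variance of the slope by a substantial margin, which is exactly the situation the robust option guards against.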
2.5.1 2SLS Mistakes
2SLS estimates are easy to compute, especially since software like SAS and Stata will do it for you. Occasionally, however, you might be tempted to do it yourself just to see if it really works. Or you may be stranded on the planet Krikkit with all of your software licenses expired (Krikkit is encased in a slo-time envelope, so it will take you a long time to get licenses renewed). "Manual 2SLS" is for just such emergencies. In the Manual 2SLS procedure, you estimate the first stage yourself (which in any case, you should be looking at), and plug the fitted values into the second stage equation, which is then estimated by OLS. Returning to the system at the beginning of this chapter, the first and second stages are
s_i = X_i′π_10 + π_11 Z_i + ξ_{1i},

Y_i = α′X_i + ρŝ_i + [η_i + ρ(s_i − ŝ_i)].
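Manual 2SLS can be sketched in a few lines (our construction, one instrument and no covariates for simplicity): the second-stage coefficient matches the direct IV estimator, but the second stage's own OLS residuals are the wrong ones for inference, since they are built from the fitted values rather than the actual endogenous regressor.

```python
# Manual 2SLS (a sketch): estimate the first stage by OLS, plug the fitted
# values into the second stage. Coefficients are right; the second stage's
# OLS standard errors are not, because its residuals use s_hat, not s.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
z = rng.normal(size=n)
v = rng.normal(size=n)
s = 0.8 * z + v                               # first stage
y = 1.0 + 0.5 * s + v + rng.normal(size=n)    # v makes s endogenous

Z = np.column_stack([np.ones(n), z])
pi = np.linalg.lstsq(Z, s, rcond=None)[0]
s_hat = Z @ pi                                # first-stage fitted values

X2 = np.column_stack([np.ones(n), s_hat])
beta = np.linalg.lstsq(X2, y, rcond=None)[0]  # manual second stage

# Same coefficient as the direct IV (Wald) formula with one instrument:
iv = np.cov(z, y)[0, 1] / np.cov(z, s)[0, 1]
print(beta[1], iv)

u_right = y - beta[0] - beta[1] * s           # residuals for correct inference
u_wrong = y - X2 @ beta                       # what manual OLS uses instead
```

The wrong residuals absorb the extra term ρ(s_i − ŝ_i), so their variance, and hence the reported standard errors, is off.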
1.4.1 Weighting Regression
Few things are as confusing to applied researchers as the role of sample weights. Even now, 20 years post-Ph.D., we read the section of the Stata manual on weighting with some dismay. Weights can be used in a number of ways, and how they are used may well matter for your results. Regrettably, however, the case for or against weighting is often less than clear-cut, as are the specifics of how the weights should be programmed. A detailed discussion of weighting pros and cons is beyond the scope of this book. See Pfeffermann (1993) and Deaton (1997) for two perspectives. In this brief subsection, we provide a few guidelines and a rationale for our approach to weighting.
A simple rule of thumb for weighting regression is use weights when they make it more likely th...
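Whatever the rationale, the mechanics of weighted least squares are simple (a sketch with made-up weights): scale each row of X and y by the square root of its weight and run OLS, which is the same as solving the weighted normal equations.

```python
# Weighted least squares by hand (a sketch): OLS on sqrt(w)-scaled data,
# equivalently beta = (X'WX)^{-1} X'Wy with W = diag(w).
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)        # e.g. sampling weights

X = np.column_stack([np.ones(n), x])
sw = np.sqrt(w)
beta_wls = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Same thing via the weighted normal equations:
beta_check = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
print(beta_wls, beta_check)
```

The two routes agree exactly; the hard part is deciding whether, and with what, to weight in the first place.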