Limited Dependent Variables Reprise
In Section 3.4.2, we discussed the consequences of limited dependent variables for regression models. When the dependent variable is binary or nonnegative, say, employment status or hours worked, the CEF is typically nonlinear. Most nonlinear LDV models are built around a nonlinear transformation of a linear latent index. Examples include Probit, Logit, and Tobit. These models capture features of the associated CEFs (e.g., Probit fitted values are guaranteed to be between zero and one, while Tobit fitted values are nonnegative). Yet we saw that the added complexity and extra work required to interpret the results from latent-index models may not be worth the trouble.
An important consideration in favor of OLS is a conceptual robustness that structural models often lack. OLS is always an MMSE linear approximation to the CEF. In fact, we can think of OLS as a scheme for computing marginal effects, a scheme that has the virtues of simplicity, automation, and comparability across studies. Nonlinear latent-index models are more like GLS: they provide an efficiency gain when taken literally, but they require a commitment to functional form and distributional assumptions about which we do not usually feel strongly.[72] A second consideration is the difference between the latent-index parameters at the heart of nonlinear models and the average causal effects that we believe should be the objects of primary interest in most research projects.
The arguments in favor of conventional OLS with LDVs apply with equal force to 2SLS and models with endogenous variables. IV methods capture local average treatment effects regardless of whether the dependent variable is binary, nonnegative, or continuously distributed on the real line. With covariates, we can think of 2SLS as estimating LATE averaged across covariate cells. In models with variable or continuous treatment intensity, 2SLS gives us the average causal response or an average derivative. Although Abadie (2003) has shown that 2SLS does not, in general, provide the MMSE approximation to the complier causal response function, in practice, 2SLS estimates come out remarkably close to estimates using the more rigorously grounded Abadie procedure (and with a saturated model for covariates, 2SLS and Abadie are the same). And, of course, 2SLS estimates LATE directly; there is no intermediate step involving the calculation of marginal effects.
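The claim that IV captures a local average treatment effect even when the outcome is binary can be checked in a small simulation. The sketch below is illustrative only: the latent-index data-generating process, the parameter values, and the function names (`simulate`, `wald`, `complier_effect`) are assumptions made for this example, not anything estimated from real data. With a single binary instrument and no covariates, the 2SLS estimate equals the Wald ratio computed here.

```python
import random
from statistics import fmean


def simulate(n, beta0=0.5, beta1=-0.5, gamma0=0.0, gamma1=1.0, rho=0.5, seed=0):
    """Draw rows (z, d, y, complier, y1 - y0) from a toy latent-index model.

    All parameter values are made up for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                      # coin-flip binary instrument
        v = rng.gauss(0.0, 1.0)                     # first-stage error
        # outcome error, correlated with v (the source of omitted variables bias)
        eps = rho * v + (1 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0)
        d0, d1 = gamma0 > v, gamma0 + gamma1 > v    # potential treatments
        y0, y1 = beta0 > eps, beta0 + beta1 > eps   # potential binary outcomes
        d = d1 if z else d0                         # observed treatment
        y = y1 if d else y0                         # observed outcome
        rows.append((int(z), int(d), int(y), int(d1 and not d0), int(y1) - int(y0)))
    return rows


def wald(rows):
    """Reduced form divided by first stage."""
    z1 = [r for r in rows if r[0] == 1]
    z0 = [r for r in rows if r[0] == 0]
    reduced_form = fmean(r[2] for r in z1) - fmean(r[2] for r in z0)
    first_stage = fmean(r[1] for r in z1) - fmean(r[1] for r in z0)
    return reduced_form / first_stage


def complier_effect(rows):
    """Average of y1 - y0 among compliers (observable only in a simulation)."""
    return fmean(r[4] for r in rows if r[3] == 1)


rows = simulate(200_000)
print("Wald / IV estimate:     ", round(wald(rows), 3))
print("Complier average effect:", round(complier_effect(rows), 3))
```

The two printed numbers should agree up to sampling error, even though nothing in the `wald` function uses the normality of the simulated errors.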
2SLS is not the only way to go. An alternative, more elaborate approach tries to build up a causal story by describing the process generating LDVs in detail. A good example is bivariate Probit, which can be applied to the Angrist and Evans (1998) example like this. Suppose that a woman decides to have a third child by comparing costs and benefits using a net benefit function or latent index that is linear in covariates and excluded instruments, with a random component or error term, $v_i$. The bivariate Probit first stage can be written
$$D_i = 1[X_i'\gamma_0 + \gamma_1 Z_i > v_i], \tag{4.6.12}$$
where $Z_i$ is an instrumental variable that increases the benefit of a third child, conditional on covariates, $X_i$. For example, American parents appear to value a third child more when they have had either two boys or two girls, a sort of portfolio-diversification phenomenon that can be understood as increasing the benefit of a third child in families with same-sex sibships.
[72] Consistency of the maximum likelihood estimator turns on the assumption that the conditional variance of $Y_i$ is $p_i(1 - p_i)$. It's worth noting that we can dispense with this assumption and simply fit $Y_i$ to $\Phi[\cdot]$ by nonlinear least squares (NLLS). This sort of agnostic NLLS shares the robustness properties of OLS; it gives the best MMSE fit in a class of approximating functions.
An outcome of primary interest in this context is employment status, a Bernoulli random variable with a conditional mean between zero and one. To complete the model, suppose that employment status, $Y_i$, is determined by the latent index
$$Y_i = 1[X_i'\beta_0 + \beta_1 D_i > \varepsilon_i], \tag{4.6.13}$$
where $\varepsilon_i$ is a second random component or error term. This latent index can be seen as arising from a comparison of the costs and benefits of working.
The source of omitted variables bias in the bivariate Probit setup is correlation between $v_i$ and $\varepsilon_i$. In other words, unmeasured random determinants of childbearing are correlated with unmeasured random determinants of employment. The model is identified by assuming $Z_i$ is independent of these components, and that the random components are normally distributed. Given normality, the parameters in (4.6.12) and (4.6.13) can be estimated by maximum likelihood. The log likelihood function is
where $\Phi_b(\cdot, \cdot; \rho_{\varepsilon v})$ is the bivariate normal distribution function with correlation coefficient $\rho_{\varepsilon v}$. Note, however, that we can multiply the latent index coefficients by a positive constant without changing the likelihood. The object of estimation is therefore the ratio of the index coefficients to the standard deviation of the error terms (e.g., $\beta_1/\sigma_\varepsilon$).
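One standard textbook way to write a log likelihood of this kind for a model like (4.6.12) and (4.6.13) is sketched below. It is offered as a reference point under the assumption of jointly normal errors, and it may differ in form from the expression intended above. The sign indicators $q_{Yi}$ and $q_{Di}$ are notation introduced only for this sketch.

```latex
\ln L \;=\; \sum_i \ln \Phi_b\!\left(
    q_{Yi}\,\frac{X_i'\beta_0 + \beta_1 D_i}{\sigma_\varepsilon},\;
    q_{Di}\,\frac{X_i'\gamma_0 + \gamma_1 Z_i}{\sigma_v};\;
    q_{Yi}\,q_{Di}\,\rho_{\varepsilon v}
\right),
\qquad q_{Yi} = 2Y_i - 1, \quad q_{Di} = 2D_i - 1 .
```

Each observation contributes the probability of its observed $(Y_i, D_i)$ pair: flipping the sign of an index, together with the sign of the correlation, turns a "less than" event for the corresponding error into a "greater than" event. Only the ratios of index coefficients to error standard deviations appear, which is why the scale of the coefficients is not separately identified.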
The potential outcomes defined by the bivariate Probit model are
$$Y_{0i} = 1[X_i'\beta_0 > \varepsilon_i] \quad \text{and} \quad Y_{1i} = 1[X_i'\beta_0 + \beta_1 > \varepsilon_i],$$
while potential treatment assignments are
$$D_{0i} = 1[X_i'\gamma_0 > v_i] \quad \text{and} \quad D_{1i} = 1[X_i'\gamma_0 + \gamma_1 > v_i].$$
As usual, only one potential outcome and one potential assignment are observed for any one person. It's also clear from this representation that correlation between $v_i$ and $\varepsilon_i$ is the same thing as correlation between potential treatment assignments and potential outcomes.
The latent index coefficients do not themselves tell us anything about the size of the causal effect of childbearing on employment other than the sign. To see this, note that the average causal effect of childbearing is
$$E[Y_{1i} - Y_{0i}] = E\{1[X_i'\beta_0 + \beta_1 > \varepsilon_i] - 1[X_i'\beta_0 > \varepsilon_i]\},$$
while the average effect on the treated is
$$E[Y_{1i} - Y_{0i} \mid D_i = 1] = E\{1[X_i'\beta_0 + \beta_1 > \varepsilon_i] - 1[X_i'\beta_0 > \varepsilon_i] \mid X_i'\gamma_0 + \gamma_1 Z_i > v_i\}.$$
Given alternative distributional assumptions for $v_i$ and $\varepsilon_i$, these can be anything. (If the error terms are heteroskedastic, then even the sign is indeterminate.)
Under normality, the average causal effects generated by the bivariate Probit model are easy to evaluate. The average causal effect is
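$$E\{1[X_i'\beta_0 + \beta_1 > \varepsilon_i] - 1[X_i'\beta_0 > \varepsilon_i]\} = E\left\{\Phi\left[\frac{X_i'\beta_0 + \beta_1}{\sigma_\varepsilon}\right] - \Phi\left[\frac{X_i'\beta_0}{\sigma_\varepsilon}\right]\right\}, \tag{4.6.15}$$
where $\Phi[\cdot]$ is the normal CDF. The effect on the treated is a little more complicated since it involves the bivariate normal CDF:
$$E[Y_{1i} - Y_{0i} \mid D_i = 1] = E\left\{\left.\frac{\Phi_b\left(\frac{X_i'\beta_0 + \beta_1}{\sigma_\varepsilon}, \frac{X_i'\gamma_0 + \gamma_1 Z_i}{\sigma_v}; \rho_{\varepsilon v}\right) - \Phi_b\left(\frac{X_i'\beta_0}{\sigma_\varepsilon}, \frac{X_i'\gamma_0 + \gamma_1 Z_i}{\sigma_v}; \rho_{\varepsilon v}\right)}{\Phi\left[\frac{X_i'\gamma_0 + \gamma_1 Z_i}{\sigma_v}\right]} \,\right|\, D_i = 1\right\}. \tag{4.6.16}$$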
Since the bivariate normal CDF is a canned function in many software packages, this is easy enough to calculate in practice.
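As a rough illustration of how such a calculation might go, the sketch below evaluates the average causal effect and the effect on the treated for a single covariate and instrument cell. It assumes error variances normalized to one, and it uses a simple numerical integral for the bivariate normal CDF so that only the Python standard library is needed; a packaged routine would normally be used instead. The function names and the parameter values in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(a, b, rho, lower=-8.0, steps=2000):
    """P(eps < a, v < b) for standard normal (eps, v) with correlation rho, |rho| < 1.

    Uses P = integral over v < b of pdf(v) * Phi((a - rho * v) / sqrt(1 - rho^2)),
    approximated by Simpson's rule on [lower, b] (steps must be even).
    """
    if b <= lower:
        return 0.0
    scale = sqrt(1.0 - rho * rho)
    width = (b - lower) / steps

    def integrand(v):
        return STD_NORMAL.pdf(v) * STD_NORMAL.cdf((a - rho * v) / scale)

    total = integrand(lower) + integrand(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * width)
    return total * width / 3.0


def ate_cell(index_y, beta1):
    """Average causal effect in one cell: Phi(index + beta1) - Phi(index)."""
    return STD_NORMAL.cdf(index_y + beta1) - STD_NORMAL.cdf(index_y)


def tot_cell(index_y, beta1, index_d, rho):
    """Effect on the treated in one cell, where index_d is the first-stage index."""
    numerator = bvn_cdf(index_y + beta1, index_d, rho) - bvn_cdf(index_y, index_d, rho)
    return numerator / STD_NORMAL.cdf(index_d)


# Made-up index values for one cell; a full calculation would average the
# cell-level effects over the relevant distribution of covariates.
print("ATE in cell:", round(ate_cell(0.5, -0.5), 4))
print("TOT in cell:", round(tot_cell(0.5, -0.5, 1.0, 0.5), 4))
```

When the two errors are uncorrelated, the bivariate CDF factors into a product of univariate CDFs and the two effects coincide within a cell, which gives a quick sanity check on the code.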
Bivariate Probit probably qualifies as harmless in the sense that it's not very complicated, and easy to get right using packaged software routines. Still, it shares the disadvantages of nonlinear latent-index modeling discussed in the previous chapter. First, some researchers become distracted by an effort to identify index coefficients instead of average causal effects. For example, a large literature in econometrics is concerned with the identification of index coefficients without the need for distributional assumptions. Applied researchers interested in causal effects can safely ignore this work.[39]
[39] With an unspecified error distribution function $\Lambda[\cdot]$ in place of the normal CDF, the average causal effect is $E\{\Lambda[X_i'\beta_0 + \beta_1] - \Lambda[X_i'\beta_0]\} = E\{\Lambda'[X_i'\beta_0 + \tilde{\beta}_{1i}]\}\beta_1$, where $\tilde{\beta}_{1i}$ is in $[0, \beta_1]$. This always depends on the shape of $\Lambda[\cdot]$.
A second vice in this context is also a virtue. Bivariate Probit and other models of this sort can be used to identify population average causal effects and/or effects on the treated. 2SLS does not promise you average causal effects, only local average causal effects. But it should be clear from (4.6.15) that the assumed normality of the latent index error terms is essential for this. As always, the best you can do without a distributional assumption is LATE, the average causal effect for compliers. For bivariate Probit, we can write LATE as
$$E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] = E\{1[X_i'\beta_0 + \beta_1 > \varepsilon_i] - 1[X_i'\beta_0 > \varepsilon_i] \mid X_i'\gamma_0 + \gamma_1 > v_i > X_i'\gamma_0\},$$
which, like (4.6.16), can be evaluated using joint normality of $v_i$ and $\varepsilon_i$. But you needn't bother using normality to evaluate $E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}]$, since LATE can be estimated by IV for each $X_i$ and averaged using the histogram of the covariates. Alternately, do 2SLS and settle for a variance-weighted average of covariate-specific LATEs.
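The cell-by-cell strategy can be sketched in a few lines: compute a Wald (IV) estimate within each covariate cell, then average the cell estimates using the share of observations in each cell. The data below are toy counts invented purely so the arithmetic is easy to follow, and the helper names (`make_cell`, `wald`, `histogram_weighted_late`) are hypothetical.

```python
from statistics import fmean


def make_cell(x, z, n, n_treated, n_working):
    """Build n toy rows (x, z, d, y) with the given counts of d = 1 and y = 1."""
    return [(x, z, int(i < n_treated), int(i < n_working)) for i in range(n)]


def wald(rows):
    """IV estimate for one cell: reduced form divided by first stage."""
    z1 = [r for r in rows if r[1] == 1]
    z0 = [r for r in rows if r[1] == 0]
    reduced_form = fmean(r[3] for r in z1) - fmean(r[3] for r in z0)
    first_stage = fmean(r[2] for r in z1) - fmean(r[2] for r in z0)
    return reduced_form / first_stage


def histogram_weighted_late(rows):
    """Average the cell-specific IV estimates using covariate cell shares."""
    total = len(rows)
    estimate = 0.0
    for x in sorted({r[0] for r in rows}):
        cell = [r for r in rows if r[0] == x]
        estimate += (len(cell) / total) * wald(cell)
    return estimate


# Invented counts: cell x = 0 has 200 observations, cell x = 1 has 600.
data = (
    make_cell(0, 0, 100, 30, 60) + make_cell(0, 1, 100, 50, 50)
    + make_cell(1, 0, 300, 60, 210) + make_cell(1, 1, 300, 180, 180)
)
print("Cell-averaged IV estimate:", histogram_weighted_late(data))
```

In this toy example the cell-specific estimates are -0.5 and -0.25, with cell shares of one quarter and three quarters. 2SLS with the covariate as a control would instead weight the cells by a variance-based scheme, so the two summaries need not coincide exactly.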
You might be wondering whether LATE is enough. Perhaps you would like to estimate the average treatment effect or the effect of treatment on the treated and are willing to make a few extra assumptions to do so. That's all well and good, but in our experience, you can't get blood from a stone, even with heroic assumptions. Since local information is all that's in the data, in practice the average causal effects produced by bivariate Probit are likely to be similar to 2SLS estimates provided the model for covariates is sufficiently flexible. This is illustrated in Table 4.6.1, which reports 2SLS and bivariate Probit estimates of the effects of a third child on female labor supply using the Angrist-Evans (1998) same-sex instruments and the same 1980 census sample of married women with 2 or more children used in their paper. The dependent variable is a dummy for having worked the previous year; the endogenous variable is a dummy for having a third child. The first-stage effect of a same-sex sibship on the probability of a third birth is about 7 percentage points.
Panel A of Table 4.6.1 reports estimates from a model with no covariates. The 2SLS estimate of .138 in column 1 is numerically identical to the Abadie causal effect estimated using a linear model in column 2, as it should be in this case. Without covariates, the 2SLS slope coefficient provides the best linear approximation to the complier causal response function, as does Abadie's kappa-weighting procedure. The marginal effect changes little if, instead of a linear approximation, we use nonlinear least squares with a Probit CEF. The marginal effect estimated by minimizing
$$E\{\kappa_i (Y_i - \Phi[X_i'\beta_0 + \beta_1 D_i])^2\}$$
is .137, reported in column 3. This is not surprising since the model without covariates imposes no functional form assumptions.
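The point that a model without covariates imposes no functional form can be seen in a deliberately simplified sketch. The example below ignores the kappa weights and the endogeneity of the treatment altogether, and it uses invented data; it only shows that, with a single dummy regressor, a least-squares fit of a Probit CEF reproduces the two group means, so the implied effect equals the linear one. The function names are hypothetical.

```python
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def fit_saturated_probit(d, y):
    """Least-squares Probit fit with one dummy regressor.

    Within each group the squared error is minimized by the group mean, and
    Phi can reach any value strictly between zero and one, so the minimizer
    simply inverts Phi at the two group means.
    """
    mean0 = fmean(yi for di, yi in zip(d, y) if di == 0)
    mean1 = fmean(yi for di, yi in zip(d, y) if di == 1)
    b0 = STD_NORMAL.inv_cdf(mean0)
    b1 = STD_NORMAL.inv_cdf(mean1) - b0
    return b0, b1


def ssr(b0, b1, d, y):
    """Sum of squared residuals for the Probit CEF Phi(b0 + b1 * d)."""
    return sum((yi - STD_NORMAL.cdf(b0 + b1 * di)) ** 2 for di, yi in zip(d, y))


# Invented data: 10 untreated (7 working) and 10 treated (5 working).
d = [0] * 10 + [1] * 10
y = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 5

b0, b1 = fit_saturated_probit(d, y)
probit_effect = STD_NORMAL.cdf(b0 + b1) - STD_NORMAL.cdf(b0)
linear_effect = fmean(y[10:]) - fmean(y[:10])
print("Probit-CEF effect:", round(probit_effect, 6))
print("Linear effect:    ", round(linear_effect, 6))
```

Once covariates enter in a way that is not saturated, the Probit and linear fits are no longer forced to agree, which is where functional form starts to matter.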
Perhaps more surprising is the fact that marginal effects and the average treatment effects calculated using (4.6.15) and (4.6.16) are also the same as the 2SLS and Abadie estimates. These results are reported in columns 4-6. The marginal effect calculated using a derivative to approximate the finite difference in (4.6.15) is .138 (in column 4, labelled MFX for marginal effects), while both average treatment effects are .139 in columns 5 and 6. Adding a few covariates has little effect on the estimates, as can be seen in Panel B. In this case, the covariates are all dummy variables, three for race (black, Hispanic, and other), and two indicating first- and second-born boys (the excluded instrument is the interaction of these two). Panels C and D show that adding a linear term in age at first birth and a dummy for maternal age also leaves the estimates unchanged.
Table 4.6.1: 2SLS, Abadie, and bivariate Probit estimates of the effects of a third child on female labor supply
Notes: Adapted from Angrist (2001). The table compares 2SLS estimates to alternative IV-type estimates of the effect of childbearing on labor supply using nonlinear models. Standard errors for the Abadie estimates were bootstrapped using 100 replications of subsamples of size 20,000. MFX denotes marginal effects; ATE is the average treatment effect; TOT is the average effect of treatment on the treated.
The invariance to covariates seems desirable: since the same-sex instrument is essentially independent of the covariates, control for covariates is unnecessary to eliminate bias and should primarily affect precision. Yet, as Panel E shows, the marginal effects generated by bivariate Probit are sensitive to the list of covariates. Swapping a dummy indicating mothers over 30 with a linear age term increases the bivariate Probit estimates markedly, to .171, while leaving 2SLS and the Abadie estimators unchanged. This probably reflects the fact that the linear age change induces an extrapolation into cells where there is little data. Although there is no harm in reporting the results in Panel E, it's hard to see why the more robust 2SLS and Abadie estimators should not be featured as most likely more reliable.[73]