Limited Dependent Variables
In labor economics, one is faced with explaining the decision to participate in the labor force, the decision to join a union, or the decision to migrate from one region to another. In finance, a consumer defaults on a loan or a credit card debt, or purchases a stock or an asset like a house or a car. In these examples, the dependent variable is usually a dummy variable with value 1 if the worker participates (or the consumer defaults on a loan) and 0 if he or she does not participate (or default). We dealt with dummy variables as explanatory variables on the right-hand side of the regression, but what additional problems arise when this dummy variable appears on the left-hand side of the equation? As we have done in previous chapters, we first study its effects on the usual least squares estimator, and then consider alternative estimators that are more appropriate for models of this nature.
What is wrong with running OLS on this model? After all, it is a feasible procedure. For the labor force participation example one regresses the dummy variable for participation on age, sex, race, marital status, number of children, experience and education, etc. The prediction from this OLS regression is interpreted as the likelihood of participating in the labor force. The problems with this interpretation are the following:
(i) We are predicting probabilities of participation for each individual, whereas the actual values observed are 0 or 1.
(ii) There is no guarantee that $\hat{y}_i$, the predicted value of $y_i$, is going to be between 0 and 1. In fact, one can always find values of the explanatory variables that would generate a corresponding prediction outside the (0,1) range.
(iii) Even if one is willing to assume that the true model is a linear regression given by
$y_i = x_i'\beta + u_i, \quad i = 1, 2, \ldots, n$ (13.1)
what properties does this entail on the disturbances? It is obvious that $y_i = 1$ only when $u_i = 1 - x_i'\beta$, let us say with probability $\pi_i$, where $\pi_i$ is to be determined. Then $y_i = 0$ only when $u_i = -x_i'\beta$, with probability $(1 - \pi_i)$. For the disturbances to have zero mean
$E(u_i) = \pi_i(1 - x_i'\beta) + (1 - \pi_i)(-x_i'\beta) = 0$ (13.2)
Solving for $\pi_i$, one gets $\pi_i = x_i'\beta$. This also means that
$\mathrm{var}(u_i) = \pi_i(1 - \pi_i) = x_i'\beta(1 - x_i'\beta)$ (13.3)
which is heteroskedastic. Goldberger (1964) suggests correcting for this heteroskedasticity by first running OLS to estimate $\beta$, and estimating $\sigma_i^2 = \mathrm{var}(u_i)$ by $\hat{\sigma}_i^2 = x_i'\hat{\beta}_{OLS}(1 - x_i'\hat{\beta}_{OLS}) = \hat{y}_i(1 - \hat{y}_i)$. In the next step a Weighted Least Squares (WLS) procedure is run on (13.1) with the original observations divided by $\hat{\sigma}_i$. One cannot compute $\hat{\sigma}_i$ if OLS predicts $\hat{y}_i$ larger than 1 or smaller than 0. Suggestions in the literature include substituting 0.005 for $\hat{y}_i$ when $\hat{y}_i < 0$, and 0.995 when $\hat{y}_i > 1$. However, these procedures do not perform well, and the WLS predictions themselves are not guaranteed to fall in the (0,1) range. Therefore, one should use the robust White heteroskedastic variance-covariance matrix option when estimating linear probability models; otherwise the standard errors are biased and inference is misleading.
B. H. Baltagi, Econometrics, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-20059-5_13, 333
© Springer-Verlag Berlin Heidelberg 2011
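The two-step procedure and the robust standard error discussed in (iii) can be sketched on simulated data. This is only an illustration under stated assumptions: the data-generating process (one regressor, uniform on [-6, 6], with a logistic participation probability) and the names `simulate`, `wls` and `lpm_report` are hypothetical and are not taken from the text. With a single regressor and an intercept, the White variance of the OLS slope reduces to $\sum (x_i - \bar{x})^2 e_i^2 / [\sum (x_i - \bar{x})^2]^2$, which is what the code computes.

```python
import math
import random


def simulate(n=2000, seed=0):
    """Hypothetical data: x uniform on [-6, 6], Pr(y = 1 | x) logistic in x."""
    rng = random.Random(seed)
    xs = [rng.uniform(-6.0, 6.0) for _ in range(n)]
    ys = [1.0 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0.0 for x in xs]
    return xs, ys


def wls(xs, ys, ws):
    """Least squares of y on (1, x) with weights ws = 1 / sigma_i^2."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def lpm_report(xs, ys, lo=0.005, hi=0.995):
    """OLS linear probability model, Goldberger-style WLS, and slope std. errors."""
    n = len(xs)
    a, b = wls(xs, ys, [1.0] * n)  # OLS is WLS with unit weights
    fitted = [a + b * x for x in xs]
    n_outside = sum(1 for f in fitted if f < 0.0 or f > 1.0)

    # Second step: sigma_i^2 = yhat_i (1 - yhat_i), substituting 0.005 or 0.995
    # whenever the OLS prediction leaves the unit interval.
    adj = [lo if f <= 0.0 else hi if f >= 1.0 else f for f in fitted]
    a_w, b_w = wls(xs, ys, [1.0 / (p * (1.0 - p)) for p in adj])

    # Standard errors of the OLS slope: conventional versus White (HC0).
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    resid = [y - f for y, f in zip(ys, fitted)]
    se_conv = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    se_white = math.sqrt(sum(((x - xbar) * e) ** 2 for x, e in zip(xs, resid))) / sxx
    return {"ols": (a, b), "wls": (a_w, b_w), "n_outside": n_outside,
            "se_conv": se_conv, "se_white": se_white}


if __name__ == "__main__":
    report = lpm_report(*simulate())
    for key, value in report.items():
        print(key, value)
```

Under this assumed design a sizeable share of the OLS predictions falls outside the unit interval, so the substitution rule is actually exercised rather than being a corner case.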
This brings us to the fundamental problem with OLS, i.e., its functional form. We are trying to predict
$y_i = F(x_i'\beta) + u_i$ (13.4)
with a linear regression equation, see Figure 13.1, where the more reasonable functional form for this probability is an S-shaped cumulative distribution functional form. This was justified in the biometrics literature as follows: An insect has a tolerance to an insecticide $I_i^*$, which is an unobserved random variable with cumulative distribution function (c.d.f.) $F$. If the dosage of insecticide administered induces a stimulus $I_i$ that exceeds $I_i^*$, the insect dies, i.e., $y_i = 1$. Therefore
$\Pr(y_i = 1) = \Pr(I_i^* < I_i) = F(I_i)$ (13.5)
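The threshold argument behind (13.5) can be checked by simulation. The sketch below assumes that F is the standard logistic c.d.f.; the text leaves F unspecified, so this choice, like the names `draw_tolerance` and `kill_rate`, is purely illustrative.

```python
import math
import random


def logistic_cdf(z):
    """Assumed c.d.f. F of the unobserved tolerance."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_tolerance(rng):
    """Draw the tolerance I* from the standard logistic by inverting its c.d.f."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the inverse c.d.f.
        u = rng.random()
    return math.log(u / (1.0 - u))


def kill_rate(dose, n=50000, seed=1):
    """Share of simulated insects whose tolerance lies below the stimulus."""
    rng = random.Random(seed)
    deaths = sum(1 for _ in range(n) if draw_tolerance(rng) < dose)
    return deaths / n


if __name__ == "__main__":
    for dose in (-1.0, 0.0, 2.0):
        print(dose, kill_rate(dose), logistic_cdf(dose))
```

For each stimulus level the simulated share of deaths should be close to the value of the assumed c.d.f. at that level, which is all that (13.5) asserts.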
To put it in an economic context, $I_i^*$ could be the unobserved reservation wage of a worker, and if we increase the offered wage beyond that reservation wage, the worker participates in the labor force. In general, $I_i$ could be represented as a function of the individual's characteristics, i.e., the $x_i$'s. $F(x_i'\beta)$ is by definition between zero and 1 for all values of $x_i$. Also, the linear probability model yields the result that $\partial \pi_i / \partial x_k = \beta_k$, for every $i$. This means that the probability of participating ($\pi_i$) always changes at the same rate with respect to unit increases in the offered wage $x_k$. However, this probability model gives
$\partial \pi_i / \partial x_k = [\partial F(z_i) / \partial z_i] \cdot [\partial z_i / \partial x_k] = f(x_i'\beta) \cdot \beta_k$ (13.6)
where $z_i = x_i'\beta$, and $f$ is the probability density function (p.d.f.). Equation (13.6) makes more sense because if $x_k$ denotes the offered wage, changing the probability of participation $\pi_i$ from 0.96 to 0.97 requires a larger change in $x_k$ than changing $\pi_i$ from 0.23 to 0.24.
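A small numerical sketch makes the comparison in (13.6) concrete. It assumes a logistic F, for which the density is F(1 - F), and a hypothetical coefficient of 0.5 on the offered wage; neither assumption comes from the text, and the function names are illustrative.

```python
import math


def logit(p):
    """Inverse of the logistic c.d.f.: the index z with F(z) = p."""
    return math.log(p / (1.0 - p))


def marginal_effect(p, beta_k):
    """f(z) * beta_k evaluated where F(z) = p; the logistic density is F(1 - F)."""
    return p * (1.0 - p) * beta_k


def required_change(p_from, p_to, beta_k):
    """Change in x_k that moves the probability from p_from to p_to,
    holding the other regressors fixed."""
    return (logit(p_to) - logit(p_from)) / beta_k


if __name__ == "__main__":
    beta_k = 0.5  # hypothetical coefficient on the offered wage
    print("change needed, 0.96 to 0.97:", required_change(0.96, 0.97, beta_k))
    print("change needed, 0.23 to 0.24:", required_change(0.23, 0.24, beta_k))
    print("marginal effect at 0.96:", marginal_effect(0.96, beta_k))
    print("marginal effect at 0.23:", marginal_effect(0.23, beta_k))
```

Under these assumptions the marginal effect is smaller in the upper tail, so the same one-point gain in probability needs a larger wage change there, whereas the linear probability model would require the same change at both points.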
If $F(x_i'\beta)$ is the true probability function, assuming it is linear introduces misspecification, and as Figure 13.1 indicates, for $x_i < x_\ell$, all the $u_i$'s generated by a linear probability approximation are positive. Similarly, for all $x_i > x_u$, all the $u_i$'s generated by a linear probability approximation are negative.
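The sign pattern of the disturbances can be reproduced with a short simulation. The sketch assumes a logistic true probability and a single regressor that is uniform on [-6, 6], and it takes the two cutoffs to be the points where the fitted line crosses 0 and 1. That reading of Figure 13.1, like the names `fit_line` and `tail_residuals`, is an assumption made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """OLS intercept and slope of y on x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def tail_residuals(n=5000, seed=2):
    """Residuals of the linear approximation where the fitted line leaves [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-6.0, 6.0) for _ in range(n)]
    ys = [1.0 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0.0 for x in xs]
    a, b = fit_line(xs, ys)
    x_lower = -a / b        # fitted line equals 0 here
    x_upper = (1.0 - a) / b  # fitted line equals 1 here
    fitted = [a + b * x for x in xs]
    # With a positive slope, fitted < 0 is the region x < x_lower and
    # fitted > 1 is the region x > x_upper.
    below = [y - f for y, f in zip(ys, fitted) if f < 0.0]
    above = [y - f for y, f in zip(ys, fitted) if f > 1.0]
    return x_lower, x_upper, below, above


if __name__ == "__main__":
    x_lower, x_upper, below, above = tail_residuals()
    print("cutoffs:", x_lower, x_upper)
    print("all positive below the lower cutoff:", all(e > 0 for e in below))
    print("all negative above the upper cutoff:", all(e < 0 for e in above))
```

Because the observed outcome is always 0 or 1, any fitted value below 0 forces a positive residual and any fitted value above 1 forces a negative one, so the disturbances cannot have zero mean conditional on the regressor in those regions.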