In labor economics, one observes the market wage of an individual only if that individual participates in the labor force. This happens when the worker’s market wage exceeds his or her reservation wage. In a study of earnings, one does not observe the reservation wage, and for non-participants in the labor force we record a zero market wage. This sample is censored because we observe the characteristics of these non-participants. If we restrict our attention to labor market participants only, then the sample is truncated. This example needs special attention because the censoring is not based directly on the dependent variable, as in section 13.11. Rather, it is based on the difference between the market wage and the reservation wage. This latent variable, which determines the sample selection, is correlated with the dependent variable. Hence, least squares on this model results in selectivity bias, see Heckman (1976, 1979). A sample generated by this type of self-selection may not represent the true population distribution no matter how big the sample size. However, one can correct for self-selection bias if the underlying sample generating process can be understood and relevant identification conditions are available, see Lee (2001) for an excellent summary and the references cited there. In order to demonstrate this, let the earnings equation be given by
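$$w_i^* = x_{1i}'\beta + u_i, \qquad i = 1,2,\ldots,n \tag{13.65}$$
and the labor participation (or selection) equation be given by
$$y_i^* = x_{2i}'\gamma + v_i, \qquad i = 1,2,\ldots,n \tag{13.66}$$
where $u_i$ and $v_i$ are bivariate normally distributed with mean zero and variance-covariance matrix
$$\text{var}\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{bmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{bmatrix} \tag{13.67}$$
Normalizing the variance of $v_i$ to be 1 is not restrictive, since only the sign of $y_i^*$ is observed. In fact, we only observe $w_i$ and $y_i$ where
$$w_i = \begin{cases} w_i^* & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \tag{13.68}$$
$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$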
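We observe $(y_i = 0, w_i = 0)$ and $(y_i = 1, w_i = w_i^*)$ only. The log-likelihood for this model is
$$\sum_{y_i=0} \log \Pr[y_i = 0] + \sum_{y_i=1} \log\left\{\Pr[y_i = 1]\, f(w_i^* \mid y_i = 1)\right\} \tag{13.69}$$
where $f(w_i^* \mid y_i = 1)$ is the conditional density of $w_i^*$ given that $y_i = 1$. The second term can also be written as $\sum_{y_i=1} \log\left\{\Pr[y_i = 1 \mid w_i^*]\, f(w_i^*)\right\}$, which is another way of factoring the joint density function. $f(w_i^*)$ is in fact a normal density with conditional mean $x_{1i}'\beta$ and variance $\sigma^2$. Using properties of the bivariate normal density, one can write
$$y_i^* = x_{2i}'\gamma + \rho\,\frac{w_i^* - x_{1i}'\beta}{\sigma} + \epsilon_i \tag{13.70}$$
where $\epsilon_i \sim \text{IIN}(0, 1 - \rho^2)$, since the variance of $v_i$ is normalized to 1. Therefore,
$$\Pr[y_i = 1 \mid w_i] = \Pr[y_i^* > 0 \mid w_i] = \Phi\left(\frac{x_{2i}'\gamma + \rho\,(w_i - x_{1i}'\beta)/\sigma}{\sqrt{1-\rho^2}}\right) \tag{13.71}$$
where $w_i$ has been substituted for $w_i^*$ since $y_i = 1$. The likelihood function in (13.69) becomes
$$\sum_{y_i=0} \log \Phi(-x_{2i}'\gamma) + \sum_{y_i=1} \log\left[\frac{1}{\sigma}\,\phi\left(\frac{w_i - x_{1i}'\beta}{\sigma}\right)\right] + \sum_{y_i=1} \log \Phi\left(\frac{x_{2i}'\gamma + \rho\,(w_i - x_{1i}'\beta)/\sigma}{\sqrt{1-\rho^2}}\right) \tag{13.72}$$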
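As a rough illustration of how the log-likelihood in (13.72) could be evaluated, the following Python sketch adds up its three pieces: the probit term for non-participants, the normal density of the observed wage, and the participation probability conditional on the wage. The function name `selection_loglik`, the tuple layout of `data` and the toy observations are assumptions made for this example only. In practice one would pass such a function to a numerical optimizer.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def selection_loglik(beta, gamma, sigma, rho, data):
    """Log-likelihood of the sample selection model.

    Each element of data is a tuple (y, w, x1, x2): y is the 0/1
    participation indicator, w the observed wage (ignored when y == 0),
    x1 and x2 the regressor lists of the wage and selection equations.
    """
    total = 0.0
    for y, w, x1, x2 in data:
        index = sum(g * x for g, x in zip(gamma, x2))        # x2' gamma
        if y == 0:
            # non-participant: log Pr[y = 0] = log Phi(-x2' gamma)
            total += log(STD_NORMAL.cdf(-index))
        else:
            # participant: wage density plus Pr[y = 1 | w]
            z = (w - sum(b * x for b, x in zip(beta, x1))) / sigma
            total += log(STD_NORMAL.pdf(z) / sigma)
            total += log(STD_NORMAL.cdf((index + rho * z) / sqrt(1.0 - rho ** 2)))
    return total


# Hypothetical observations, for illustration only.
toy_data = [
    (1, 2.0, [1.0, 0.5], [1.0, 0.5, 1.2]),
    (0, 0.0, [1.0, -0.3], [1.0, -0.3, -0.8]),
    (1, 1.1, [1.0, 0.1], [1.0, 0.1, 0.4]),
]
value = selection_loglik([1.0, 0.8], [0.2, 0.1, 0.9], 1.0, 0.5, toy_data)
print(value)
```

With $\rho = 0$ the last term collapses to the ordinary probit contribution, so the function then returns the sum of a probit log-likelihood and a normal regression log-likelihood on the participants.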
MLE may be computationally burdensome. Heckman (1976) suggested a two-step procedure which is based on rewriting (13.65) as
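$$w_i^* = x_{1i}'\beta + \rho\sigma v_i + \eta_i \tag{13.73}$$
and replacing $w_i^*$ by $w_i$ and $v_i$ by its conditional mean $E[v_i \mid y_i = 1]$. Using the results on the truncated density, this conditional mean is $E[v_i \mid v_i > -x_{2i}'\gamma] = \phi(x_{2i}'\gamma)/[1 - \Phi(-x_{2i}'\gamma)] = \phi(x_{2i}'\gamma)/\Phi(x_{2i}'\gamma)$, known also as the inverse Mills ratio. Hence, (13.73) becomes
$$w_i = x_{1i}'\beta + \rho\sigma\,\frac{\phi(x_{2i}'\gamma)}{\Phi(x_{2i}'\gamma)} + \text{residual} \tag{13.74}$$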
Heckman’s (1976) two-step estimator consists of (i) running a probit on (13.66) in the first step to get a consistent estimate of $\gamma$, and (ii) substituting the estimated inverse Mills ratio in (13.74) and running OLS. Since $\sigma$ is positive, this second-stage regression provides a test for sample selectivity, i.e., for $\rho = 0$, by checking whether the t-statistic on the estimated inverse Mills ratio is significant. Under the null hypothesis, this statistic is asymptotically distributed as N(0,1). Rejecting $H_0$ implies that there is a selectivity problem and one should not rely on OLS on (13.65), which ignores the selectivity bias term in (13.74). Davidson and MacKinnon (1993) suggest performing MLE using (13.72) rather than relying on the two-step results in (13.74) if the former is not computationally burdensome.
Note that the Tobit model for car purchases given in (13.50) can be thought of as a special case of the sample selectivity model given by (13.65) and (13.66). In fact, the Tobit model assumes that the selection equation (the decision to buy a car) and the car expenditure equation (conditional on the decision to buy) are identical. Therefore, if one thinks that the specification of the selection equation is different from that of the expenditure equation, then one should not use the Tobit model. Instead, one should proceed with the two-equation sample selectivity model discussed in this section.
It is also important to emphasize that for the censored, truncated and sample selectivity models, normality and homoskedasticity are crucial assumptions. Suggested tests for these assumptions are given in Bera, Jarque and Lee (1984), Lee and Maddala (1985) and Pagan and Vella (1989). Alternative estimation methods that are more robust to violations of normality and homoskedasticity include symmetrically trimmed least squares for Tobit models and least absolute deviations estimation for censored regression models. These were suggested by Powell (1984, 1986).
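The two-step procedure described in this section can be sketched on simulated data. The following standard-library Python example is only an illustration under assumed values: the data-generating parameters, the sample size, the seed and the helper names (`solve`, `ols`, `probit`) are all hypothetical, and the probit is fitted by a plain Fisher scoring loop rather than by any particular package. It compares OLS on the participants alone with the regression that adds the estimated inverse Mills ratio.

```python
import random
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, targets):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, ys, iterations=15):
    """Probit MLE by Fisher scoring, started from a zero coefficient vector."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, y in zip(rows, ys):
            idx = sum(c * x for c, x in zip(g, r))
            p = min(max(N01.cdf(idx), 1e-10), 1.0 - 1e-10)
            d = N01.pdf(idx)
            weight = d / (p * (1.0 - p))
            for i in range(k):
                score[i] += (y - p) * weight * r[i]
                for j in range(k):
                    info[i][j] += d * weight * r[i] * r[j]
        g = [c + s for c, s in zip(g, solve(info, score))]
    return g


# Hypothetical data-generating process (all parameter values are made up).
random.seed(12345)
beta, gamma, sigma, rho = [1.0, 0.5], [0.0, 1.0, 1.0], 1.0, 0.7
x1_rows, x2_rows, ys, ws = [], [], [], []
for _ in range(4000):
    x, z, v = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    u = sigma * (rho * v + sqrt(1.0 - rho ** 2) * random.gauss(0, 1))
    participates = gamma[0] + gamma[1] * x + gamma[2] * z + v > 0
    x1_rows.append([1.0, x])
    x2_rows.append([1.0, x, z])
    ys.append(1 if participates else 0)
    ws.append(beta[0] + beta[1] * x + u if participates else 0.0)

gamma_hat = probit(x2_rows, ys)                        # step (i): probit
selected = [i for i, y in enumerate(ys) if y == 1]
wages = [ws[i] for i in selected]
naive = ols([x1_rows[i] for i in selected], wages)     # ignores selection
augmented = []
for i in selected:
    idx = sum(c * x for c, x in zip(gamma_hat, x2_rows[i]))
    augmented.append(x1_rows[i] + [N01.pdf(idx) / N01.cdf(idx)])
two_step = ols(augmented, wages)                       # step (ii): OLS
print("naive OLS:", naive)
print("two-step :", two_step, "(last entry estimates rho*sigma)")
```

In this design the regressor `x` enters both equations while `z` enters only the selection equation, so the inverse Mills ratio varies independently of the wage regressors. The standard errors from the second-stage OLS are not computed here; they would need an adjustment for the estimated first step.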
Notes
1. This is based on Davidson and MacKinnon (1993, pp. 523-526).
2. A binary response model attempts to explain a zero-one (or binary) dependent variable.
3. One should not use $nR^2$ as the test statistic because the total sum of squares in this case is not $n$.
Problems
1. The Linear Probability Model.
(a) For the linear probability model described in (13.1), show that for $E(u_i)$ to equal zero, we must have $\Pr[y_i = 1] = x_i'\beta$.
(b) Show that $u_i$ is heteroskedastic with $\text{var}(u_i) = x_i'\beta(1 - x_i'\beta)$.
2. Consider the general log-likelihood function given in (13.16). Assume that all the regression slopes are zero except for the constant $\alpha$.
(a) Show that maximizing $\log\ell$ with respect to the constant $\alpha$ yields $F(\hat{\alpha}) = \bar{y}$, where $\bar{y}$ is the proportion of the sample with $y_i = 1$.
(b) Conclude that the value of the maximized likelihood is $\log\ell_r = n[\bar{y}\log\bar{y} + (1 - \bar{y})\log(1 - \bar{y})]$.
(c) Verify that, for the union participation example in section 13.9, $\log\ell_r = -390.918$.
3. For the union participation example considered in section 13.9:
(b) Using the measures of fit considered in section 13.8, compute the various $R^2$ measures ($R_1^2, R_2^2, \ldots$) for the logit and probit models.
(c) Compute the predicted value for the 10th observation using OLS, logit and probit models. Also compute the corresponding standard errors.
(e) Using the model results in part (d), test that the slope coefficients are all zero for the logit, probit, and linear probability models.
(f) Test that the coefficients of IND, FEM and BLK in Table 13.3 are jointly insignificant using a LR test, a Wald test and a BRMR using OLS, logit and probit.
4. For the data used in the union participation example in section 13.9:
(a) Run OLS, logit and probit using as the dependent variable OCC which is one if the individual is in a blue-collar occupation, and zero otherwise. For the independent variables use ED, WKS, EXP, SOUTH, SMSA, IND, MS, FEM and BLK. Compare the coefficient estimates across the three models. What variables are significant?
(b) Using the measures of fit considered in section 13.8, compute the various $R^2$ measures ($R_1^2, R_2^2, \ldots$) for the logit and probit models.
(d) Test that the slope coefficients are all zero for the logit, probit, and linear probability models.
5. Truncated Uniform Density. Let x be a uniformly distributed random variable with density
$f(x) = \frac{1}{2}$ for $-1 < x < 1$
(a) What is the conditional density $f(x \mid x > -1/2)$? Hint: Use the definition of a conditional density
$f(x \mid x > -1/2) = f(x)/\Pr[x > -1/2]$ for $-\frac{1}{2} < x < 1$.
(b) What is the conditional mean $E(x \mid x > -1/2)$? How does it compare with the unconditional mean of $x$? Note that because we truncated the density from below, the new mean should shift to the right.
(c) What is the conditional variance $\text{var}(x \mid x > -1/2)$? How does it compare to the unconditional $\text{var}(x)$? (Truncation reduces the variance.)
6. Truncated Normal Density. Let x be N(1,1). Using the results in the Appendix, show that:
(a) The conditional density $f(x \mid x > 1) = 2\phi(x - 1)$ for $x > 1$ and $f(x \mid x < 1) = 2\phi(x - 1)$ for $x < 1$.
(b) The conditional mean $E(x \mid x > 1) = 1 + 2\phi(0)$ and $E(x \mid x < 1) = 1 - 2\phi(0)$. Compare with the unconditional mean of $x$.
(c) The conditional variance $\text{var}(x \mid x > 1) = \text{var}(x \mid x < 1) = 1 - 4\phi^2(0)$. Compare with the unconditional variance of $x$.
7. Censored Normal Distribution. This is based on Greene (1993, pp. 692-693). Let $y^*$ be $N(\mu, \sigma^2)$ and define $y = y^*$ if $y^* > c$ and $y = c$ if $y^* < c$ for some constant $c$.
(a) Verify the E(y) expression given in (A.7).
(b) Derive the var(y) expression given in (A.8). Hint: Use the fact that
var(y) = E(conditional variance) + var(conditional mean)
and the formulas given in the Appendix for conditional and unconditional means of a truncated normal random variable.
8. Fixed and Adjustable Rate Mortgages. Dhillon, Shilling and Sirmans (1987) considered the economic decision of choosing between fixed and adjustable rate mortgages. The data consisted of 78 households borrowing money from a Louisiana mortgage banker. Of these, 46 selected fixed rate mortgages and 32 selected uncapped adjustable rate mortgages. This data set can be downloaded from the Springer web site and is labelled DHILLON.ASC. It was obtained from Lott and Ray (1992). The variables include:
Y = 0 if adjustable rate and 1 if fixed rate.
BA = Age of the borrower.
BS = Years of schooling for the borrower.
NW = Net worth of the borrower.
FI = Fixed interest rate.
PTS = Ratio of points paid on adjustable to fixed rate mortgages.
MAT = Ratio of maturities on adjustable to fixed rate mortgages.
MOB = Years at the present address.
MC = 1 if the borrower is married and 0 otherwise.
FTB = 1 if the borrower is a first-time home buyer and 0 otherwise.
SE = 1 if the borrower is self-employed and 0 otherwise.
YLD = The 10-year treasury rate less the 1-year treasury rate.
MARG = The margin on the adjustable rate mortgage.
CB = 1 if there is a co-borrower and 0 otherwise.
STL = Short-term liabilities.
LA = Liquid assets.
The probability of choosing a variable rate mortgage is a function of personal borrower characteristics as well as mortgage characteristics. The efficient market hypothesis states that only cost variables and not personal borrower characteristics influence the borrower’s decision between fixed and adjustable rate mortgages. Cost variables include FI, MARG, YLD, PTS and MAT. The rest of the variables are personal characteristics variables. The principal agent theory suggests that information is asymmetric between lender and borrower. Therefore, one implication of this theory is that the personal characteristics of the borrower will be significant in the choice of mortgage loan.
(a) Run OLS of Y on all variables in the data set. For this linear probability model what does the F-statistic for the significance of all slope coefficients yield? What is the $R^2$? How many predictions are less than zero or larger than one?
(b) Using only the cost variables in the restricted regression, test that personal characteristics are jointly insignificant. Hint: Use the Chow-F statistic. Do you find support for the efficient market hypothesis?
(c) Run the above model using the logit specification. Test the efficient market hypothesis. Does your conclusion change from that in part (b)? Hint: Use the likelihood ratio test or the BRMR.
(d) Do part (c) using the probit specification.
9. Sampling Distribution of OLS Under a Logit Model. This is based on Baltagi (2000).
Consider a simple logit regression model
$$y_t = \Lambda(\beta x_t) + u_t$$
for $t = 1, 2$, where $\Lambda(z) = e^z/(1 + e^z)$ for $-\infty < z < \infty$. Let $\beta = 1$, $x_1 = 1$, $x_2 = 2$ and assume that the $u_t$’s are independent with mean zero.
(a) Derive the sampling distribution of the least squares estimator of $\beta$, i.e., assuming a linear probability model when the true model is a logit model.
(b) Derive the sampling distribution of the least squares residuals and verify that the estimated variance of $\hat{\beta}_{OLS}$ is biased.
10. Sample Selection and Non-response. This is based on Manski (1995), see the Appendix to this chapter. Suppose we are interested in estimating the probability that an individual who is homeless at a given date has a home six months later. Let $y = 1$ if the individual has a home six months later and $y = 0$ if the individual remains homeless. Let $x$ be the sex of the individual and let $z = 1$ if the individual was located and interviewed and zero otherwise. 100 men and 31 women were initially interviewed. Six months later, only 64 men and 14 women were located and interviewed. Of the 64 men, 21 exited homelessness. Of the 14 women, only 3 exited homelessness.
(a) Compute $\Pr[y = 1 \mid \text{Male}, z = 1]$, $\Pr[z = 1 \mid \text{Male}]$ and the bound on $\Pr[y = 1 \mid \text{Male}]$.
(b) Compute $\Pr[y = 1 \mid \text{Female}, z = 1]$, $\Pr[z = 1 \mid \text{Female}]$ and the bound on $\Pr[y = 1 \mid \text{Female}]$.
(c) Show that the width of the bound is equal to the probability of attrition. Which bound is tighter? Why?
11. Does the Link Matter? This is based on Santos Silva (1999). Consider a binary random variable $y_i$ such that
$$P(y_i = 1 \mid x) = F(\beta_0 + \beta_1 x_i), \qquad i = 1,\ldots,n,$$
where the link $F(\cdot)$ is a continuous distribution function.
(a) Write down the log-likelihood function and the first-order conditions of maximization with respect to $\beta_0$ and $\beta_1$.
(b) Consider the case where $x_i$ assumes only two different values; without loss of generality, let these be 0 and 1. Show that $F(1) = \sum_{x_i=1} y_i/n_1$, where $n_1$ is the number of observations for which $x_i = 1$. Also, show that $F(0) = \sum_{x_i=0} y_i/(n - n_1)$.
(c) What are the maximum likelihood estimates of $\beta_0$ and $\beta_1$?
(d) Show that the value of the log-likelihood function evaluated at the maximum likelihood estimates of $\beta_0$ and $\beta_1$ is the same, independent of the form of the link function.
12. Beer Taxes and Motor Vehicle Fatality. Ruhm (1996) considered the effect of beer taxes and a variety of alcohol-control policies on motor vehicle fatality rates, see section 13.4. The data is for 48 states (excluding Alaska, Hawaii and the District of Columbia) over the period 1982-1988. This data set can be downloaded from the Stock and Watson (2003) web site at www.aw.com/stock_watson. Using this data set, replicate the results in Table 13.1.
13. Problem Drinking and Employment. Mullahy and Sindelar (1996) considered the effect of problem drinking on employment and unemployment. The data set is based on the 1988 Alcohol Supplement of the National Health Interview Survey. This can be downloaded from the Journal of Applied Econometrics web site at http://qed.econ.queensu.ca/jae/2002-v17.4/terza/.
(a) Replicate the probit results in Table 13.6 and run also the logit and OLS regressions with robust White standard errors. The OLS results should match those given in Table 5 of Mullahy and Sindelar (1996).
(b) Compute marginal effects as reported in Table 13.7 and average marginal effects as reported in Table 13.8. Compute the classifications of actual vs predicted as reported in Table 13.9. Repeat these calculations for OLS and logit.
(c) Mullahy and Sindelar (1996) performed similar regressions for females and for the dependent variable taking the value of 1 if the individual is unemployed and zero otherwise. Replicate the OLS results in Tables 5 and 6 of Mullahy and Sindelar (1996) and perform the corresponding logit and probit regressions. Repeat part (b) for the female data set. What is your conclusion on the relationship between problem drinking and unemployment?
14. Fractional Response. Papke and Wooldridge (1996) studied the effect of match rates on participation rates in 401(K) pension plans. The data are from the 1987 IRS Form 5500 reports of pension plans with more than 100 participants. This data set can be downloaded from the Journal of Applied Econometrics web site at http://qed.econ.queensu.ca/jae/1996-v11.6/papke.
(a) Replicate Tables I and II of Papke and Wooldridge (1996).
(b) Run the specification tests (RESET) described in Papke and Wooldridge (1996).
(c) Compare OLS and logit QMLE using $R^2$, specification tests, and predictions for various values of MRATE as done in Figure 1 of Papke and Wooldridge (1996, p. 630).
15. Fertility and Female Labor Supply. Carrasco (2001) estimated a probit equation for fertility using PSID data over the period 1986-1989, see Table 13.10. The sample consists of 1,442 married or cohabiting women between the ages of 18 and 55 in 1986. The data set can be obtained from the Journal of Business & Economic Statistics archive data web site.
(a) Replicate Table 4, columns 1 and 2, of Carrasco (2001, p. 391). Show that having children of the same sex has a significant and positive effect on the probability of having an additional child.
(b) Compute the predicted probabilities from these regressions and the percentage of correct decisions. Also compute the marginal effects.
(c) Replicate Table 5, columns 1 and 4, of Carrasco (2001, p. 392) which run a female labor force participation equation using OLS and probit. Compute the predicted probabilities from the probit regression and the percentage of correct decisions. Also compute the marginal effects.
(d) Replicate Table 5, column 5, of Carrasco (2001, p. 392) which runs 2SLS on the female labor force participation equation using as instruments the same sex variables and their interactions with ags26l. Compute the over-identification test for this 2SLS.
(e) Replicate Table 7, column 4, of Carrasco (2001, p. 393) which runs fixed effects on the female labor force participation equation with robust standard errors. Run also fixed effects 2SLS using as instruments the same sex variables and their interactions with ags26l.
16. Multinomial Logit. Terza (2002) ran a multinomial logit model on the Mullahy and Sindelar (1996) data set for problem drinking described in problem 13, by explicitly accounting for the multinomial classification of the dependent variable. In particular, $y = 1$ when the individual is out of the labor force, $y = 2$ when this individual is unemployed, and $y = 3$ when this individual is employed.
(a) Replicate the multinomial logit estimates for the male data reported in Table II of Terza (2002, p. 399) columns 3, 4, 9 and 10.
(b) Obtain the multinomial logit estimates for the female data. How does your conclusion on the relationship between problem drinking and employment/unemployment change from that in problem 13?
17. Tobit Estimation of Married Women’s Labor Supply. Wooldridge (2009, p. 593) estimated a Tobit equation for the Mroz (1987) data considered in problem 11.31. Using the PSID for 1975, Mroz’s sample consists of 753 married white women between the ages of 30 and 60 in 1975, with 428 working at some time during the year. The wife’s annual hours of work (hours) is regressed on nonwife income (nwifeinc), the wife’s age (age), her years of schooling (educ), the number of children less than six years old in the household (kidslt6), and the number of children between the ages of five and nineteen (kidsge6). The data set was obtained from Wooldridge’s (2009) data web site.
(a) Give a detailed summary of hours of work and determine the extent of skewness and kurtosis.
(b) Run OLS and Tobit estimation as described above and replicate Table 17.2 of Wooldridge (2009, p. 593).
(c) Using the variable in the labor force (inlf), run OLS, logit and probit using the same explanatory variables given above and replicate Table 17.1 of Wooldridge (2009, p. 585). Give the predicted classification and the marginal effects at the mean as well as the average marginal effects for these three specifications.
(d) Compare the estimates of $(\beta/\sigma^2)$ from the probit in part (c) with the Tobit estimates given in part (b).
18. Heckit Estimation of Married Women’s Earnings. Wooldridge (2009, p. 611) estimated a log wage equation for the Mroz (1987) data considered in problem 13.7. The wife’s log wage (lwage) is regressed on her years of schooling (educ), her experience (exper) and its square (expersq). The probit equation to correct for sample selection includes the regressors plus the following additional variables: the number of children less than six years old in the household (kidslt6), the number of children between the ages of five and nineteen (kidsge6), nonwife income (nwifeinc) and the wife’s age (age).
(a) Run OLS and Heckit estimation as described above and replicate Table 17.5 of Wooldridge (2009, p. 611).
(b) Test that the inverse Mills ratio is not significant. What do you conclude?
(c) Run the MLE of this Heckman (1976) sample selection model.
References
This chapter is based on the material in Hanushek and Jackson (1977), Maddala (1983), Davidson and MacKinnon (1993) and Greene (1993). Additional references include:
Amemiya, T. (1981), “Qualitative Response Models: A Survey,” Journal of Economic Literature, 19: 1481-1536.
Amemiya, T. (1984), “Tobit Models: A Survey,” Journal of Econometrics, 24: 3-61.
Baltagi, B. H. (2000), “Sampling Distribution of OLS Under a Logit Model,” Problem 00.3.1, Econometric Theory, 16: 451.
Bera, A. K., C. Jarque and L. F. Lee (1984), “Testing the Normality Assumption in Limited Dependent Variable Models,” International Economic Review, 25: 563-578.
Berkson, J. (1953), “A Statistically Precise and Relatively Simple Method of Estimating the Bio-Assay with Quantal Response, Based on the Logistic Function,” Journal of the American Statistical Association, 48: 565-599.
Berndt, E., B. Hall, R. Hall and J. Hausman (1974), “Estimation and Inference in Nonlinear Structural Models,” Annals of Economic and Social Measurement, 3/4: 653-665.
Boskin, M. (1974), “A Conditional Logit Model of Occupational Choice,” Journal of Political Economy, 82: 389-398.
Carrasco, R. (2001), “Binary Choice with Binary Endogenous Regressors in Panel Data: Estimating the Effect of Fertility on Female Labor Participation,” Journal of Business & Economic Statistics, 19:
Cornwell, C. and P. Rupert (1988), “Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators,” Journal of Applied Econometrics, 3: 149-155.
Cox, D. R. (1970), The Analysis of Binary Data (Chapman and Hall: London).
Cragg, J. and R. Uhler (1970), “The Demand for Automobiles,” Canadian Journal of Economics, 3:
Davidson, R. and J. MacKinnon (1984), “Convenient Specification Tests for Logit and Probit Models,” Journal of Econometrics, 25: 241-262.
Dhillon, U. S., J. D. Shilling and C. F. Sirmans (1987), “Choosing Between Fixed and Adjustable Rate Mortgages,” Journal of Money, Credit and Banking, 19: 260-267.
Efron, B. (1978), “Regression and ANOVA with Zero-One Data: Measures of Residual Variation,” Journal of the American Statistical Association, 73: 113-121.
Goldberger, A. (1964), Econometric Theory (Wiley: New York).
Gourieroux, C., A. Monfort and A. Trognon (1984), “Pseudo-Maximum Likelihood Methods: Theory,” Econometrica, 52: 681-700.
Hanushek, E. A. and J. E. Jackson (1977), Statistical Methods for Social Scientists (Academic Press: New York).
Hausman, J. and D. McFadden (1984), “Specification Tests for the Multinomial Logit Model,” Econometrica, 52: 1219-1240.
Hausman, J. A. and D. A. Wise (1977), “Social Experimentation, Truncated Distributions, and Efficient Estimation,” Econometrica, 45: 919-938.
Heckman, J. (1976), “The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models,” Annals of Economic and Social Measurement, 5: 475-492.
Heckman, J. (1979), “Sample Selection Bias as a Specification Error,” Econometrica, 47: 153-161.
Lee, L. F. (2001), “Self-Selection,” Chapter 18 in B. H. Baltagi (ed.) A Companion to Theoretical Econometrics (Blackwell: Massachusetts).
Lee, L. F. and G. S. Maddala (1985), “The Common Structure of Tests for Selectivity Bias, Serial Correlation, Heteroskedasticity and Non-Normality in the Tobit Model,” International Economic Review, 26: 1-20.
Lott, W. F. and S. C. Ray (1992), Applied Econometrics: Problems with Data Sets (The Dryden Press: New York).
Maddala, G. (1983), Limited Dependent and Qualitative Variables in Econometrics (Cambridge University Press: Cambridge).
Manski, C. F. (1995), Identification Problems in the Social Sciences (Harvard University Press: Cambridge).
McFadden, D. (1974), “The Measurement of Urban Travel Demand,” Journal of Public Economics, 3: 303-328.
McCullagh, P. and J. A. Nelder (1989), Generalized Linear Models (Chapman and Hall: New York).
Mroz, T. A. (1987), “The Sensitivity of an Empirical Model of Married Women’s Hours of Work to Economic and Statistical Assumptions,” Econometrica, 55: 765-799.
Mullahy, J. and J. Sindelar (1996), “Employment, Unemployment, and Problem Drinking,” Journal of Health Economics, 15: 409-434.
Pagan, A. R. and F. Vella (1989), “Diagnostic Tests for Models Based on Individual Data: A Survey,” Journal of Applied Econometrics, 4: S29-S59.
Papke, L. E. and J. M. Wooldridge (1996), “Econometric Methods for Fractional Response Variables with An Application to 401(K) Plan Participation Rates,” Journal of Applied Econometrics, 11: 619-632.
Powell, J. (1984), “Least Absolute Deviations Estimation of the Censored Regression Model,” Journal of Econometrics, 25: 303-325.
Powell, J. (1986), “Symmetrically Trimmed Least Squares Estimation for Tobit Models,” Econometrica, 54: 1435-1460.
Pratt, J. W. (1981), “Concavity of the Log-Likelihood,” Journal of the American Statistical Association, 76: 103-109.
Ruhm, C. J. (1996), “Alcohol Policies and Highway Vehicle Fatalities,” Journal of Health Economics, 15: 435-454.
Schmidt, P. and R. Strauss (1975), “Estimation of Models With Jointly Dependent Qualitative Variables: A Simultaneous Logit Approach,” Econometrica, 43: 745-755.
Terza, J. (2002), “Alcohol Abuse and Employment: A Second Look,” Journal of Applied Econometrics, 17: 393-404.
Wooldridge, J. M. (1991), “Specification Testing and Quasi-Maximum Likelihood Estimation,” Journal of Econometrics, 48: 29-55.
Appendix
1. Truncated Normal Distribution
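Let $x$ be $N(\mu, \sigma^2)$, then for a constant $c$, the truncated density is given by
$$f(x \mid x > c) = \frac{\phi(z)}{\sigma[1 - \Phi(c^*)]}, \qquad c < x < \infty \tag{A.1}$$
where $z = (x - \mu)/\sigma$ and $c^* = (c - \mu)/\sigma$. If the truncation is from above, then
$$f(x \mid x < c) = \frac{\phi(z)}{\sigma\,\Phi(c^*)}, \qquad -\infty < x < c \tag{A.2}$$
The conditional means are given by
$$E(x \mid x > c) = \mu + \sigma\,\frac{\phi(c^*)}{1 - \Phi(c^*)} \tag{A.3}$$
$$E(x \mid x < c) = \mu - \sigma\,\frac{\phi(c^*)}{\Phi(c^*)} \tag{A.4}$$
In other words, the truncated mean shifts to the right (left) if truncation is from below (above).
The conditional variances are given by $\sigma^2(1 - \delta(c^*))$ with $0 < \delta(c^*) < 1$ for all values of $c^*$, where
$$\delta(c^*) = \lambda(c^*)[\lambda(c^*) - c^*] \quad \text{with} \quad \lambda(c^*) = \frac{\phi(c^*)}{1 - \Phi(c^*)} \quad \text{for } x > c \tag{A.5}$$
$$\delta(c^*) = \lambda(c^*)[\lambda(c^*) - c^*] \quad \text{with} \quad \lambda(c^*) = -\frac{\phi(c^*)}{\Phi(c^*)} \quad \text{for } x < c \tag{A.6}$$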
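2. Censored Normal Distribution
Let $y^*$ be $N(\mu, \sigma^2)$ and define $y = y^*$ if $y^* > c$ and $y = c$ otherwise, for some constant $c$. Then, with $c^* = (c - \mu)/\sigma$,
$$E(y) = \Pr[y = c]\,c + \Pr[y > c]\,E(y^* \mid y^* > c) = c\,\Phi(c^*) + [1 - \Phi(c^*)]\,E(y^* \mid y^* > c) \tag{A.7}$$
where $E(y^* \mid y^* > c)$ is obtained from the mean of a truncated normal density, see (A.3). Similarly, one can show, see problem 7 or Greene (1993, p. 693), that
$$\text{var}(y) = \sigma^2[1 - \Phi(c^*)]\left[1 - \delta(c^*) + (c^* - \lambda(c^*))^2\,\Phi(c^*)\right] \tag{A.8}$$
where $\delta(c^*)$ was defined in (A.5) and $\lambda(c^*) = \phi(c^*)/[1 - \Phi(c^*)]$.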
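The truncated normal results in part 1 of this Appendix can be checked numerically. The short Python sketch below assumes the standard formulas for truncation from below, namely $E(x \mid x > c) = \mu + \sigma\lambda(c^*)$ and $\text{var}(x \mid x > c) = \sigma^2(1 - \delta(c^*))$ with $\lambda(c^*) = \phi(c^*)/[1 - \Phi(c^*)]$, $\delta(c^*) = \lambda(c^*)[\lambda(c^*) - c^*]$ and $c^* = (c - \mu)/\sigma$. It compares them with a midpoint-rule integration of the truncated density. The function names, the integration width and the number of steps are choices made for this illustration only.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def truncated_moments(mu, sigma, c):
    """Closed-form mean and variance of x given x > c, for x ~ N(mu, sigma^2)."""
    c_star = (c - mu) / sigma
    lam = N01.pdf(c_star) / (1.0 - N01.cdf(c_star))   # inverse Mills ratio
    delta = lam * (lam - c_star)
    return mu + sigma * lam, sigma ** 2 * (1.0 - delta)


def numeric_moments(mu, sigma, c, width=12.0, steps=100_000):
    """Midpoint-rule moments of the density truncated below at c.

    The integral is cut off at c + width * sigma, which is an
    approximation; the neglected upper tail is tiny for moderate c.
    """
    dist = NormalDist(mu, sigma)
    h = width * sigma / steps
    mass = first = second = 0.0
    for k in range(steps):
        x = c + (k + 0.5) * h
        f = dist.pdf(x) * h
        mass += f
        first += x * f
        second += x * x * f
    mean = first / mass
    return mean, second / mass - mean ** 2


# x ~ N(1, 1) truncated from below at c = 1.
closed = truncated_moments(1.0, 1.0, 1.0)
numeric = numeric_moments(1.0, 1.0, 1.0)
print("closed form:", closed)
print("numerical  :", numeric)
```

For $x \sim N(1,1)$ truncated at $c = 1$, $c^* = 0$, so the closed forms reduce to $1 + 2\phi(0)$ and $1 - 4\phi^2(0)$, and the numerical values should agree with them closely.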