# Empirical Examples

Example 1: Union Participation

To illustrate the logit and probit models, we consider the PSID data for 1982 used in Chapter 4. In this example, we are interested in modelling union participation. Out of the 595 individuals observed in 1982, 218 had their wage set by a union and 377 did not. The explanatory variables are: years of education (ED), weeks worked (WKS), years of full-time work experience (EXP), occupation (OCC = 1 if the individual is in a blue-collar occupation), residence (SOUTH = 1 and SMSA = 1 if the individual resides in the South or in a standard metropolitan statistical area, respectively), industry (IND = 1 if the individual works in a manufacturing industry), marital status (MS = 1 if the individual is married), and sex and race (FEM = 1 and BLK = 1 if the individual is female or black, respectively). A full description of the data is given in Cornwell and Rupert (1988). The results of the linear probability, logit and probit models are given in Table 13.3. These were computed using EViews, and Table 13.4 gives the corresponding probit output. We have already mentioned that the probit model normalizes $\sigma$ to be 1, whereas the logit model has variance $\pi^2/3$. Therefore, the logit estimates tend to be larger than the probit estimates, although by a factor less than $\pi/\sqrt{3}$. In order to make the logit results comparable to those of the probit, Amemiya (1981) suggests multiplying the logit coefficient estimates by 0.625.

Similarly, to make the linear probability estimates comparable to those of the probit model, one needs to multiply these coefficients by 2.5 and then subtract 1.25 from the constant term. For this example, both logit and probit procedures converged quickly in 4 iterations. The log-likelihood values and McFadden's (1974) $R^2$ obtained for the last iteration are recorded in Table 13.3.
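The two rescaling rules can be written as small helper functions. The sketch below is illustrative only: the function names are hypothetical, and the slope-matching rationale in the comments (comparing the densities at an index of zero) is a common way of motivating these factors, not something stated in the text.

```python
import math
from statistics import NormalDist

# Slopes of the response curves at an index of zero (assumed rationale):
# the standard normal density is about 0.3989 (rounded to 0.4) and the
# logistic density is e^0 / (1 + e^0)^2 = 0.25.
probit_slope_at_zero = NormalDist().pdf(0.0)
logit_slope_at_zero = 0.25

LOGIT_TO_PROBIT = 0.625   # Amemiya's factor, i.e. 1 / 1.6
LPM_TO_PROBIT = 2.5       # i.e. 1 / 0.4
LPM_CONSTANT_SHIFT = 1.25  # 2.5 * 0.5, since P = 0.5 corresponds to index 0


def rescale_logit(coef):
    """Put a logit coefficient on the probit scale."""
    return LOGIT_TO_PROBIT * coef


def rescale_lpm(coef, is_constant=False):
    """Put a linear probability coefficient on the probit scale."""
    scaled = LPM_TO_PROBIT * coef
    return scaled - LPM_CONSTANT_SHIFT if is_constant else scaled


# The logistic standard deviation pi / sqrt(3) exceeds the factor 1.6
# implied by Amemiya's rule, as the text notes.
print(math.pi / math.sqrt(3), 1 / LOGIT_TO_PROBIT)

# An LPM intercept of 0.5 maps to a probit index of zero.
print(rescale_lpm(0.5, is_constant=True))  # 0.0
```

The constant is shifted because a linear probability intercept of 0.5 plays the role of a probit index of zero; slopes need only the multiplicative factor.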

Table 13.3 Comparison of the Linear Probability, Logit and Probit Models: Union Participation*

| Variable | OLS | Logit | Probit |
|---|---|---|---|
| EXP | -.005 (1.14) | -.007 (1.15) | -.007 (1.21) |
| WKS | -.045 (5.21) | -.068 (5.05) | -.061 (5.16) |
| OCC | .795 (6.85) | 1.036 (6.27) | .955 (6.28) |
| IND | .075 (0.79) | .114 (0.89) | .093 (0.76) |
| SOUTH | -.425 (4.27) | -.653 (4.33) | -.593 (4.26) |
| SMSA | .211 (2.20) | .280 (2.05) | .261 (2.03) |
| MS | .247 (1.55) | .378 (1.66) | .351 (1.62) |
| FEM | -.272 (1.37) | -.483 (1.58) | -.407 (1.47) |
| ED | -.040 (1.88) | -.057 (1.85) | -.057 (1.99) |
| BLK | .125 (0.71) | .222 (0.90) | .226 (0.99) |
| Const | 1.740 (5.27) | 2.738 (3.27) | 2.517 (3.30) |
| Log-likelihood | | -312.337 | -313.380 |
| McFadden's $R^2$ | | 0.201 | 0.198 |
| $\chi^2_{10}$ | | 157.2 | 155.1 |

* Figures in parentheses are t-statistics
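As a rough check on significance, the probit t-statistics in Table 13.3 can be converted to two-sided p-values. This is a minimal sketch that assumes the standard normal approximation for the t-ratios; the variable and function names are illustrative.

```python
from statistics import NormalDist

# Absolute t-statistics from the probit column of Table 13.3.
probit_t = {
    "EXP": 1.21, "WKS": 5.16, "OCC": 6.28, "IND": 0.76, "SOUTH": 4.26,
    "SMSA": 2.03, "MS": 1.62, "FEM": 1.47, "ED": 1.99, "BLK": 0.99,
}

std_normal = NormalDist()


def two_sided_p(t_stat):
    """Two-sided p-value under the standard normal approximation."""
    return 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))


p_values = {name: two_sided_p(t) for name, t in probit_t.items()}
insignificant = [name for name, p in p_values.items() if p > 0.05]
print(insignificant)  # ['EXP', 'IND', 'MS', 'FEM', 'BLK']
```

ED is a borderline case: its probit t-statistic of 1.99 is just above the 1.96 cutoff, while the OLS and logit t-statistics (1.88 and 1.85) fall just below it.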

Note that the logit and probit estimates yield similar results in magnitude, sign and significance. One would expect different results from the logit and probit only if there are several observations in the tails. The following variables were insignificant at the 5% level: EXP, IND, MS, FEM and BLK. The results show that union participation is less likely if the individual resides in the South and more likely if he or she resides in a standard metropolitan statistical area. Union participation is also less likely the more weeks worked and the higher the years of education. Union participation is more likely for blue-collar than for non-blue-collar occupations. The linear probability model yields different estimates from the logit and probit results. OLS predicts two observations with $\hat{y} > 1$ and 29 observations with $\hat{y} < 0$. Table 13.5 gives the actual versus predicted values of union participation for the linear probability, logit and probit models. The percentage of correct predictions is 75% for the linear probability and probit models and 76% for the logit model.
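The percentages of correct predictions follow directly from the counts in Table 13.5. A short sketch, with the counts keyed by (actual, predicted) union status; the dictionary layout and function name are illustrative choices:

```python
# Counts from Table 13.5, keyed by (actual, predicted) union status.
table_13_5 = {
    "OLS":    {(0, 0): 312, (0, 1): 65, (1, 0): 83, (1, 1): 135},
    "Logit":  {(0, 0): 316, (0, 1): 61, (1, 0): 82, (1, 1): 136},
    "Probit": {(0, 0): 314, (0, 1): 63, (1, 0): 86, (1, 1): 132},
}


def percent_correct(counts):
    """Share of observations on the diagonal of the 2x2 table, in percent."""
    correct = counts[(0, 0)] + counts[(1, 1)]
    return 100.0 * correct / sum(counts.values())


for model, counts in table_13_5.items():
    print(f"{model}: {percent_correct(counts):.1f}% correct")
# OLS: 75.1% correct
# Logit: 76.0% correct
# Probit: 75.0% correct
```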

One can test the significance of all slope coefficients by computing the LR statistic based on the unrestricted log-likelihood value ($\log \ell_u$) reported in Table 13.3 and the restricted log-likelihood value including only the constant. The latter is the same for both the logit and probit models and is given by

$$\log \ell_r = n[\bar{y}\log \bar{y} + (1 - \bar{y})\log(1 - \bar{y})] \tag{13.33}$$

where $\bar{y}$ is the proportion of the sample with $y_i = 1$; see problem 2. In this example, $\bar{y} = 218/595 = 0.366$ and $n = 595$ with $\log \ell_r = -390.918$. Therefore, for the probit model,

$$LR = -2[\log \ell_r - \log \ell_u] = -2[-390.918 + 313.380] = 155.1$$

which is distributed as $\chi^2_{10}$ under the null of zero slope coefficients. This is highly significant and the null is rejected. Similarly, for the logit model this LR statistic is 157.2.

Table 13.4 Probit Estimates: Union Participation

Convergence achieved after 5 iterations

Covariance matrix computed using second derivatives

| Variable | Coefficient | Std. Error | z-Statistic | Prob. |
|---|---|---|---|---|
| EX | -0.006932 | 0.005745 | -1.206491 | 0.2276 |
| WKS | -0.060829 | 0.011785 | -5.161666 | 0.0000 |
| OCC | 0.955490 | 0.152137 | 6.280476 | 0.0000 |
| IND | 0.092827 | 0.122774 | 0.756085 | 0.4496 |
| SOUTH | -0.592739 | 0.139102 | -4.261183 | 0.0000 |
| SMSA | 0.260700 | 0.128630 | 2.026741 | 0.0427 |
| MS | 0.350520 | 0.216284 | 1.620648 | 0.1051 |
| FEM | -0.407026 | 0.277038 | -1.469203 | 0.1418 |
| ED | -0.057382 | 0.028842 | -1.989515 | 0.0466 |
| BLK | 0.226482 | 0.228845 | 0.989675 | 0.3223 |
| C | 2.516784 | 0.762612 | 3.300217 | 0.0010 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Mean dependent var | 0.366387 | S.D. dependent var | 0.482222 |
| S.E. of regression | 0.420828 | Akaike info criterion | 1.090351 |
| Sum squared resid | 103.4242 | Schwarz criterion | 1.171484 |
| Log likelihood | -313.3795 | Hannan-Quinn criter. | 1.121947 |
| Restr. log likelihood | -390.9177 | Avg. log likelihood | -0.526688 |
| LR statistic (10 df) | 155.0763 | McFadden R-squared | 0.198349 |
| Probability(LR stat) | 0.000000 | | |
| Obs with Dep=0 | 377 | Total obs | 595 |
| Obs with Dep=1 | 218 | | |

Table 13.5 Actual Versus Predicted: Union Participation

| Actual | Model | Predicted Union = 0 | Predicted Union = 1 | Total |
|---|---|---|---|---|
| Union = 0 | OLS | 312 | 65 | 377 |
| | Logit | 316 | 61 | |
| | Probit | 314 | 63 | |
| Union = 1 | OLS | 83 | 135 | 218 |
| | Logit | 82 | 136 | |
| | Probit | 86 | 132 | |
| Total | OLS | 395 | 200 | 595 |
| | Logit | 398 | 197 | |
| | Probit | 400 | 195 | |

For the linear probability model, the same null hypothesis of zero slope coefficients can be tested using a Chow F-statistic. This yields an observed value of 17.80, which is distributed as F(10, 584) under the null hypothesis. Again, the null is soundly rejected. This F-test is in fact the BRMR test considered in section 13.6. As described in section 13.8, McFadden's $R^2$ is given by $R^2 = 1 - [\log \ell_u / \log \ell_r]$, which for the probit model yields

$$R^2 = 1 - (313.380/390.918) = 0.198.$$

For the logit model, McFadden's $R^2$ is 0.201.
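The restricted log-likelihood in (13.33), the LR statistics and McFadden's $R^2$ can all be reproduced from the reported sample counts and log-likelihood values. A minimal sketch; the variable names are illustrative, and the 5% critical value of $\chi^2_{10}$ is hard-coded from standard tables rather than computed.

```python
import math

n, n_union = 595, 218            # sample size and number of union members
y_bar = n_union / n              # proportion with y_i = 1

# Equation (13.33): log-likelihood of the constant-only model.
log_l_r = n * (y_bar * math.log(y_bar) + (1 - y_bar) * math.log(1 - y_bar))
print(f"restricted log-likelihood = {log_l_r:.3f}")

# Unrestricted log-likelihood values reported in Table 13.3.
log_l_u = {"logit": -312.337, "probit": -313.380}

CHI2_10_CRIT_5PCT = 18.307       # tabulated 5% critical value, 10 df

results = {}
for model, ll in log_l_u.items():
    lr = -2.0 * (log_l_r - ll)           # likelihood ratio statistic
    mcfadden_r2 = 1.0 - ll / log_l_r     # McFadden's R-squared
    results[model] = (lr, mcfadden_r2)
    print(f"{model}: LR = {lr:.1f}, McFadden R2 = {mcfadden_r2:.3f}, "
          f"reject at 5%: {lr > CHI2_10_CRIT_5PCT}")
```

Both models share the same restricted log-likelihood because the constant-only fit depends only on the sample proportion, not on the assumed link function.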

Example 2: Employment and Problem Drinking

Mullahy and Sindelar (1996) estimate a linear probability model relating employment and measures of problem drinking. The analysis is based on the 1988 Alcohol Supplement of the National Health Interview Survey. The regression was performed for males and females separately, since the authors argue that women are less likely than men to be alcoholic, are more likely to abstain from consumption, and have lower mean alcohol consumption levels. They also report that women metabolize ethanol faster than do men and experience greater liver damage for the same level of consumption of ethanol. The dependent variable takes the value 1 if the individual was employed in the past two weeks and zero otherwise.

The explanatory variables include a dummy variable, denoted by hvdrnk90, that equals 1 if the individual falls in the top decile of ethanol consumption in the sample (the 90th percentile is 18 oz. for males and 10.8 oz. for females) and zero otherwise. The other regressors are the state unemployment rate in 1988 (UE88), age, age squared, schooling, married, family size, and white; health status dummies indicating whether the individual's health was excellent, very good, fair; region of residence dummies indicating whether the individual resided in the northeast, midwest or south; and dummies for whether he or she resided in a center city (msa1) or in another metropolitan statistical area (not center city, msa2). Three additional dummy variables were included for the quarters in which the survey was conducted. Details on the definitions of these variables are given in Table 1 of Mullahy and Sindelar (1996).

Table 13.6 gives the probit results based on n = 9822 males using Stata. These results show a negative relationship between the heavy-drinking dummy and the probability of being employed, but the coefficient has a p-value of 0.075 and so is not significant at the 5% level. Mullahy and Sindelar find that for both men and women, problem drinking results in reduced employment and increased unemployment. Table 13.7 gives the marginal effects computed in Stata using the mfx command after probit estimation. The marginal effects are computed at the sample mean of the variables, except in the case of dummy variables, where the effect is computed for a discrete change from 0 to 1. For example, the marginal effect of being a heavy drinker in the upper 90th percentile of ethanol consumption in the sample (with all other variables evaluated at their means and the dummy itself changing from 0 to 1) is to decrease the probability of employment by about 1.6 percentage points. These effects can also be computed at particular values of the explanatory variables with the at option in Stata. Table 13.8 gives the average marginal effect for all males, which can be computed using the margeff command in Stata. In this case the average marginal effect for a heavy drinker (-.0165) did not change much from the marginal effect computed at the sample mean (-.0162), and neither did the standard error (.0096 compared with .0093). Goodness of fit, measured by how well this probit classifies the observations, is reported in Table 13.9, obtained with the estat classification command in Stata. The percentage of correct predictions is 90.79%. Problem 13 asks the reader to verify these results as well as those in the original article by Mullahy and Sindelar (1996).
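The at-the-mean marginal effects can be reproduced, to rounding, from numbers reported in the Stata output: the probit coefficients, the predicted probability at the sample means (.92244871) and the sample mean of hvdrnk90 (.099165). The sketch below assumes the usual probit formulas, $\phi(\bar{x}'\beta)\beta_j$ for a continuous regressor and a difference of normal CDFs for a dummy; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Predicted Pr(emp) at the sample means, as reported by mfx.
p_at_mean = 0.92244871
index_at_mean = std_normal.inv_cdf(p_at_mean)    # recovers x_bar' beta
density_at_mean = std_normal.pdf(index_at_mean)  # phi(x_bar' beta)

# Continuous regressors: dy/dx = phi(x_bar' beta) * beta_j.
beta = {"ue88": -0.0532774, "educ": 0.0471834}
me_continuous = {name: density_at_mean * b for name, b in beta.items()}

# Dummy regressor: discrete change in the probability when hvdrnk90
# moves from 0 to 1, holding the other regressors at their means.
beta_hv, mean_hv = -0.1049465, 0.099165
index_others = index_at_mean - beta_hv * mean_hv  # index with hvdrnk90 = 0
me_hvdrnk90 = (std_normal.cdf(index_others + beta_hv)
               - std_normal.cdf(index_others))

for name, me in me_continuous.items():
    print(f"{name}: dy/dx = {me:.7f}")
print(f"hvdrnk90: discrete change = {me_hvdrnk90:.7f}")
```

The results should agree with the reported values of -.0077362 (ue88), .0068513 (educ) and -.0161704 (hvdrnk90) up to rounding of the inputs.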

. probit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Probit regression

| Statistic | Value |
|---|---|
| Number of obs | 9822 |
| Wald chi2(20) | 928.33 |
| Prob > chi2 | 0.0000 |
| Log pseudolikelihood | -2698.1797 |
| Pseudo R2 | 0.1651 |

| emp | Coef. | Robust Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| hvdrnk90 | -.1049465 | .0589881 | -1.78 | 0.075 | -.2205612 | .0106681 |
| ue88 | -.0532774 | .0142025 | -3.75 | 0.000 | -.0811137 | -.0254411 |
| age | .0996338 | .0171185 | 5.82 | 0.000 | .0660821 | .1331855 |
| agesq | -.0013043 | .0002051 | -6.36 | 0.000 | -.0017062 | -.0009023 |
| educ | .0471834 | .0066739 | 7.07 | 0.000 | .0341029 | .0602639 |
| married | .2952921 | .0540858 | 5.46 | 0.000 | .189286 | .4012982 |
| famsize | .0188906 | .0140463 | 1.34 | 0.179 | -.0086398 | .0464209 |
| white | .3945226 | .0483381 | 8.16 | 0.000 | .2997818 | .4892634 |
| hlstat1 | 1.816306 | .0983447 | 18.47 | 0.000 | 1.623554 | 2.009058 |
| hlstat2 | 1.778434 | .0991531 | 17.94 | 0.000 | 1.584098 | 1.972771 |
| hlstat3 | 1.547836 | .0982637 | 15.75 | 0.000 | 1.355243 | 1.74043 |
| hlstat4 | 1.043363 | .1077279 | 9.69 | 0.000 | .8322205 | 1.254506 |
| region1 | .0343123 | .0620021 | 0.55 | 0.580 | -.0872096 | .1558341 |
| region2 | .0604907 | .0537885 | 1.12 | 0.261 | -.0449327 | .1659142 |
| region3 | .1821206 | .0542346 | 3.36 | 0.001 | .0758227 | .2884185 |
| msa1 | -.0730529 | .0518719 | -1.41 | 0.159 | -.1747199 | .0286141 |
| msa2 | .0759533 | .0513092 | 1.48 | 0.139 | -.0246109 | .1765175 |
| q1 | -.1054844 | .0527728 | -2.00 | 0.046 | -.2089171 | -.0020516 |
| q2 | -.0513229 | .0528185 | -0.97 | 0.331 | -.1548453 | .0521995 |
| q3 | -.0293419 | .0543751 | -0.54 | 0.589 | -.1359152 | .0772313 |
| _cons | -3.017454 | .3592321 | -8.40 | 0.000 | -3.721536 | -2.313372 |
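The z-statistic, p-value and 95% confidence interval reported for hvdrnk90 follow from its coefficient and robust standard error. A brief sketch, assuming the usual normal-based Wald interval; the variable names are illustrative.

```python
from statistics import NormalDist

# hvdrnk90 row of the probit output above.
coef, robust_se = -0.1049465, 0.0589881

std_normal = NormalDist()
z = coef / robust_se                              # Wald z-statistic
p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))    # two-sided p-value
crit = std_normal.inv_cdf(0.975)                  # about 1.96
ci = (coef - crit * robust_se, coef + crit * robust_se)

print(f"z = {z:.2f}, p = {p_value:.3f}, "
      f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
# z = -1.78, p = 0.075, 95% CI = (-0.2206, 0.0107)
```

Because the interval contains zero, the heavy-drinking coefficient is not significant at the 5% level, consistent with its p-value of 0.075.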

Carrasco (2001) estimated a probit equation for fertility using PSID data over the period 1986-1989. The sample consists of 1,442 married or cohabiting women between the ages of 18 and 55 in 1986. The dependent variable, fertility (f), is a dummy variable that equals 1 if the age of the youngest child in the next year is 1. The explanatory variables are: (ags26l), a dummy variable that equals 1 if the woman has a child between 2 and 6 years old; education, which has three levels (educ 1, educ 2 and educ 3); the female's age; race; husband's income; and an indicator of same sex of previous children (dsex), together with its components, (dsexf) for girls and (dsexm) for boys. The same-sex indicator exploits the widely observed phenomenon of parental preferences for a mixed sibling-sex composition in developed countries. Therefore, a dummy for whether the sex of the next child matches the sex of the previous children provides a plausible predictor for additional childbearing. The data set can be obtained from the Journal of Business & Economic Statistics archive data web site. Problem 15 asks the reader to replicate some of the results obtained in the original article by Carrasco (2001). The estimates reveal that having children of the same sex has a significant and positive effect on the probability of having an additional child. Having children of the same sex increases the probability of fertility by 3 percentage points; see Table 13.10. These marginal effects are obtained using the dprobit command in Stata.

. mfx compute

Marginal effects after probit

y = Pr(emp) (predict) = .92244871

| variable | dy/dx | Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper | X |
|---|---|---|---|---|---|---|---|
| hvdrnk90* | -.0161704 | .00962 | -1.68 | 0.093 | -.035034 | .002693 | .099165 |
| ue88 | -.0077362 | .00205 | -3.78 | 0.000 | -.011747 | -.003725 | 5.56921 |
| age | .0144674 | .00248 | 5.83 | 0.000 | .009607 | .019327 | 39.1757 |
| agesq | -.0001894 | .00003 | -6.37 | 0.000 | -.000248 | -.000131 | 1627.61 |
| educ | .0068513 | .00096 | 7.12 | 0.000 | .004966 | .008737 | 13.3096 |
| married* | .0488911 | .01009 | 4.85 | 0.000 | .029119 | .068663 | .816432 |
| famsize | .002743 | .00204 | 1.35 | 0.179 | -.001253 | .006739 | 2.7415 |
| white* | .069445 | .01007 | 6.90 | 0.000 | .049709 | .089181 | .853085 |
| hlstat1* | .2460794 | .01484 | 16.58 | 0.000 | .216991 | .275167 | .415903 |
| hlstat2* | .1842432 | .00992 | 18.57 | 0.000 | .164799 | .203687 | .301873 |
| hlstat3* | .130786 | .00661 | 19.80 | 0.000 | .11784 | .143732 | .205254 |
| hlstat4* | .0779836 | .00415 | 18.77 | 0.000 | .069841 | .086126 | .053451 |
| region1* | .0049107 | .00875 | 0.56 | 0.575 | -.012233 | .022054 | .203014 |
| region2* | .0086088 | .0075 | 1.15 | 0.251 | -.006092 | .023309 | .265628 |
| region3* | .0252543 | .00715 | 3.53 | 0.000 | .011247 | .039262 | .318265 |
| msa1* | -.0107946 | .00779 | -1.39 | 0.166 | -.026061 | .004471 | .333232 |
| msa2* | .0109542 | .00735 | 1.49 | 0.136 | -.003456 | .025365 | .434942 |
| q1* | -.0158927 | .00825 | -1.93 | 0.054 | -.032053 | .000268 | .254632 |
| q2* | -.0075883 | .00795 | -0.95 | 0.340 | -.023167 | .007991 | .252698 |
| q3* | -.0043066 | .00807 | -0.53 | 0.594 | -.020121 | .011508 | .242822 |

Variables marked with * are dummy variables; for these, dy/dx is the discrete change from 0 to 1.