Empirical Examples
Example 1: Union Participation
To illustrate the logit and probit models, we consider the PSID data for 1982 used in Chapter 4. In this example, we are interested in modelling union participation. Out of the 595 individuals observed in 1982, 218 had their wage set by a union and 377 did not. The explanatory variables used are: years of education (ED), weeks worked (WKS), years of full-time work experience (EXP), occupation (OCC = 1 if the individual is in a blue-collar occupation), residence (SOUTH = 1 and SMSA = 1 if the individual resides in the South or in a standard metropolitan statistical area, respectively), industry (IND = 1 if the individual works in a manufacturing industry), marital status (MS = 1 if the individual is married), and sex and race (FEM = 1 and BLK = 1 if the individual is female or black, respectively). A full description of the data is given in Cornwell and Rupert (1988). The results of the linear probability, logit and probit models are given in Table 13.3. These were computed using EViews; Table 13.4 gives the probit output. We have already mentioned that the probit model normalizes $\sigma$ to be 1, whereas the logistic distribution underlying the logit model has variance $\pi^2/3$. Therefore, the logit estimates tend to be larger than the probit estimates, although by a factor less than $\pi/\sqrt{3}$. In order to make the logit results comparable to those of the probit, Amemiya (1981) suggests multiplying the logit coefficient estimates by 0.625.
Similarly, to make the linear probability estimates comparable to those of the probit model, one needs to multiply these coefficients by 2.5 and then subtract 1.25 from the constant term. For this example, both the logit and probit procedures converged quickly, in 4 iterations. The log-likelihood values and McFadden's (1974) $R^2$ obtained for the last iteration are recorded in Table 13.3.
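The rescaling rules of thumb above amount to simple arithmetic. The following minimal Python sketch makes them explicit; the function names are illustrative, and the coefficient values in the usage lines are hypothetical, since they are not the Table 13.3 estimates.

```python
import math

# Standard deviation of the standard logistic distribution relative to the
# standard normal (sigma = 1 in the probit): sqrt(pi^2 / 3) = pi / sqrt(3).
LOGIT_PROBIT_SD_RATIO = math.pi / math.sqrt(3)  # about 1.814

# Amemiya's (1981) rule of thumb uses 1.6 instead, i.e. multiply by 1 / 1.6.
# Since 1.6 < 1.814, the observed ratio is "less than pi / sqrt(3)".
AMEMIYA_LOGIT_FACTOR = 0.625


def logit_to_probit_scale(coefs):
    """Rescale logit coefficients so they are comparable to probit ones."""
    return [AMEMIYA_LOGIT_FACTOR * b for b in coefs]


def lpm_to_probit_scale(constant, slopes):
    """Multiply LPM coefficients by 2.5, then subtract 1.25 from the constant."""
    return 2.5 * constant - 1.25, [2.5 * b for b in slopes]


print(f"pi/sqrt(3) = {LOGIT_PROBIT_SD_RATIO:.3f}")
print(logit_to_probit_scale([0.8, -1.6]))      # hypothetical logit slopes
print(lpm_to_probit_scale(0.6, [0.1, -0.2]))   # hypothetical LPM estimates
```

These are approximations only; they are useful for comparing magnitudes across the three models, not for inference.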
Table 13.3 Comparison of the Linear Probability, Logit and Probit Models: Union Participation*
* Figures in parentheses are t-statistics.
Note that the logit and probit estimates yield similar results in magnitude, sign and significance. One would expect different results from the logit and probit only if there are several observations in the tails. The following variables were insignificant at the 5% level: EXP, IND, MS, FEM and BLK. The results show that union participation is less likely if the individual resides in the South and more likely if he or she resides in a standard metropolitan statistical area. Union participation is also less likely the more weeks worked and the more years of education. Union participation is more likely for blue-collar than for non-blue-collar occupations. The linear probability model yields different estimates from the logit and probit results. OLS predicts two observations with $\hat{y}_i > 1$ and 29 observations with $\hat{y}_i < 0$. Table 13.5 gives the actual versus predicted values of union participation for the linear probability, logit and probit models. The percentage of correct predictions is 75% for the linear probability and probit models and 76% for the logit model.
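The percentages of correct predictions quoted above follow from the diagonal of Table 13.5 (correctly predicted zeros plus correctly predicted ones, divided by n = 595). A small Python check using only the counts reported in that table:

```python
# Diagonal counts from Table 13.5: (actual 0 & predicted 0, actual 1 & predicted 1)
CORRECT = {
    "OLS": (312, 135),
    "Logit": (316, 136),
    "Probit": (314, 132),
}
N = 595  # total number of individuals observed in 1982


def percent_correct(correct_zeros, correct_ones, n):
    """Share of observations on the diagonal of an actual-vs-predicted table."""
    return 100.0 * (correct_zeros + correct_ones) / n


for model, (c0, c1) in CORRECT.items():
    print(f"{model:7s}{percent_correct(c0, c1, N):6.1f}%")
```

Rounded to whole percentages, these give 75, 76 and 75 for OLS, logit and probit, matching the text.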
One can test the significance of all slope coefficients by computing the LR statistic based on the unrestricted log-likelihood value ($\log \ell_u$) reported in Table 13.3 and the restricted log-likelihood value that includes only the constant. The latter is the same for both the logit and probit models and is given by
$$\log \ell_r = n\left[\bar{y}\log\bar{y} + (1-\bar{y})\log(1-\bar{y})\right] \qquad (13.33)$$
where $\bar{y}$ is the proportion of the sample with $y_i = 1$; see Problem 2. In this example, $\bar{y} = 218/595 = 0.366$ and $n = 595$, so that $\log \ell_r = -390.918$. Therefore, for the probit model,
$$LR = -2[\log \ell_r - \log \ell_u] = -2[-390.918 + 313.380] = 155.1$$
which is distributed as $\chi^2_{10}$ under the null of zero slope coefficients. This is highly significant, and the null is rejected. Similarly, for the logit model this LR statistic is 157.2. For the linear probability model, the same null hypothesis of zero slope coefficients can be tested using a Chow F-statistic. This yields an observed value of 17.80, which is distributed as F(10, 584) under the null hypothesis. Again, the null is soundly rejected. This F-test is in fact the BRMR test considered in section 13.6. As described in section 13.8, McFadden's $R^2$ is given by $R^2 = 1 - (\log \ell_u / \log \ell_r)$, which for the probit model yields

$$R^2 = 1 - (-313.380 / -390.918) = 0.198.$$

For the logit model, McFadden's $R^2$ is 0.201.

Table 13.4 Probit Estimates: Union Participation
Convergence achieved after 5 iterations. Covariance matrix computed using second derivatives.

Table 13.5 Actual Versus Predicted: Union Participation

                               Predicted
Actual        Model      Union = 0   Union = 1   Total
Union = 0     OLS              312          65     377
              Logit            316          61
              Probit           314          63
Union = 1     OLS               83         135     218
              Logit             82         136
              Probit            86         132
Total         OLS              395         200     595
              Logit            398         197
              Probit           400         195
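The restricted log-likelihood in (13.33), the LR statistic and McFadden's $R^2$ for the probit model involve only arithmetic on the figures quoted in the text. The Python sketch below reproduces them; the helper names are illustrative, and the p-value uses the closed-form chi-square upper tail that holds for an even number of degrees of freedom (here 10), so that no external statistics library is needed.

```python
import math


def restricted_loglik(n, n_ones):
    """Equation (13.33): log-likelihood of a logit/probit with only a constant."""
    ybar = n_ones / n
    return n * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


logl_r = restricted_loglik(595, 218)        # about -390.918
logl_u_probit = -313.380                     # unrestricted probit log-likelihood
lr = -2.0 * (logl_r - logl_u_probit)         # about 155.1
mcfadden_r2 = 1.0 - logl_u_probit / logl_r   # about 0.198
p_value = chi2_sf_even_df(lr, 10)            # 10 slope restrictions

print(f"log L_r = {logl_r:.3f}, LR = {lr:.1f}, R2 = {mcfadden_r2:.3f}, p = {p_value:.1e}")
```

As a sanity check, `chi2_sf_even_df(18.307, 10)` returns roughly 0.05, the familiar 5% critical value of a $\chi^2_{10}$.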
Example 2: Employment and Problem Drinking
Mullahy and Sindelar (1996) estimate a linear probability model relating employment and measures of problem drinking. The analysis is based on the 1988 Alcohol Supplement of the National Health Interview Survey. The regression was performed for males and females separately, since the authors argue that women are less likely than men to be alcoholic, are more likely to abstain from consumption, and have lower mean alcohol consumption levels. They also report that women metabolize ethanol faster than men do and experience greater liver damage for the same level of ethanol consumption. The dependent variable takes the value 1 if the individual was employed in the past two weeks and zero otherwise. The explanatory variables include a dummy variable that takes the value 1 if the individual's ethanol consumption is beyond the 90th percentile of ethanol consumption in the sample (18 oz. for males and 10.8 oz. for females) and zero otherwise; this variable is denoted by hvdrnk90. The other explanatory variables are the state unemployment rate in 1988 (ue88), age, age squared, schooling, a dummy for being married, family size, and a dummy for being white; health status dummies indicating whether the individual's health was excellent, very good, or fair; region-of-residence dummies for whether the individual resided in the northeast, midwest or south; and dummies for whether he or she resided in a center city (msa1) or in another part of a metropolitan statistical area (not center city, msa2). Three additional dummy variables (q1, q2, q3) were included for the quarters in which the survey was conducted. Details on the definitions of these variables are given in Table 1 of Mullahy and Sindelar (1996). Table 13.6 gives the probit results based on n = 9822 males using Stata. These results show a negative relationship between the 90th percentile alcohol variable and the probability of being employed, but this relationship has a p-value of 0.075. Mullahy and Sindelar find that for both men and women, problem drinking results in reduced employment and increased unemployment. Table 13.7 gives the marginal effects computed in Stata using the mfx command after probit estimation.
The marginal effects are computed at the sample means of the variables, except for dummy variables, where they are computed for a discrete change from 0 to 1. For example, the marginal effect of being a heavy drinker (beyond the 90th percentile of ethanol consumption in the sample), with all other variables evaluated at their means, is to decrease the probability of employment by 1.6 percentage points. These effects can also be computed at particular values of the explanatory variables with the at() option in Stata. Table 13.8 gives the average marginal effect for all males, which can be computed using the margeff command in Stata. In this case, the average marginal effect for a heavy drinker (-.0165) did not change much from the marginal effect computed at the sample mean (-.0162), and neither did the standard error (.0096 compared with .0093). The goodness of fit, as measured by how well this probit classifies the predicted probabilities, is given in Table 13.9 using the estat classification command in Stata. The percentage of correct predictions is 90.79%. Problem 13 asks the reader to verify these results as well as those in the original article by Mullahy and Sindelar (1996).
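The discrete-change calculation for a dummy can be checked by hand from the reported estimates. The following minimal Python sketch takes as inputs the probit coefficients and the sample means (the X column of the mfx output) for males, evaluates the probit index at the means, and then switches one dummy from 0 to 1 while holding everything else at its mean. The function names are illustrative. Because the inputs are rounded, the results should agree with Stata's .9224, -.0162 and .0489 only to about three or four decimals.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# (probit coefficient, sample mean) for each regressor, males, n = 9822
COEF_MEAN = {
    "hvdrnk90": (-0.1049465, 0.099165), "ue88": (-0.0532774, 5.56921),
    "age": (0.0996338, 39.1757), "agesq": (-0.0013043, 1627.61),
    "educ": (0.0471834, 13.3096), "married": (0.2952921, 0.816432),
    "famsize": (0.0188906, 2.7415), "white": (0.3945226, 0.853085),
    "hlstat1": (1.816306, 0.415903), "hlstat2": (1.778434, 0.301873),
    "hlstat3": (1.547836, 0.205254), "hlstat4": (1.043363, 0.053451),
    "region1": (0.0343123, 0.203014), "region2": (0.0604907, 0.265628),
    "region3": (0.1821206, 0.318265), "msa1": (-0.0730529, 0.333232),
    "msa2": (0.0759533, 0.434942), "q1": (-0.1054844, 0.254632),
    "q2": (-0.0513229, 0.252698), "q3": (-0.0293419, 0.242822),
}
CONSTANT = -3.017454


def index_at_means(coef_mean, constant):
    """Probit index x'b with every regressor set to its sample mean."""
    return constant + sum(b * xbar for b, xbar in coef_mean.values())


def dummy_effect_at_means(coef_mean, constant, name):
    """Change in Pr(y = 1) when dummy `name` goes from 0 to 1, others at their means."""
    b, xbar = coef_mean[name]
    index_without = index_at_means(coef_mean, constant) - b * xbar  # dummy set to 0
    return STD_NORMAL.cdf(index_without + b) - STD_NORMAL.cdf(index_without)


xb = index_at_means(COEF_MEAN, CONSTANT)
p_at_means = STD_NORMAL.cdf(xb)
effect_heavy = dummy_effect_at_means(COEF_MEAN, CONSTANT, "hvdrnk90")
effect_married = dummy_effect_at_means(COEF_MEAN, CONSTANT, "married")
print(f"Pr(emp) at means: {p_at_means:.4f}")
print(f"hvdrnk90: {effect_heavy:.4f}, married: {effect_married:.4f}")
```

An average marginal effect, as in Table 13.8, would instead evaluate the same discrete change at each individual's own covariates and average over the sample, which requires the micro data rather than the summary output.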
. probit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust
Probit regression                               Number of obs   =       9822
                                                Wald chi2(20)   =     928.33
                                                Prob > chi2     =     0.0000
Log pseudolikelihood = -2698.1797               Pseudo R2       =     0.1651

------------------------------------------------------------------------------
             |               Robust
         emp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hvdrnk90 |  -.1049465   .0589881    -1.78   0.075    -.2205612    .0106681
        ue88 |  -.0532774   .0142025    -3.75   0.000    -.0811137   -.0254411
         age |   .0996338   .0171185     5.82   0.000     .0660821    .1331855
       agesq |  -.0013043   .0002051    -6.36   0.000    -.0017062   -.0009023
        educ |   .0471834   .0066739     7.07   0.000     .0341029    .0602639
     married |   .2952921   .0540858     5.46   0.000      .189286    .4012982
     famsize |   .0188906   .0140463     1.34   0.179    -.0086398    .0464209
       white |   .3945226   .0483381     8.16   0.000     .2997818    .4892634
     hlstat1 |   1.816306   .0983447    18.47   0.000     1.623554    2.009058
     hlstat2 |   1.778434   .0991531    17.94   0.000     1.584098    1.972771
     hlstat3 |   1.547836   .0982637    15.75   0.000     1.355243     1.74043
     hlstat4 |   1.043363   .1077279     9.69   0.000     .8322205    1.254506
     region1 |   .0343123   .0620021     0.55   0.580    -.0872096    .1558341
     region2 |   .0604907   .0537885     1.12   0.261    -.0449327    .1659142
     region3 |   .1821206   .0542346     3.36   0.001     .0758227    .2884185
        msa1 |  -.0730529   .0518719    -1.41   0.159    -.1747199    .0286141
        msa2 |   .0759533   .0513092     1.48   0.139    -.0246109    .1765175
          q1 |  -.1054844   .0527728    -2.00   0.046    -.2089171   -.0020516
          q2 |  -.0513229   .0528185    -0.97   0.331    -.1548453    .0521995
          q3 |  -.0293419   .0543751    -0.54   0.589    -.1359152    .0772313
       _cons |  -3.017454   .3592321    -8.40   0.000    -3.721536   -2.313372
------------------------------------------------------------------------------
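Each row of the probit output is built from just the coefficient and its robust standard error: z is their ratio, the p-value is two-sided from the standard normal, and the 95% interval is the coefficient plus or minus 1.96 standard errors. A minimal Python sketch, assuming these normal-based Wald formulas, reproduces the hvdrnk90 row:

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_summary(coef, se):
    """z-statistic, two-sided p-value and 95% confidence interval for one coefficient."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, (coef - Z_975 * se, coef + Z_975 * se)


z, p, (lo, hi) = wald_summary(-0.1049465, 0.0589881)  # hvdrnk90 row
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lo:.7f}, {hi:.7f}]")
```

With the robust option, only the standard errors change; the z-statistics, p-values and intervals are then computed from them in the same way.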
Example 3: Fertility and Same Sex of Previous Children
Carrasco (2001) estimated a probit equation for fertility using PSID data over the period 1986-1989. The sample consists of 1,442 married or cohabiting women between the ages of 18 and 55 in 1986. The dependent variable fertility (f) is a dummy variable that equals 1 if the age of the youngest child in the next year is 1. The explanatory variables are: a dummy variable (ags26l) that equals 1 if the woman has a child between 2 and 6 years old; education, which has three levels (educ 1, educ 2 and educ 3); the female's age; race; husband's income; and an indicator of same sex of previous children (dsex), together with its components, dsexf for girls and dsexm for boys. The same-sex variable exploits the widely observed phenomenon of parental preferences for a mixed sibling-sex composition in developed countries. Therefore, a dummy for whether the previous children are all of the same sex provides a plausible predictor for additional childbearing. The data set can be obtained from the Journal of Business & Economic Statistics archive data web site. Problem 15 asks the reader to replicate some of the results obtained in the original article by Carrasco (2001). The estimates reveal that having children of the same sex has a significant and positive effect on the probability of having an additional child. Having same-sex children increases the probability of fertility by 3 percentage points; see Table 13.10. These marginal effects are obtained using the dprobit command in Stata.
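To make the relation between dsex and its components concrete, here is a hypothetical Python construction from the sexes of a woman's previous children. It is a sketch, not Carrasco's (2001) code: the function name, the "F"/"M" coding and the assumption that the indicators are switched on only when there are at least two previous children are all illustrative choices.

```python
def same_sex_indicators(previous_children):
    """
    Hypothetical construction of (dsexf, dsexm, dsex) from the sexes of a
    woman's previous children, coded "F" or "M". Assumption: the indicators
    are switched on only when there are at least two previous children.
    """
    if len(previous_children) < 2:
        return 0, 0, 0
    dsexf = int(all(sex == "F" for sex in previous_children))  # all girls
    dsexm = int(all(sex == "M" for sex in previous_children))  # all boys
    return dsexf, dsexm, dsexf + dsexm  # dsex = 1 if all girls or all boys


for kids in (["F", "F"], ["M", "F"], ["M", "M", "M"], ["F"]):
    print(kids, same_sex_indicators(kids))
```

Because dsexf and dsexm cannot both be 1, dsex is their sum, which is why the components can be entered in place of dsex to separate the effect for girls from that for boys.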
. mfx compute
Marginal effects after probit
      y  = Pr(emp) (predict)
         =  .92244871
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
hvdrnk90*|  -.0161704      .00962   -1.68   0.093  -.035034  .002693   .099165
    ue88 |  -.0077362      .00205   -3.78   0.000  -.011747 -.003725   5.56921
     age |   .0144674      .00248    5.83   0.000   .009607  .019327   39.1757
   agesq |  -.0001894      .00003   -6.37   0.000  -.000248 -.000131   1627.61
    educ |   .0068513      .00096    7.12   0.000   .004966  .008737   13.3096
married* |   .0488911      .01009    4.85   0.000   .029119  .068663   .816432
 famsize |    .002743      .00204    1.35   0.179  -.001253  .006739    2.7415
  white* |    .069445      .01007    6.90   0.000   .049709  .089181   .853085
hlstat1* |   .2460794      .01484   16.58   0.000   .216991  .275167   .415903
hlstat2* |   .1842432      .00992   18.57   0.000   .164799  .203687   .301873
hlstat3* |    .130786      .00661   19.80   0.000    .11784  .143732   .205254
hlstat4* |   .0779836      .00415   18.77   0.000   .069841  .086126   .053451
region1* |   .0049107      .00875    0.56   0.575  -.012233  .022054   .203014
region2* |   .0086088       .0075    1.15   0.251  -.006092  .023309   .265628
region3* |   .0252543      .00715    3.53   0.000   .011247  .039262   .318265
   msa1* |  -.0107946      .00779   -1.39   0.166  -.026061  .004471   .333232
   msa2* |   .0109542      .00735    1.49   0.136  -.003456  .025365   .434942
     q1* |  -.0158927      .00825   -1.93   0.054  -.032053  .000268   .254632
     q2* |  -.0075883      .00795   -0.95   0.340  -.023167  .007991   .252698
     q3* |  -.0043066      .00807   -0.53   0.594  -.020121  .011508   .242822
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
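For the continuous regressors, the marginal effects at the means are the probit coefficients scaled by the standard normal density evaluated at the index at the means, $\phi(\bar{x}'\hat{\beta})$, so the ratio dy/dx / coefficient should be the same for all of them. A short Python check, using only the rounded figures in the probit and mfx output for males, recovers the index from the reported Pr(emp) = .92244871 and compares:

```python
from statistics import NormalDist

# (probit coefficient, mfx dy/dx) for the continuous regressors, males
CONTINUOUS = {
    "ue88": (-0.0532774, -0.0077362),
    "age": (0.0996338, 0.0144674),
    "agesq": (-0.0013043, -0.0001894),
    "educ": (0.0471834, 0.0068513),
    "famsize": (0.0188906, 0.002743),
}

std_normal = NormalDist()
p_at_means = 0.92244871                  # y = Pr(emp) reported by mfx
index = std_normal.inv_cdf(p_at_means)   # x'b at the means, about 1.42
density = std_normal.pdf(index)          # phi(x'b), the common scale factor

ratios = {name: dydx / b for name, (b, dydx) in CONTINUOUS.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} dy/dx / coef = {ratio:.4f}   phi(x'b) = {density:.4f}")
```

The dummies (marked *) do not follow this rule, since their entries are discrete changes in $\Phi(\cdot)$ rather than derivatives.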