Empirical Examples

Example 1: Union Participation

To illustrate the logit and probit models, we consider the PSID data for 1982 used in Chapter 4. In this example, we are interested in modelling union participation. Out of the 595 individuals observed in 1982, 218 individuals had their wage set by a union and 377 did not. The explanatory variables used are: years of education (ED), weeks worked (WKS), years of full-time work experience (EXP), occupation (OCC = 1, if the individual is in a blue-collar occupation), residence (SOUTH = 1, SMSA = 1, if the individual resides in the South, or in a standard metropolitan statistical area), industry (IND = 1, if the individual works in a manufacturing industry), marital status (MS = 1, if the individual is married), sex and race (FEM = 1, BLK = 1, if the individual is female or black). A full description of the data is given in Cornwell and Rupert (1988). The results of the linear probability, logit and probit models are given in Table 13.3. These were computed using EViews, and Table 13.4 gives the EViews probit output. We have already mentioned that the probit model normalizes $\sigma$ to be 1, whereas the logit model has variance $\pi^2/3$. Therefore, the logit estimates tend to be larger than the probit estimates, although by a factor less than $\pi/\sqrt{3}$. In order to make the logit results comparable to those of the probit, Amemiya (1981) suggests multiplying the logit coefficient estimates by 0.625.

Similarly, to make the linear probability estimates comparable to those of the probit model, one needs to multiply these coefficients by 2.5 and then subtract 1.25 from the constant term. For this example, both logit and probit procedures converged quickly in 4 iterations. The log-likelihood values and McFadden's (1974) $R^2$ obtained for the last iteration are recorded in Table 13.3.
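The two rescaling rules above can be sketched in a few lines of Python. This is only an illustration: the coefficients are typed in from Table 13.3 for a handful of variables, and the dictionary and constant names are hypothetical choices made for this sketch, not part of the original EViews computation.

```python
# Coefficients typed from Table 13.3 (OLS is the linear probability model).
ols = {"EXP": -0.005, "WKS": -0.045, "OCC": 0.795, "SOUTH": -0.425, "Const": 1.740}
logit = {"EXP": -0.007, "WKS": -0.068, "OCC": 1.036, "SOUTH": -0.653, "Const": 2.738}
probit = {"EXP": -0.007, "WKS": -0.061, "OCC": 0.955, "SOUTH": -0.593, "Const": 2.517}

LOGIT_FACTOR = 0.625      # Amemiya (1981): multiply logit coefficients by 0.625
LPM_FACTOR = 2.5          # multiply linear probability coefficients by 2.5 ...
LPM_CONST_SHIFT = 1.25    # ... and subtract 1.25 from the constant term only

logit_scaled = {name: LOGIT_FACTOR * b for name, b in logit.items()}
ols_scaled = {
    name: LPM_FACTOR * b - (LPM_CONST_SHIFT if name == "Const" else 0.0)
    for name, b in ols.items()
}

# Print the rescaled estimates next to the probit estimates for comparison.
for name in probit:
    print(f"{name:6s} probit={probit[name]:7.3f} "
          f"logit*0.625={logit_scaled[name]:7.3f} "
          f"rescaled OLS={ols_scaled[name]:7.3f}")
```

The rescaling is a rule of thumb for rough comparability, so the rescaled columns should not be expected to reproduce the probit estimates exactly.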

Table 13.3 Comparison of the Linear Probability, Logit and Probit Models: Union Participation*

Variable          OLS             Logit           Probit
EXP               -.005 (1.14)    -.007 (1.15)    -.007 (1.21)
WKS               -.045 (5.21)    -.068 (5.05)    -.061 (5.16)
OCC                .795 (6.85)    1.036 (6.27)     .955 (6.28)
IND                .075 (0.79)     .114 (0.89)     .093 (0.76)
SOUTH             -.425 (4.27)    -.653 (4.33)    -.593 (4.26)
SMSA               .211 (2.20)     .280 (2.05)     .261 (2.03)
MS                 .247 (1.55)     .378 (1.66)     .351 (1.62)
FEM               -.272 (1.37)    -.483 (1.58)    -.407 (1.47)
ED                -.040 (1.88)    -.057 (1.85)    -.057 (1.99)
BLK                .125 (0.71)     .222 (0.90)     .226 (0.99)
Const             1.740 (5.27)    2.738 (3.27)    2.517 (3.30)

Log-likelihood                    -312.337        -313.380
McFadden's $R^2$                  0.201           0.198
$\chi^2_{10}$                     157.2           155.1

* Figures in parentheses are t-statistics.

Note that the logit and probit estimates yield similar results in magnitude, sign and significance. One would expect different results from the logit and probit only if there are several observations in the tails. The following variables were insignificant at the 5% level: EXP, IND, MS, FEM and BLK. The results show that union participation is less likely if the individual resides in the South and more likely if he or she resides in a standard metropolitan statistical area. Union participation is also less likely the more the weeks worked and the higher the years of education. Union participation is more likely for blue-collar than for non-blue-collar occupations. The linear probability model yields different estimates from the logit and probit results. OLS predicts two observations with $\hat{y} > 1$, and 29 observations with $\hat{y} < 0$. Table 13.5 gives the actual versus predicted values of union participation for the linear probability, logit and probit models. The percentage of correct predictions is 75% for the linear probability and probit models and 76% for the logit model.
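The percentages of correct predictions quoted above follow directly from the cell counts in Table 13.5. As a rough check, the short Python sketch below recomputes them; the counts are typed in from that table, and the tuple layout and variable names are assumptions made for this illustration.

```python
# Cell counts from Table 13.5, per model:
# (actual 0 & predicted 0, actual 0 & predicted 1,
#  actual 1 & predicted 0, actual 1 & predicted 1)
counts = {
    "OLS":    (312, 65, 83, 135),
    "Logit":  (316, 61, 82, 136),
    "Probit": (314, 63, 86, 132),
}

pct_correct = {}
for model, (n00, n01, n10, n11) in counts.items():
    total = n00 + n01 + n10 + n11
    # Correct predictions lie on the diagonal of the 2x2 table.
    pct_correct[model] = 100.0 * (n00 + n11) / total
    print(f"{model:6s}: {n00 + n11} of {total} correct ({pct_correct[model]:.1f}%)")
```

The three models differ by only a few observations on the diagonal, which is why the rounded percentages are nearly identical.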

One can test the significance of all slope coefficients by computing the LR based on the unrestricted log-likelihood value ($\log\ell_u$) reported in Table 13.3, and the restricted log-likelihood value including only the constant. The latter is the same for both the logit and probit models and is given by

$\log\ell_r = n[\bar{y}\log\bar{y} + (1 - \bar{y})\log(1 - \bar{y})]$ (13.33)

where $\bar{y}$ is the proportion of the sample with $y_i = 1$, see problem 2. In this example, $\bar{y} = 218/595 = 0.366$ and $n = 595$ with $\log\ell_r = -390.918$. Therefore, for the probit model,

$LR = -2[\log\ell_r - \log\ell_u] = -2[-390.918 + 313.380] = 155.1$

which is distributed as $\chi^2_{10}$ under the null of zero slope coefficients. This is highly significant and the null is rejected. Similarly, for the logit model this LR statistic is 157.2. For the linear probability model, the same null hypothesis of zero slope coefficients can be tested using a

Table 13.4 Probit Estimates: Union Participation

Dependent Variable: UNION
Method: ML - Binary Probit
Sample: 1 595
Included observations: 595
Convergence achieved after 5 iterations
Covariance matrix computed using second derivatives

Variable    Coefficient    Std. Error    z-Statistic    Prob.
EX          -0.006932      0.005745      -1.206491      0.2276
WKS         -0.060829      0.011785      -5.161666      0.0000
OCC          0.955490      0.152137       6.280476      0.0000
IND          0.092827      0.122774       0.756085      0.4496
SOUTH       -0.592739      0.139102      -4.261183      0.0000
SMSA         0.260700      0.128630       2.026741      0.0427
MS           0.350520      0.216284       1.620648      0.1051
FEM         -0.407026      0.277038      -1.469203      0.1418
ED          -0.057382      0.028842      -1.989515      0.0466
BLK          0.226482      0.228845       0.989675      0.3223
C            2.516784      0.762612       3.300217      0.0010

Mean dependent var       0.366387     S.D. dependent var       0.482222
S.E. of regression       0.420828     Akaike info criterion    1.090351
Sum squared resid        103.4242     Schwarz criterion        1.171484
Log likelihood          -313.3795     Hannan-Quinn criter.     1.121947
Restr. log likelihood   -390.9177     Avg. log likelihood     -0.526688
LR statistic (10 df)     155.0763     McFadden R-squared       0.198349
Probability(LR stat)     0.000000

Obs with Dep=0           377          Total obs                595
Obs with Dep=1           218

Table 13.5 Actual Versus Predicted: Union Participation

                                  Predicted
Actual        Model      Union = 0    Union = 1    Total
Union = 0     OLS        312          65           377
              Logit      316          61           377
              Probit     314          63           377
Union = 1     OLS        83           135          218
              Logit      82           136          218
              Probit     86           132          218
Total         OLS        395          200          595
              Logit      398          197          595
              Probit     400          195          595

Chow F-statistic. This yields an observed value of 17.80 which is distributed as F(10, 584) under the null hypothesis. Again, the null is soundly rejected. This F-test is in fact the BRMR test considered in section 13.6. As described in section 13.8, McFadden's $R^2$ is given by $R^2 = 1 - [\log\ell_u/\log\ell_r]$, which for the probit model yields

$R^2 = 1 - (313.380/390.918) = 0.198.$

For the logit model, McFadden's $R^2$ is 0.201.
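The restricted log-likelihood in (13.33), the LR statistics and McFadden's $R^2$ for this example can be reproduced with a short Python sketch. The unrestricted log-likelihoods are typed in from Table 13.3. The helper `chi2_sf_even_df` is a hypothetical name introduced here; it uses the closed-form upper-tail probability of a chi-squared variable with even degrees of freedom, which is a convenience chosen for this sketch and not necessarily how EViews computes the p-value.

```python
import math

n, n_union = 595, 218
loglik_u = {"logit": -312.337, "probit": -313.380}   # from Table 13.3

# Restricted log-likelihood (constant only), equation (13.33).
y_bar = n_union / n
loglik_r = n * (y_bar * math.log(y_bar) + (1 - y_bar) * math.log(1 - y_bar))

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; valid for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

results = {}
for model, ll_u in loglik_u.items():
    lr = -2.0 * (loglik_r - ll_u)          # LR statistic, 10 slope restrictions
    mcfadden_r2 = 1.0 - ll_u / loglik_r    # McFadden's R-squared
    p_value = chi2_sf_even_df(lr, 10)
    results[model] = (lr, mcfadden_r2, p_value)
    print(f"{model:6s} LR={lr:.1f}  McFadden R2={mcfadden_r2:.3f}  p-value={p_value:.2e}")
```

Because both models share the same restricted log-likelihood, the difference between the two LR statistics comes entirely from the unrestricted fits.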

Example 2: Employment and Problem Drinking

Mullahy and Sindelar (1996) estimate a linear probability model relating employment and measures of problem drinking. The analysis is based on the 1988 Alcohol Supplement of the National Health Interview Survey. This regression was performed for males and females separately, since the authors argue that women are less likely than men to be alcoholic, are more likely to abstain from consumption, and have lower mean alcohol consumption levels. They also report that women metabolize ethanol faster than do men and experience greater liver damage for the same level of consumption of ethanol. The dependent variable takes the value 1 if the individual was employed in the past two weeks and zero otherwise. The explanatory variables include a dummy variable, denoted by hvdrnk90, that takes the value 1 if the individual's ethanol consumption is in the upper 90th percentile of the sample (18 oz. for males and 10.8 oz. for females) and zero otherwise. The other regressors are the state unemployment rate in 1988 (UE88), Age, Age$^2$, schooling, married, family size, and white; health status dummies indicating whether the individual's health was excellent, very good, fair; region of residence dummies indicating whether the individual resided in the northeast, midwest or south; and dummies for whether he or she resided in center city (msa1) or other metropolitan statistical area (not center city, msa2). Three additional dummy variables were included for the quarters in which the survey was conducted. Details on the definitions of these variables are given in Table 1 of Mullahy and Sindelar (1996).

Table 13.6 gives the probit results based on n = 9822 males using Stata. These results show a negative relationship between the 90th percentile alcohol variable and the probability of being employed, but this has a p-value of 0.075, so it is not significant at the 5% level. Mullahy and Sindelar find that for both men and women, problem drinking results in reduced employment and increased unemployment.

Table 13.7 gives the marginal effects computed in Stata using the mfx option after probit estimation. The marginal effects are computed at the sample mean of the variables, except in the case of dummy variables, where the effect is computed for a discrete change from 0 to 1. For example, the marginal effect of being a heavy drinker in the upper 90th percentile of ethanol consumption in the sample (given that all the other variables are evaluated at their mean and the dummy variable changes from 0 to 1) is to decrease the probability of employment by about 1.6 percentage points. These can also be computed at particular values of the explanatory variables with the option at in Stata. In fact, Table 13.8 gives the average marginal effect for all males. This can be computed using the margeff command in Stata. In this case the average marginal effect for a heavy drinker (-.0165) did not change much from the marginal effect computed at the sample mean (-.0162), and neither did the standard error (.0096 compared with .0093). The goodness of fit, as measured by how well this probit classifies the predicted probabilities, is given in Table 13.9 using the estat classification option in Stata. The percentage of correct predictions is 90.79%. Problem 13 asks the reader to verify these results as well as those in the original article by Mullahy and Sindelar (1996).
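To make the two kinds of marginal effect concrete, the following Python sketch recomputes a few of them from the reported Stata numbers. It assumes, as stated above, that all other regressors are held at their sample means. The index at the mean is backed out from the predicted probability reported by mfx (.92244871), and the coefficients and variable means are typed in from the probit and mfx output. The function names are hypothetical.

```python
from statistics import NormalDist

norm = NormalDist()                      # standard normal distribution

p_at_mean = 0.92244871                   # Pr(emp) at the sample means, from mfx
xb_mean = norm.inv_cdf(p_at_mean)        # implied probit index at the means

def continuous_effect(beta):
    """dP/dx for a continuous regressor: phi(x'b) * beta at the means."""
    return norm.pdf(xb_mean) * beta

def dummy_effect(beta, mean):
    """Discrete change in P when a dummy goes from 0 to 1, others at means."""
    xb_without = xb_mean - beta * mean   # index with the dummy set to 0
    return norm.cdf(xb_without + beta) - norm.cdf(xb_without)

me_ue88 = continuous_effect(-0.0532774)
me_educ = continuous_effect(0.0471834)
me_hvdrnk90 = dummy_effect(-0.1049465, 0.099165)
me_married = dummy_effect(0.2952921, 0.816432)

print(f"ue88     {me_ue88: .7f}")
print(f"educ     {me_educ: .7f}")
print(f"hvdrnk90 {me_hvdrnk90: .7f}")
print(f"married  {me_married: .7f}")
```

Small differences from the Stata figures in the last digits are to be expected, since the inputs here are rounded values read from the printed output.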

. probit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Probit regression                          Number of obs   =   9822
                                           Wald chi2(20)   =   928.33
                                           Prob > chi2     =   0.0000
Log pseudolikelihood = -2698.1797          Pseudo R2       =   0.1651

                          Robust
emp          Coef.        Std. Err.    z        P>|z|    [95% Conf. Interval]
hvdrnk90     -.1049465    .0589881     -1.78    0.075    -.2205612    .0106681
ue88         -.0532774    .0142025     -3.75    0.000    -.0811137   -.0254411
age           .0996338    .0171185      5.82    0.000     .0660821    .1331855
agesq        -.0013043    .0002051     -6.36    0.000    -.0017062   -.0009023
educ          .0471834    .0066739      7.07    0.000     .0341029    .0602639
married       .2952921    .0540858      5.46    0.000     .189286     .4012982
famsize       .0188906    .0140463      1.34    0.179    -.0086398    .0464209
white         .3945226    .0483381      8.16    0.000     .2997818    .4892634
hlstat1      1.816306     .0983447     18.47    0.000    1.623554    2.009058
hlstat2      1.778434     .0991531     17.94    0.000    1.584098    1.972771
hlstat3      1.547836     .0982637     15.75    0.000    1.355243    1.74043
hlstat4      1.043363     .1077279      9.69    0.000     .8322205   1.254506
region1       .0343123    .0620021      0.55    0.580    -.0872096    .1558341
region2       .0604907    .0537885      1.12    0.261    -.0449327    .1659142
region3       .1821206    .0542346      3.36    0.001     .0758227    .2884185
msa1         -.0730529    .0518719     -1.41    0.159    -.1747199    .0286141
msa2          .0759533    .0513092      1.48    0.139    -.0246109    .1765175
q1           -.1054844    .0527728     -2.00    0.046    -.2089171   -.0020516
q2           -.0513229    .0528185     -0.97    0.331    -.1548453    .0521995
q3           -.0293419    .0543751     -0.54    0.589    -.1359152    .0772313
_cons       -3.017454     .3592321     -8.40    0.000   -3.721536   -2.313372
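The z-statistic, p-value and 95% confidence interval in each row of the probit output are simple functions of the coefficient and its robust standard error. As an illustration, the Python sketch below recomputes them for hvdrnk90, using the two numbers typed in from the output above. It assumes the usual two-sided normal approximation, and the variable names are chosen for this sketch.

```python
from statistics import NormalDist

norm = NormalDist()

coef, robust_se = -0.1049465, 0.0589881    # hvdrnk90 row of the probit output

z = coef / robust_se                        # z-statistic
p_value = 2.0 * (1.0 - norm.cdf(abs(z)))    # two-sided p-value
crit = norm.inv_cdf(0.975)                  # about 1.96 for a 95% interval
ci_low = coef - crit * robust_se
ci_high = coef + crit * robust_se

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% interval: [{ci_low:.7f}, {ci_high:.7f}]")
```

The interval contains zero, which is the same information as a p-value above 0.05.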

Example 3: Fertility and Same Sex of Previous Children

Carrasco (2001) estimated a probit equation for fertility using PSID data over the period 1986-1989. The sample consists of 1,442 married or cohabiting women between the ages of 18 and 55 in 1986. The dependent variable fertility (f) is specified by a dummy variable that equals 1 if the age of the youngest child in the next year is 1. The explanatory variables are: (ags26l), a dummy variable that equals 1 if the woman has a child between 2 and 6 years old; education, which has three levels (educ 1, educ 2 and educ 3); the female's age; race; husband's income; and an indicator of same sex of previous children (dsex), together with its components (dsexf) for girls and (dsexm) for boys. The same-sex indicator exploits the widely observed phenomenon of parental preferences for a mixed sibling-sex composition in developed countries. Therefore, a dummy for whether the sex of the next child matches the sex of the previous children provides a plausible predictor for additional childbearing. The data set can be obtained from the Journal of Business & Economic Statistics archive data web site. Problem 15 asks the reader to replicate some of the results obtained in the original article by Carrasco (2001). The estimates reveal that having children of the same sex has a significant and positive effect on the probability of having an additional child. The marginal effect of same sex children increases the probability of fertility by 3%, see Table 13.10. These are obtained using the dprobit command in Stata.


. mfx compute

Marginal effects after probit
y = Pr(emp) (predict)
  = .92244871

variable     dy/dx        Std. Err.    z        P>|z|    [95% Conf. Interval]      X
hvdrnk90*    -.0161704    .00962       -1.68    0.093    -.035034    .002693    .099165
ue88         -.0077362    .00205       -3.78    0.000    -.011747   -.003725    5.56921
age           .0144674    .00248        5.83    0.000     .009607    .019327    39.1757
agesq        -.0001894    .00003       -6.37    0.000    -.000248   -.000131    1627.61
educ          .0068513    .00096        7.12    0.000     .004966    .008737    13.3096
married*      .0488911    .01009        4.85    0.000     .029119    .068663    .816432
famsize       .002743     .00204        1.35    0.179    -.001253    .006739    2.7415
white*        .069445     .01007        6.90    0.000     .049709    .089181    .853085
hlstat1*      .2460794    .01484       16.58    0.000     .216991    .275167    .415903
hlstat2*      .1842432    .00992       18.57    0.000     .164799    .203687    .301873
hlstat3*      .130786     .00661       19.80    0.000     .11784     .143732    .205254
hlstat4*      .0779836    .00415       18.77    0.000     .069841    .086126    .053451
region1*      .0049107    .00875        0.56    0.575    -.012233    .022054    .203014
region2*      .0086088    .0075         1.15    0.251    -.006092    .023309    .265628
region3*      .0252543    .00715        3.53    0.000     .011247    .039262    .318265
msa1*        -.0107946    .00779       -1.39    0.166    -.026061    .004471    .333232
msa2*         .0109542    .00735        1.49    0.136    -.003456    .025365    .434942
q1*          -.0158927    .00825       -1.93    0.054    -.032053    .000268    .254632
q2*          -.0075883    .00795       -0.95    0.340    -.023167    .007991    .252698
q3*          -.0043066    .00807       -0.53    0.594    -.020121    .011508    .242822

(*) dy/dx is for discrete change of dummy variable from 0 to 1
