# Limited Dependent Variables

13.1 The Linear Probability Model

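| $y_i$ | $u_i$ | Prob. |
|---|---|---|
| 1 | $1 - x_i'\beta$ | $\pi_i$ |
| 0 | $-x_i'\beta$ | $1 - \pi_i$ |

a. Let $\pi_i = \Pr[y_i = 1]$. Then $y_i = 1$ when $u_i = 1 - x_i'\beta$, with probability $\pi_i$, as shown in the table above. Similarly, $y_i = 0$ when $u_i = -x_i'\beta$, with probability $1 - \pi_i$. Hence,

$$E(u_i) = \pi_i (1 - x_i'\beta) + (1 - \pi_i)(-x_i'\beta).$$

For this to equal zero, we need $\pi_i - \pi_i x_i'\beta + \pi_i x_i'\beta - x_i'\beta = 0$, which gives $\pi_i = x_i'\beta$ as required.

b. Since $E(u_i) = 0$,

$$\begin{aligned}
\operatorname{var}(u_i) = E(u_i^2) &= (1 - x_i'\beta)^2 \pi_i + (-x_i'\beta)^2 (1 - \pi_i) \\
&= \left[1 - 2x_i'\beta + (x_i'\beta)^2\right]\pi_i + (x_i'\beta)^2 (1 - \pi_i) \\
&= \pi_i - 2x_i'\beta\,\pi_i + (x_i'\beta)^2 = \pi_i - \pi_i^2 = \pi_i(1 - \pi_i) = x_i'\beta\,(1 - x_i'\beta),
\end{aligned}$$

using the fact that $\pi_i = x_i'\beta$.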
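As a quick numerical check of parts (a) and (b), the following Python sketch evaluates the two-point distribution of the disturbance exactly with rational arithmetic. The function name `lpm_error_moments` and the sample values of $x_i'\beta$ are illustrative assumptions, not part of the original solution.

```python
from fractions import Fraction


def lpm_error_moments(xb):
    """Mean and variance of u_i when Pr[y_i = 1] = x_i'beta = xb."""
    pi = xb  # part (a): E(u_i) = 0 requires pi_i = x_i'beta
    outcomes = [(1 - xb, pi), (-xb, 1 - pi)]  # (value of u_i, probability)
    mean = sum(u * p for u, p in outcomes)
    var = sum(u**2 * p for u, p in outcomes) - mean**2
    return mean, var


# Hypothetical values of x_i'beta inside the unit interval.
for xb in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    mean, var = lpm_error_moments(xb)
    print(xb, mean, var, xb * (1 - xb))
```

For each value, the mean is zero and the variance matches $x_i'\beta(1 - x_i'\beta)$, which is why the linear probability model is heteroskedastic by construction.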

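13.2 a. Since there are no slopes and only a constant, $x_i'\beta = \alpha$ and (13.16) becomes

$$\log \ell = \sum_{i=1}^{n} \left\{ y_i \log F(\alpha) + (1 - y_i) \log[1 - F(\alpha)] \right\}.$$

Differentiating with respect to $\alpha$ we get

$$\frac{\partial \log \ell}{\partial \alpha} = \sum_{i=1}^{n} \frac{y_i}{F(\alpha)}\, f(\alpha) + \sum_{i=1}^{n} \frac{(1 - y_i)}{1 - F(\alpha)}\,(-f(\alpha)).$$

Setting this equal to zero yields $\sum_{i=1}^{n} (y_i - F(\alpha)) f(\alpha) = 0$.

Therefore, $F(\alpha) = \sum_{i=1}^{n} y_i / n = \bar{y}$. This is the proportion of the sample with $y_i = 1$.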

B. H. Baltagi, Solutions Manual for Econometrics, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-54548-1_13, © Springer-Verlag Berlin Heidelberg 2015

b. Using $F(\alpha) = \bar{y}$, the value of the maximized likelihood, from (13.16), is

$$\log \ell_r = \sum_{i=1}^{n} \left\{ y_i \log \bar{y} + (1 - y_i) \log(1 - \bar{y}) \right\} = n\bar{y} \log \bar{y} + (n - n\bar{y}) \log(1 - \bar{y}) = n\left[\bar{y} \log \bar{y} + (1 - \bar{y}) \log(1 - \bar{y})\right]$$

as required.

c. For the empirical example in Sect. 13.9, we know that $\bar{y} = 218/595 = 0.366$. Substituting in (13.33) with $n = 595$ we get $\log \ell_r = n[0.366 \log 0.366 + (1 - 0.366) \log(1 - 0.366)] = -390.918$.
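The arithmetic in part (c) can be reproduced with a few lines of Python. This is a minimal sketch; the function name `constant_only_loglik` is an illustrative assumption, and the unrounded proportion 218/595 is used rather than 0.366.

```python
import math


def constant_only_loglik(n_ones, n):
    """Maximized log-likelihood n[ybar*log(ybar) + (1 - ybar)*log(1 - ybar)]."""
    ybar = n_ones / n  # F(alpha) at the maximum equals the sample proportion of ones
    return n * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))


# Union participation example: 218 of the 595 observations have y = 1.
restricted_loglik = constant_only_loglik(218, 595)
print(f"{restricted_loglik:.3f}")
```

The result agrees with the reported value of about -390.918. Plugging in the rounded 0.366 instead would give a slightly different number, so the unrounded proportion should be used.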

13.3 Union participation example. See Tables 13.3-13.5. These were run using EViews.

a. OLS ESTIMATION

LS // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 1.195872 | 0.227010 | 5.267922 | 0.0000 |
| EX | -0.001974 | 0.001726 | -1.143270 | 0.2534 |
| WKS | -0.017809 | 0.003419 | -5.209226 | 0.0000 |
| OCC | 0.318118 | 0.046425 | 6.852287 | 0.0000 |
| IND | 0.030048 | 0.038072 | 0.789229 | 0.4303 |
| SOUTH | -0.170130 | 0.039801 | -4.274471 | 0.0000 |
| SMSA | 0.084522 | 0.038464 | 2.197419 | 0.0284 |
| MS | 0.098953 | 0.063781 | 1.551453 | 0.1213 |
| FEM | -0.108706 | 0.079266 | -1.371398 | 0.1708 |
| ED | -0.016187 | 0.008592 | -1.883924 | 0.0601 |
| BLK | 0.050197 | 0.071130 | 0.705708 | 0.4807 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.233548 | Mean dependent var | 0.366387 |
| Adjusted R-squared | 0.220424 | S.D. dependent var | 0.482222 |
| S.E. of regression | 0.425771 | Akaike info criterion | -1.689391 |
| Sum squared resid | 105.8682 | Schwarz criterion | -1.608258 |
| Log likelihood | -330.6745 | F-statistic | 17.79528 |
| Durbin-Watson stat | 1.900963 | Prob(F-statistic) | 0.000000 |

LOGIT ESTIMATION

LOGIT // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

Convergence achieved after 4 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 4.380828 | 1.338629 | 3.272624 | 0.0011 |
| EX | -0.011143 | 0.009691 | -1.149750 | 0.2507 |
| WKS | -0.108126 | 0.021428 | -5.046037 | 0.0000 |
| OCC | 1.658222 | 0.264456 | 6.270325 | 0.0000 |
| IND | 0.181818 | 0.205470 | 0.884888 | 0.3766 |
| SOUTH | -1.044332 | 0.241107 | -4.331411 | 0.0000 |
| SMSA | 0.448389 | 0.218289 | 2.054110 | 0.0404 |
| MS | 0.604999 | 0.365043 | 1.657336 | 0.0980 |
| FEM | -0.772222 | 0.489665 | -1.577040 | 0.1153 |
| ED | -0.090799 | 0.049227 | -1.844501 | 0.0656 |
| BLK | 0.355706 | 0.394794 | 0.900992 | 0.3680 |

Log likelihood -312.3367; Obs with Dep=1: 218; Obs with Dep=0: 377

| Variable | Mean All | Mean D=1 | Mean D=0 |
|---|---|---|---|
| C | 1.000000 | 1.000000 | 1.000000 |
| EX | 22.85378 | 23.83028 | 22.28912 |
| WKS | 46.45210 | 45.27982 | 47.12997 |
| OCC | 0.512605 | 0.766055 | 0.366048 |
| IND | 0.405042 | 0.513761 | 0.342175 |
| SOUTH | 0.292437 | 0.197248 | 0.347480 |
| SMSA | 0.642017 | 0.646789 | 0.639257 |
| MS | 0.805042 | 0.866972 | 0.769231 |
| FEM | 0.112605 | 0.059633 | 0.143236 |
| ED | 12.84538 | 11.84862 | 13.42175 |
| BLK | 0.072269 | 0.082569 | 0.066313 |

PROBIT ESTIMATION

PROBIT // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

Convergence achieved after 3 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 2.516784 | 0.762606 | 3.300242 | 0.0010 |
| EX | -0.006932 | 0.005745 | -1.206501 | 0.2281 |
| WKS | -0.060829 | 0.011785 | -5.161707 | 0.0000 |
| OCC | 0.955490 | 0.152136 | 6.280522 | 0.0000 |
| IND | 0.092827 | 0.122773 | 0.756089 | 0.4499 |
| SOUTH | -0.592739 | 0.139100 | -4.261243 | 0.0000 |
| SMSA | 0.260701 | 0.128629 | 2.026756 | 0.0431 |
| MS | 0.350520 | 0.216282 | 1.620664 | 0.1056 |
| FEM | -0.407026 | 0.277034 | -1.469226 | 0.1423 |
| ED | -0.057382 | 0.028842 | -1.989533 | 0.0471 |
| BLK | 0.226482 | 0.228843 | 0.989683 | 0.3227 |

Log likelihood -313.3795; Obs with Dep=1: 218; Obs with Dep=0: 377

d. Dropping the industry variable (IND).

OLS ESTIMATION

LS // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 1.216753 | 0.225390 | 5.398425 | 0.0000 |
| EX | -0.001848 | 0.001718 | -1.075209 | 0.2827 |
| WKS | -0.017874 | 0.003417 | -5.231558 | 0.0000 |
| OCC | 0.322215 | 0.046119 | 6.986568 | 0.0000 |
| SOUTH | -0.173339 | 0.039580 | -4.379418 | 0.0000 |
| SMSA | 0.085043 | 0.038446 | 2.212014 | 0.0274 |
| MS | 0.100697 | 0.063722 | 1.580267 | 0.1146 |
| FEM | -0.114088 | 0.078947 | -1.445122 | 0.1490 |
| ED | -0.017021 | 0.008524 | -1.996684 | 0.0463 |
| BLK | 0.048167 | 0.071061 | 0.677822 | 0.4982 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.232731 | Mean dependent var | 0.366387 |
| Adjusted R-squared | 0.220927 | S.D. dependent var | 0.482222 |
| S.E. of regression | 0.425634 | Akaike info criterion | -1.69169 |
| Sum squared resid | 105.981 | Schwarz criterion | -1.61793 |
| Log likelihood | -330.992 | F-statistic | 19.716 |
| Durbin-Watson stat | 1.90771 | Prob(F-statistic) | 0 |

LOGIT ESTIMATION

LOGIT // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

Convergence achieved after 4 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 4.492957 | 1.333992 | 3.368053 | 0.0008 |
| EX | -0.010454 | 0.009649 | -1.083430 | 0.2791 |
| WKS | -0.107912 | 0.021380 | -5.047345 | 0.0000 |
| OCC | 1.675169 | 0.263654 | 6.353652 | 0.0000 |
| SOUTH | -1.058953 | 0.240224 | -4.408193 | 0.0000 |
| SMSA | 0.449003 | 0.217955 | 2.060074 | 0.0398 |
| MS | 0.618511 | 0.365637 | 1.691599 | 0.0913 |
| FEM | -0.795607 | 0.489820 | -1.624285 | 0.1049 |
| ED | -0.096695 | 0.048806 | -1.981194 | 0.0480 |
| BLK | 0.339984 | 0.394027 | 0.862845 | 0.3886 |

Log likelihood -312.7267; Obs with Dep=1: 218; Obs with Dep=0: 377

| Variable | Mean All | Mean D=1 | Mean D=0 |
|---|---|---|---|
| C | 1.000000 | 1.000000 | 1.000000 |
| EX | 22.85378 | 23.83028 | 22.28912 |
| WKS | 46.45210 | 45.27982 | 47.12997 |
| OCC | 0.512605 | 0.766055 | 0.366048 |
| SOUTH | 0.292437 | 0.197248 | 0.347480 |
| SMSA | 0.642017 | 0.646789 | 0.639257 |
| MS | 0.805042 | 0.866972 | 0.769231 |
| FEM | 0.112605 | 0.059633 | 0.143236 |
| ED | 12.84538 | 11.84862 | 13.42175 |
| BLK | 0.072269 | 0.082569 | 0.066313 |

PROBIT ESTIMATION

PROBIT // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

Convergence achieved after 3 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 2.570491 | 0.759181 | 3.385875 | 0.0008 |
| EX | -0.006590 | 0.005723 | -1.151333 | 0.2501 |
| WKS | -0.060795 | 0.011777 | -5.162354 | 0.0000 |
| OCC | 0.967972 | 0.151305 | 6.397481 | 0.0000 |
| SOUTH | -0.601050 | 0.138528 | -4.338836 | 0.0000 |
| SMSA | 0.261381 | 0.128465 | 2.034640 | 0.0423 |
| MS | 0.357808 | 0.216057 | 1.656085 | 0.0982 |
| FEM | -0.417974 | 0.276501 | -1.511657 | 0.1312 |
| ED | -0.060082 | 0.028625 | -2.098957 | 0.0362 |
| BLK | 0.220695 | 0.228363 | 0.966423 | 0.3342 |

Log likelihood -313.6647; Obs with Dep=1: 218; Obs with Dep=0: 377

f. The restricted regressions omitting IND, FEM and BLK are given below:

LS // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 1.153900 | 0.218771 | 5.274452 | 0.0000 |
| EX | -0.001840 | 0.001717 | -1.071655 | 0.2843 |
| WKS | -0.017744 | 0.003412 | -5.200421 | 0.0000 |
| OCC | 0.326411 | 0.046051 | 7.088110 | 0.0000 |
| SOUTH | -0.171713 | 0.039295 | -4.369868 | 0.0000 |
| SMSA | 0.086076 | 0.038013 | 2.264382 | 0.0239 |
| MS | 0.158303 | 0.045433 | 3.484351 | 0.0005 |
| ED | -0.017204 | 0.008507 | -2.022449 | 0.0436 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.229543 | Mean dependent var | 0.366387 |
| Adjusted R-squared | 0.220355 | S.D. dependent var | 0.482222 |
| S.E. of regression | 0.425790 | Akaike info criterion | -1.694263 |
| Sum squared resid | 106.4215 | Schwarz criterion | -1.635257 |
| Log likelihood | -332.2252 | F-statistic | 24.98361 |
| Durbin-Watson stat | 1.912059 | Prob(F-statistic) | 0.000000 |

LOGIT // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

Convergence achieved after 4 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 4.152595 | 1.288390 | 3.223088 | 0.0013 |
| EX | -0.011018 | 0.009641 | -1.142863 | 0.2536 |
| WKS | -0.107116 | 0.021215 | -5.049031 | 0.0000 |
| OCC | 1.684082 | 0.262193 | 6.423059 | 0.0000 |
| SOUTH | -1.043629 | 0.237769 | -4.389255 | 0.0000 |
| SMSA | 0.459707 | 0.215149 | 2.136687 | 0.0330 |
| MS | 0.975711 | 0.272560 | 3.579800 | 0.0004 |
| ED | -0.100033 | 0.048507 | -2.062229 | 0.0396 |

Log likelihood -314.2744; Obs with Dep=1: 218; Obs with Dep=0: 377

| Variable | Mean All | Mean D=1 | Mean D=0 |
|---|---|---|---|
| C | 1.000000 | 1.000000 | 1.000000 |
| EX | 22.85378 | 23.83028 | 22.28912 |
| WKS | 46.45210 | 45.27982 | 47.12997 |
| OCC | 0.512605 | 0.766055 | 0.366048 |
| SOUTH | 0.292437 | 0.197248 | 0.347480 |
| SMSA | 0.642017 | 0.646789 | 0.639257 |
| MS | 0.805042 | 0.866972 | 0.769231 |
| ED | 12.84538 | 11.84862 | 13.42175 |

PROBIT // Dependent Variable is UNION

Sample: 1 595

Included observations: 595

Convergence achieved after 3 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 2.411706 | 0.741327 | 3.253228 | 0.0012 |
| EX | -0.006986 | 0.005715 | -1.222444 | 0.2220 |
| WKS | -0.060491 | 0.011788 | -5.131568 | 0.0000 |
| OCC | 0.971984 | 0.150538 | 6.456745 | 0.0000 |
| SOUTH | -0.580959 | 0.136344 | -4.260988 | 0.0000 |
| SMSA | 0.273201 | 0.126988 | 2.151388 | 0.0319 |
| MS | 0.545824 | 0.155812 | 3.503105 | 0.0005 |
| ED | -0.063196 | 0.028464 | -2.220210 | 0.0268 |

Log likelihood -315.1770; Obs with Dep=1: 218; Obs with Dep=0: 377

13.4 Occupation regression.

a. OLS Estimation

LS // Dependent Variable is OCC

Sample: 1 595

Included observations: 595

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 2.111943 | 0.182340 | 11.58245 | 0.0000 |
| ED | -0.111499 | 0.006108 | -18.25569 | 0.0000 |
| WKS | -0.001510 | 0.003044 | -0.496158 | 0.6200 |
| EX | -0.002870 | 0.001533 | -1.872517 | 0.0616 |
| SOUTH | -0.068631 | 0.035332 | -1.942452 | 0.0526 |
| SMSA | -0.079735 | 0.034096 | -2.338528 | 0.0197 |
| IND | 0.091688 | 0.033693 | 2.721240 | 0.0067 |
| MS | 0.006271 | 0.056801 | 0.110402 | 0.9121 |
| FEM | -0.064045 | 0.070543 | -0.907893 | 0.3643 |
| BLK | 0.068514 | 0.063283 | 1.082647 | 0.2794 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.434196 | Mean dependent var | 0.512605 |
| Adjusted R-squared | 0.425491 | S.D. dependent var | 0.500262 |
| S.E. of regression | 0.379180 | Akaike info criterion | -1.922824 |
| Sum squared resid | 84.10987 | Schwarz criterion | -1.849067 |
| Log likelihood | -262.2283 | F-statistic | 49.88075 |
| Durbin-Watson stat | 1.876105 | Prob(F-statistic) | 0.000000 |

LOGIT ESTIMATION

LOGIT // Dependent Variable is OCC

Sample: 1 595

Included observations: 595

Convergence achieved after 5 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 11.62962 | 1.581601 | 7.353069 | 0.0000 |
| ED | -0.806320 | 0.070068 | -11.50773 | 0.0000 |
| WKS | -0.008424 | 0.023511 | -0.358297 | 0.7203 |
| EX | -0.017610 | 0.011161 | -1.577893 | 0.1151 |
| SOUTH | -0.349960 | 0.260761 | -1.342073 | 0.1801 |
| SMSA | -0.601945 | 0.247206 | -2.434995 | 0.0152 |
| IND | 0.689620 | 0.241028 | 2.861157 | 0.0044 |

PROBIT ESTIMATION

PROBIT // Dependent Variable is OCC

Sample: 1 595

Included observations: 595

Convergence achieved after 4 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 6.416131 | 0.847427 | 7.571312 | 0.0000 |
| ED | -0.446740 | 0.034458 | -12.96473 | 0.0000 |
| WKS | -0.003574 | 0.013258 | -0.269581 | 0.7876 |
| EX | -0.010891 | 0.006336 | -1.718878 | 0.0862 |
| SOUTH | -0.240756 | 0.147920 | -1.627608 | 0.1041 |
| SMSA | -0.327948 | 0.139849 | -2.345016 | 0.0194 |
| IND | 0.371434 | 0.135825 | 2.734658 | 0.0064 |
| MS | -0.097665 | 0.245069 | -0.398522 | 0.6904 |
| FEM | -0.358948 | 0.296971 | -1.208697 | 0.2273 |
| BLK | 0.215257 | 0.252219 | 0.853453 | 0.3938 |

Log likelihood -246.6581; Obs with Dep=1: 305; Obs with Dep=0: 290

13.5 Truncated Uniform Density.

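Here $x$ is uniform with density $f(x) = 1/2$ for $-1 < x < 1$.

$$\Pr\left[x > -\tfrac{1}{2}\right] = \int_{-1/2}^{1} \tfrac{1}{2}\, dx = \tfrac{1}{2}\left[\tfrac{3}{2}\right] = \tfrac{3}{4}.$$

So that

$$f\left(x \mid x > -\tfrac{1}{2}\right) = \frac{f(x)}{\Pr\left[x > -\tfrac{1}{2}\right]} = \frac{1/2}{3/4} = \frac{2}{3} \quad \text{for } -\tfrac{1}{2} < x < 1.$$

The conditional mean is

$$E\left(x \mid x > -\tfrac{1}{2}\right) = \int_{-1/2}^{1} x \cdot \tfrac{2}{3}\, dx = \tfrac{2}{3} \cdot \tfrac{1}{2} \left[x^2\right]_{-1/2}^{1} = \tfrac{1}{3}\left(1 - \tfrac{1}{4}\right) = \tfrac{1}{4},$$

whereas the unconditional mean is $E(x) = 0$. For the variance,

$$\operatorname{var}(x) = E(x^2) - (E(x))^2 = E(x^2) = \int_{-1}^{1} x^2 \cdot \tfrac{1}{2}\, dx = \tfrac{1}{2} \cdot \tfrac{1}{3} \left[x^3\right]_{-1}^{1} = \tfrac{1}{3},$$

$$E\left(x^2 \mid x > -\tfrac{1}{2}\right) = \int_{-1/2}^{1} x^2 \cdot \tfrac{2}{3}\, dx = \tfrac{2}{3} \cdot \tfrac{1}{3} \left[x^3\right]_{-1/2}^{1} = \tfrac{2}{9}\left(1 + \tfrac{1}{8}\right) = \tfrac{1}{4},$$

so that

$$\operatorname{var}\left(x \mid x > -\tfrac{1}{2}\right) = \tfrac{1}{4} - \left(\tfrac{1}{4}\right)^2 = \tfrac{3}{16} < \operatorname{var}(x) = \tfrac{1}{3}.$$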

Therefore, as expected, truncation reduces the variance.
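The moments above can be verified exactly. The sketch below assumes, as in the derivation, that $x$ has density 1/2 on $(-1, 1)$, so that truncation at $-1/2$ leaves a constant density on $(-1/2, 1)$. The helper name `uniform_mean_var` is illustrative.

```python
from fractions import Fraction


def uniform_mean_var(a, b):
    """Exact mean and variance of a constant density 1/(b - a) on (a, b)."""
    density = 1 / (b - a)
    first = density * (b**2 - a**2) / 2   # integral of x * density over (a, b)
    second = density * (b**3 - a**3) / 3  # integral of x^2 * density over (a, b)
    return first, second - first**2


full = uniform_mean_var(Fraction(-1), Fraction(1))            # untruncated
truncated = uniform_mean_var(Fraction(-1, 2), Fraction(1))    # x > -1/2
print(full, truncated)
```

The truncated variance (3/16) is smaller than the untruncated one (1/3), while the mean shifts from 0 to 1/4.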

13.6 Truncated Normal Density.

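a. From the Appendix, Eq. (A.1), using $c = 1$, $\mu = 1$, $\sigma^2 = 1$ and $\Phi(0) = 1/2$, we get

$$f(x \mid x > 1) = \frac{\phi(x - 1)}{1 - \Phi(0)} = 2\phi(x - 1) \quad \text{for } x > 1.$$

Similarly, using Eq. (A.2), for $c = 1$, $\mu = 1$ and $\sigma^2 = 1$ with $\Phi(0) = 1/2$, we get

$$f(x \mid x < 1) = \frac{\phi(x - 1)}{\Phi(0)} = 2\phi(x - 1) \quad \text{for } x < 1.$$

b. The conditional mean is given in (A.3) and for this example we get

$$E(x \mid x > 1) = 1 + 1 \cdot \frac{\phi(c^*)}{1 - \Phi(c^*)} = 1 + \frac{\phi(0)}{1/2} = 1 + 2\phi(0) = 1 + \frac{2}{\sqrt{2\pi}} \approx 1.80$$

with $c^* = \dfrac{c - \mu}{\sigma} = \dfrac{1 - 1}{1} = 0$. Similarly, using (A.4) we get

$$E(x \mid x < 1) = 1 - 1 \cdot \frac{\phi(c^*)}{\Phi(c^*)} = 1 - \frac{\phi(0)}{1/2} = 1 - 2\phi(0) = 1 - \frac{2}{\sqrt{2\pi}} \approx 0.20.$$

c. From (A.5) we get $\operatorname{var}(x \mid x > 1) = 1 \cdot (1 - \delta(c^*)) = 1 - \delta(0)$, where

$$\delta(0) = \frac{\phi(0)}{1 - \Phi(0)} \left[\frac{\phi(0)}{1 - \Phi(0)} - c^*\right] = 2\phi(0)[2\phi(0)] = 4\phi^2(0) = \frac{4}{2\pi} = \frac{2}{\pi} = 0.64 \quad \text{for } x > 1.$$

From (A.6), we get $\operatorname{var}(x \mid x < 1) = 1 - \delta(0)$, where

$$\delta(0) = \frac{-\phi(0)}{\Phi(0)} \left[\frac{-\phi(0)}{\Phi(0)} - c^*\right] = 4\phi^2(0) = \frac{2}{\pi} = 0.64 \quad \text{for } x < 1.$$

Both conditional truncated variances equal $1 - 0.64 = 0.36$, which is less than the unconditional $\operatorname{var}(x) = 1$.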
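The truncated normal results can be checked numerically with Python's standard library. This sketch assumes the usual truncated normal formulas, with $\lambda(c^*) = \phi(c^*)/[1 - \Phi(c^*)]$ for truncation from below, $\lambda(c^*) = -\phi(c^*)/\Phi(c^*)$ for truncation from above, and $\delta(c^*) = \lambda(c^*)[\lambda(c^*) - c^*]$; these are taken to correspond to (A.3)-(A.6). The function name and the `side` argument are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_normal_moments(mu, sigma, c, side):
    """Mean and variance of N(mu, sigma^2) truncated to x > c or x < c."""
    c_star = (c - mu) / sigma
    if side == "above":   # keep only x > c
        lam = STD_NORMAL.pdf(c_star) / (1 - STD_NORMAL.cdf(c_star))
    else:                 # keep only x < c
        lam = -STD_NORMAL.pdf(c_star) / STD_NORMAL.cdf(c_star)
    delta = lam * (lam - c_star)
    return mu + sigma * lam, sigma**2 * (1 - delta)


mean_above, var_above = truncated_normal_moments(1.0, 1.0, 1.0, "above")
mean_below, var_below = truncated_normal_moments(1.0, 1.0, 1.0, "below")
print(round(mean_above, 2), round(mean_below, 2), round(var_above, 2), round(var_below, 2))
```

With $c = \mu = 1$ and $\sigma = 1$, the two conditional means are symmetric around 1 and the two conditional variances coincide at $1 - 2/\pi$.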

13.7 Censored Normal Distribution.

a. From the Appendix we get,

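$$\begin{aligned}
E(y) &= \Pr[y = c]\, E(y \mid y = c) + \Pr[y > c]\, E(y \mid y > c) \\
&= c\,\Phi(c^*) + (1 - \Phi(c^*))\, E(y^* \mid y^* > c) \\
&= c\,\Phi(c^*) + (1 - \Phi(c^*)) \left[\mu + \sigma \frac{\phi(c^*)}{1 - \Phi(c^*)}\right]
\end{aligned}$$

where $E(y^* \mid y^* > c)$ is obtained from the mean of a truncated normal density, see (A.3).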
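The censored mean can be sanity-checked against direct numerical integration. The sketch below is only illustrative: the parameter values $\mu = 1$, $\sigma = 2$, $c = 0$, the function names, and the midpoint-rule integrator are all assumptions made for the check, not part of the original solution.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def censored_mean_formula(mu, sigma, c):
    """E(y) = c*Phi(c*) + (1 - Phi(c*)) * [mu + sigma*phi(c*)/(1 - Phi(c*))]."""
    c_star = (c - mu) / sigma
    prob_censored = STD_NORMAL.cdf(c_star)
    truncated_mean = mu + sigma * STD_NORMAL.pdf(c_star) / (1 - prob_censored)
    return c * prob_censored + (1 - prob_censored) * truncated_mean


def censored_mean_numeric(mu, sigma, c, steps=20000):
    """Same mean via midpoint-rule integration of t*f(t) above the censoring point."""
    latent = NormalDist(mu, sigma)
    upper = max(c, mu) + 10 * sigma  # the tail beyond this point is negligible
    width = (upper - c) / steps
    integral = width * sum(
        (c + (k + 0.5) * width) * latent.pdf(c + (k + 0.5) * width)
        for k in range(steps)
    )
    return c * latent.cdf(c) + integral


by_formula = censored_mean_formula(1.0, 2.0, 0.0)
by_integration = censored_mean_numeric(1.0, 2.0, 0.0)
print(round(by_formula, 4), round(by_integration, 4))
```

The two numbers agree to several decimals, which supports the decomposition of $E(y)$ into a censored mass at $c$ and a truncated normal mean.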

b. Using the result on conditional variance given in Chap. 2 we get var(y) = E(conditional variance) + var(conditional mean). But

$$E(\text{conditional variance}) = P[y = c]\operatorname{var}(y \mid y = c) + P[y > c]\operatorname{var}(y \mid y > c) = \Phi(c^*) \cdot 0 + (1 - \Phi(c^*))\,\sigma^2 (1 - \delta(c^*)),$$

where $\operatorname{var}(y \mid y > c)$ is given by (A.5). Also,

$$\operatorname{var}(\text{conditional mean}) = P[y = c]\,(c - E(y))^2 + \Pr(y > c)\,[E(y \mid y > c) - E(y)]^2 = \Phi(c^*)(c - E(y))^2 + [1 - \Phi(c^*)]\,[E(y \mid y > c) - E(y)]^2,$$

where $E(y)$ is given by (A.7) and $E(y \mid y > c)$ is given by (A.3). This gives

var(conditional mean) $= \Phi(c^*)\,\{c - c\,\Phi(c^*) - (1 - \Phi(c^*))$ …

… as required. Similarly, from part (b), using $c^* = -\mu/\sigma$ and $\Phi(-\mu/\sigma) =$ …

13.8 Fixed vs. adjustable mortgage rates. This is based on Dhillon et al. (1987).

a. The OLS regression of Y on all variables in the data set is given below. This was done using EViews. The $R^2 = 0.434$ and the F-statistic for the significance of all slopes is equal to 3.169. This is distributed as F(15,62) under the null hypothesis and has a p-value of 0.0007. Therefore, we reject $H_0$ and conclude that this is a significant regression. As explained in Sect. 13.6, using BRMR this also rejects the insignificance of all slopes in the logit specification.

Unrestricted Least Squares

LS // Dependent Variable is Y

Sample: 1 78

Included observations: 78

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 1.272832 | 1.411806 | 0.901563 | 0.3708 |
| BA | 0.000398 | 0.007307 | 0.054431 | 0.9568 |
| BS | 0.017084 | 0.020365 | 0.838887 | 0.4048 |
| NW | -0.036932 | 0.025320 | -1.458609 | 0.1497 |
| FI | -0.221726 | 0.092813 | -2.388949 | 0.0200 |
| PTS | 0.178963 | 0.091050 | 1.965544 | 0.0538 |
| MAT | 0.214264 | 0.202497 | 1.058108 | 0.2941 |
| MOB | 0.020963 | 0.009194 | 2.279984 | 0.0261 |
| MC | 0.189973 | 0.150816 | 1.259635 | 0.2125 |
| FTB | -0.013857 | 0.136127 | -0.101797 | 0.9192 |
| SE | 0.188284 | 0.360196 | 0.522728 | 0.6030 |
| YLD | 0.656227 | 0.366117 | 1.792399 | 0.0779 |
| MARG | 0.129127 | 0.054840 | 2.354621 | 0.0217 |
| CB | 0.172202 | 0.137827 | 1.249403 | 0.2162 |
| STL | -0.001599 | 0.005994 | -0.266823 | 0.7905 |
| LA | -0.001761 | 0.007801 | -0.225725 | 0.8222 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.433996 | Mean dependent var | 0.589744 |
| Adjusted R-squared | 0.297059 | S.D. dependent var | 0.495064 |
| S.E. of regression | 0.415069 | Akaike info criterion | -1.57794 |
| Sum squared resid | 10.6815 | Schwarz criterion | -1.09451 |
| Log likelihood | -33.1376 | F-statistic | 3.16932 |
| Durbin-Watson stat | 0.905968 | Prob(F-statistic) | 0.000702 |

[Figure: Plot of Y and YHAT]

b. The URSS from part (a) is 10.6815 while the RRSS from including only the cost variables is 14.0180, as shown in the enclosed output from EViews. The Chow F-statistic for the insignificance of the 10 personal characteristics variables is

$$F = \frac{(14.0180 - 10.6815)/10}{10.6815/62} = 1.937,$$

which is distributed as F(10,62) under the null hypothesis. This has a 5% critical value of 1.99. Hence, we cannot reject $H_0$. The principal agent theory suggests that personal characteristics are important in making this mortgage choice. Briefly, this theory suggests that information is asymmetric and the borrower knows things about himself or herself that the lending institution does not. Not rejecting $H_0$ does not provide support for the principal agent theory.
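The F-statistic in part (b) can be recomputed from the two residual sums of squares. This is a minimal sketch; the function name `chow_f` is illustrative, and the 5% critical value of 1.99 is the one quoted in the text rather than computed here.

```python
def chow_f(rrss, urss, n_restrictions, df_unrestricted):
    """F = [(RRSS - URSS) / q] / [URSS / (n - k)]."""
    return ((rrss - urss) / n_restrictions) / (urss / df_unrestricted)


# 10 personal-characteristics variables dropped; 78 - 16 = 62 residual degrees of freedom.
f_stat = chow_f(rrss=14.0180, urss=10.6815, n_restrictions=10, df_unrestricted=62)
critical_5pct = 1.99  # F(10, 62) critical value as quoted in the text
print(round(f_stat, 3), f_stat > critical_5pct)
```

The statistic falls just short of the quoted critical value, so the linear probability model does not reject the joint insignificance of the personal characteristics at the 5% level.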

TESTING THE EFFICIENT MARKET HYPOTHESIS WITH THE LINEAR PROBABILITY MODEL

Restricted Least Squares

LS // Dependent Variable is Y

Sample: 1 78

Included observations: 78

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| FI | -0.237228 | 0.078592 | -3.018479 | 0.0035 |
| MARG | 0.127029 | 0.051496 | 2.466784 | 0.0160 |
| YLD | 0.889908 | 0.332037 | 2.680151 | 0.0091 |
| PTS | 0.054879 | 0.072165 | 0.760465 | 0.4495 |
| MAT | 0.069466 | 0.196727 | 0.353108 | 0.7250 |
| C | 1.856435 | 1.289797 | 1.439324 | 0.1544 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.257199 | Mean dependent var | 0.589744 |
| Adjusted R-squared | 0.205616 | S.D. dependent var | 0.495064 |
| S.E. of regression | 0.441242 | Akaike info criterion | -1.56252 |
| Sum squared resid | 14.018 | Schwarz criterion | -1.38124 |
| Log likelihood | -43.7389 | F-statistic | 4.98609 |
| Durbin-Watson stat | 0.509361 | Prob(F-statistic) | 0.000562 |

c. The logit specification output using EViews is given below. The unrestricted log-likelihood is equal to -30.8963. The restricted specification output is also given, showing a restricted log-likelihood of -41.4729. Therefore, the LR test statistic is given by LR = 2(41.4729 - 30.8963) = 21.1532, which is distributed as $\chi^2_{10}$ under the null hypothesis. This is significant given that the 5% critical value of $\chi^2_{10}$ is 18.31. This means that the logit specification does not reject the principal agent theory, as personal characteristics are not jointly insignificant.

TESTING THE EFFICIENT MARKET HYPOTHESIS WITH THE LOGIT MODEL

Unrestricted Logit Model

LOGIT // Dependent Variable is Y

Sample: 1 78

Included observations: 78

Convergence achieved after 5 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 4.238872 | 10.47875 | 0.404521 | 0.6872 |
| BA | 0.010478 | 0.075692 | 0.138425 | 0.8904 |
| BS | 0.198251 | 0.172444 | 1.149658 | 0.2547 |
| NW | -0.244064 | 0.185027 | -1.319072 | 0.1920 |
| FI | -1.717497 | 0.727707 | -2.360149 | 0.0214 |
| PTS | 1.499799 | 0.719917 | 2.083294 | 0.0414 |
| MAT | 2.057067 | 1.631100 | 1.261153 | 0.2120 |
| MOB | 0.153078 | 0.097000 | 1.578129 | 0.1196 |
| MC | 1.922943 | 1.182932 | 1.625575 | 0.1091 |
| FTB | -0.110924 | 0.983688 | -0.112763 | 0.9106 |
| SE | 2.208505 | 2.800907 | 0.788496 | 0.4334 |
| YLD | 4.626702 | 2.919634 | 1.584686 | 0.1181 |
| MARG | 1.189518 | 0.485433 | 2.450426 | 0.0171 |
| CB | 1.759744 | 1.242104 | 1.416744 | 0.1616 |
| STL | -0.031563 | 0.051720 | -0.610265 | 0.5439 |
| LA | -0.022067 | 0.061013 | -0.361675 | 0.7188 |

Log likelihood -30.89597; Obs with Dep=1: 46; Obs with Dep=0: 32

| Variable | Mean All | Mean D=1 | Mean D=0 |
|---|---|---|---|
| C | 1.000000 | 1.000000 | 1.000000 |
| BA | 36.03846 | 35.52174 | 36.78125 |
| BS | 16.44872 | 15.58696 | 17.68750 |
| NW | 3.504013 | 2.075261 | 5.557844 |
| FI | 13.24936 | 13.02348 | 13.57406 |
| PTS | 1.497949 | 1.505217 | 1.487500 |
| MAT | 1.058333 | 1.027609 | 1.102500 |
| MOB | 4.205128 | 4.913043 | 3.187500 |
| MC | 0.602564 | 0.695652 | 0.468750 |
| FTB | 0.615385 | 0.521739 | 0.750000 |
| SE | 0.102564 | 0.043478 | 0.187500 |
| YLD | 1.606410 | 1.633261 | 1.567813 |
| MARG | 2.291923 | 2.526304 | 1.955000 |
| CB | 0.358974 | 0.478261 | 0.187500 |
| STL | 13.42218 | 11.72304 | 15.86469 |
| LA | 5.682692 | 4.792174 | 6.962812 |

Restricted Logit Model

LOGIT // Dependent Variable is Y

Sample: 1 78

Included observations: 78

Convergence achieved after 4 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| FI | -1.264608 | 0.454050 | -2.785172 | 0.0068 |
| MARG | 0.717847 | 0.313845 | 2.287265 | 0.0251 |
| YLD | 4.827537 | 1.958833 | 2.464497 | 0.0161 |
| PTS | 0.359033 | 0.423378 | 0.848019 | 0.3992 |
| MAT | 0.550320 | 1.036613 | 0.530883 | 0.5971 |
| C | 6.731755 | 7.059485 | 0.953576 | 0.3435 |

Log likelihood -41.47292; Obs with Dep=1: 46; Obs with Dep=0: 32

| Variable | Mean All | Mean D=1 | Mean D=0 |
|---|---|---|---|
| FI | 13.24936 | 13.02348 | 13.57406 |
| MARG | 2.291923 | 2.526304 | 1.955000 |
| YLD | 1.606410 | 1.633261 | 1.567813 |
| PTS | 1.497949 | 1.505217 | 1.487500 |
| MAT | 1.058333 | 1.027609 | 1.102500 |
| C | 1.000000 | 1.000000 | 1.000000 |

d. Similarly, the probit specification output using EViews is given below. The unrestricted log-likelihood is equal to -30.7294. The restricted log-likelihood is -41.7649. Therefore, the LR test statistic is given by LR = 2(41.7649 - 30.7294) = 22.0710, which is distributed as $\chi^2_{10}$ under the null hypothesis. This is significant given that the 5% critical value of $\chi^2_{10}$ is 18.31. This means that the probit specification does not reject the principal agent theory, as personal characteristics are not jointly insignificant.
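The two likelihood ratio statistics in parts (c) and (d) follow directly from the reported log-likelihoods. The sketch below uses the values quoted in the text; the function name `lr_statistic` is illustrative, and the critical value 18.31 is taken from the text rather than computed.

```python
def lr_statistic(loglik_restricted, loglik_unrestricted):
    """LR = -2 * (restricted log-likelihood - unrestricted log-likelihood)."""
    return -2 * (loglik_restricted - loglik_unrestricted)


CHI2_10_CRITICAL_5PCT = 18.31  # as quoted in the text

lr_logit = lr_statistic(-41.4729, -30.8963)
lr_probit = lr_statistic(-41.7649, -30.7294)
for name, lr in (("logit", lr_logit), ("probit", lr_probit)):
    print(name, round(lr, 4), lr > CHI2_10_CRITICAL_5PCT)
```

Both statistics exceed the quoted critical value, in contrast with the Chow F-test from the linear probability model, which did not reject.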

TESTING THE EFFICIENT MARKET HYPOTHESIS WITH THE PROBIT MODEL

Unrestricted Probit Model

PROBIT // Dependent Variable is Y

Sample: 1 78

Included observations: 78

Convergence achieved after 5 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 3.107820 | 5.954673 | 0.521913 | 0.6036 |
| BA | 0.003978 | 0.044546 | 0.089293 | 0.9291 |
| BS | 0.108267 | 0.099172 | 1.091704 | 0.2792 |
| NW | -0.128775 | 0.103438 | -1.244943 | 0.2178 |
| FI | -1.008080 | 0.418160 | -2.410750 | 0.0189 |
| PTS | 0.830273 | 0.379895 | 2.185533 | 0.0326 |
| MAT | 1.164384 | 0.924018 | 1.260131 | 0.2123 |
| MOB | 0.093034 | 0.056047 | 1.659924 | 0.1020 |
| MC | 1.058577 | 0.653234 | 1.620518 | 0.1102 |
| FTB | -0.143447 | 0.550471 | -0.260589 | 0.7953 |
| SE | 1.127523 | 1.565488 | 0.720237 | 0.4741 |
| YLD | 2.525122 | 1.590796 | 1.587332 | 0.1175 |
| MARG | 0.705238 | 0.276340 | 2.552069 | 0.0132 |
| CB | 1.066589 | 0.721403 | 1.478493 | 0.1443 |
| STL | -0.016130 | 0.029303 | -0.550446 | 0.5840 |
| LA | -0.014615 | 0.035920 | -0.406871 | 0.6855 |

Log likelihood -30.72937; Obs with Dep=1: 46; Obs with Dep=0: 32

Restricted Probit Model

PROBIT // Dependent Variable is Y

Sample: 1 78

Included observations: 78

Convergence achieved after 3 iterations

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| FI | -0.693584 | 0.244631 | -2.835225 | 0.0059 |
| MARG | 0.419997 | 0.175012 | 2.399811 | 0.0190 |
| YLD | 2.730187 | 1.099487 | 2.483146 | 0.0154 |
| PTS | 0.235534 | 0.247390 | 0.952076 | 0.3442 |
| MAT | 0.221568 | 0.610572 | 0.362886 | 0.7178 |
| C | 3.536657 | 4.030251 | 0.877528 | 0.3831 |

Log likelihood -41.76443; Obs with Dep=1: 46; Obs with Dep=0: 32

13.13 Problem Drinking and Employment. The following Stata output replicates the OLS results given in Table 5 of Mullahy and Sindelar (1996, p. 428) for males. The first regression is for employment, given in column 1 of Table 5 of the paper, and the second regression is for unemployment, given in column 3 of Table 5 of the paper. Robust standard errors are reported.

. reg emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Regression with robust standard errors

Number of obs = 9822

F(20, 9801) = 46.15

Prob > F = 0.0000

R-squared = 0.1563

Root MSE = .27807

| emp | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| hvdrnk90 | -.0155071 | .0101891 | -1.52 | 0.128 | -.0354798 | .0044657 |
| ue88 | -.0090938 | .0022494 | -4.04 | 0.000 | -.013503 | -.0046846 |
| age | .0162668 | .0029248 | 5.56 | 0.000 | .0105336 | .0220001 |
| agesq | -.0002164 | .0000362 | -5.98 | 0.000 | -.0002873 | -.0001455 |
| educ | .0078258 | .0011271 | 6.94 | 0.000 | .0056166 | .0100351 |
| married | .0505682 | .0098396 | 5.14 | 0.000 | .0312805 | .0698558 |
| famsize | .0020612 | .0021796 | 0.95 | 0.344 | -.0022113 | .0063336 |
| white | .0773332 | .0104289 | 7.42 | 0.000 | .0568905 | .097776 |
| hlstat1 | .57519 | .0306635 | 18.76 | 0.000 | .515083 | .635297 |
| hlstat2 | .5728 | .0306427 | 18.69 | 0.000 | .512734 | .632866 |
| hlstat3 | .537617 | .0308845 | 17.41 | 0.000 | .477077 | .598157 |
| hlstat4 | .394739 | .0354291 | 11.14 | 0.000 | .325291 | .464187 |
| region1 | -.0013608 | .0094193 | -0.14 | 0.885 | -.0198247 | .017103 |
| region2 | .0050446 | .0084215 | 0.60 | 0.549 | -.0114633 | .0215526 |
| region3 | .0254332 | .0081999 | 3.10 | 0.002 | .0093596 | .0415067 |
| msa1 | -.0159492 | .0083578 | -1.91 | 0.056 | -.0323322 | .0004337 |
| msa2 | .0073081 | .0072395 | 1.01 | 0.313 | -.0068827 | .0214989 |
| q1 | -.0155891 | .0079415 | -1.96 | 0.050 | -.0311561 | -.000022 |
| q2 | -.0068915 | .0077786 | -0.89 | 0.376 | -.0221392 | .0083561 |
| q3 | -.0035867 | .0078474 | -0.46 | 0.648 | -.0189692 | .0117957 |
| _cons | -.0957667 | .0623045 | -1.54 | 0.124 | -.217896 | .0263631 |

. reg unemp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Regression with robust standard errors

Number of obs = 9822

F(20, 9801) = 3.37

Prob > F = 0.0000

R-squared = 0.0099

Root MSE = .17577

| unemp | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| hvdrnk90 | .0100022 | .0066807 | 1.50 | 0.134 | -.0030934 | .0230977 |
| ue88 | .0045029 | .0014666 | 3.07 | 0.002 | .0016281 | .0073776 |
| age | -.0014753 | .0017288 | -0.85 | 0.393 | -.0048641 | .0019134 |
| agesq | .0000123 | .0000206 | 0.60 | 0.551 | -.0000281 | .0000527 |
| educ | -.0028141 | .0006307 | -4.46 | 0.000 | -.0040504 | -.0015777 |
| married | -.0092854 | .0060161 | -1.54 | 0.123 | -.0210782 | .0025073 |
| famsize | .0003859 | .0013719 | 0.28 | 0.778 | -.0023033 | .0030751 |
| white | -.0246801 | .0063618 | -3.88 | 0.000 | -.0371506 | -.0122096 |
| hlstat1 | .0150194 | .0113968 | 1.32 | 0.188 | -.0073206 | .0373594 |
| hlstat2 | .0178594 | .0114626 | 1.56 | 0.119 | -.0046097 | .0403285 |
| hlstat3 | .0225153 | .0116518 | 1.93 | 0.053 | -.0003245 | .0453552 |
| hlstat4 | .0178865 | .0136228 | 1.31 | 0.189 | -.0088171 | .0445901 |
| region1 | .0007911 | .005861 | 0.13 | 0.893 | -.0106977 | .01228 |
| region2 | -.0029056 | .0053543 | -0.54 | 0.587 | -.0134011 | .0075898 |
| region3 | -.0065005 | .005095 | -1.28 | 0.202 | -.0164877 | .0034868 |
| msa1 | -.0008801 | .0052004 | -0.17 | 0.866 | -.011074 | .0093139 |
| msa2 | -.0055184 | .0047189 | -1.17 | 0.242 | -.0147685 | .0037317 |
| q1 | .0145704 | .0051986 | 2.80 | 0.005 | .00438 | .0247607 |
| q2 | .0022831 | .0047579 | 0.48 | 0.631 | -.0070434 | .0116096 |
| q3 | .000043 | .0047504 | 0.01 | 0.993 | -.0092687 | .0093547 |
| _cons | .0927746 | .0364578 | 2.54 | 0.011 | .0213098 | .164239 |

The following Stata output replicates the OLS results given in Table 6 of Mullahy and Sindelar (1996, p. 429) for females. The first regression is for employment, given in column 1 of Table 6 of the paper, and the second regression is for unemployment, given in column 3 of Table 6 of the paper. Robust standard errors are reported.

. reg emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Regression with robust standard errors

Number of obs = 12534

F(20, 12513) = 117.99

Prob > F = 0.0000

R-squared = 0.1358

Root MSE = .42932

| emp | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| hvdrnk90 | .0059878 | .0120102 | 0.50 | 0.618 | -.017554 | .0295296 |
| ue88 | -.0168969 | .002911 | -5.80 | 0.000 | -.0226028 | -.011191 |
| age | .04635 | .0036794 | 12.60 | 0.000 | .0391378 | .0535622 |
| agesq | -.0005898 | .0000449 | -13.13 | 0.000 | -.0006778 | -.0005018 |
| educ | .0227162 | .0015509 | 14.65 | 0.000 | .0196762 | .0257563 |
| married | .0105416 | .0111463 | 0.95 | 0.344 | -.0113068 | .0323901 |
| famsize | -.0662794 | .0030445 | -21.77 | 0.000 | -.072247 | -.0603118 |
| white | -.0077594 | .0104111 | -0.75 | 0.456 | -.0281668 | .012648 |
| hlstat1 | .4601695 | .0253797 | 18.13 | 0.000 | .4104214 | .5099177 |
| hlstat2 | .4583823 | .0252973 | 18.12 | 0.000 | .4087957 | .5079689 |
| hlstat3 | .409624 | .0251983 | 16.26 | 0.000 | .360232 | .459017 |
| hlstat4 | .249443 | .027846 | 8.96 | 0.000 | .19486 | .304025 |
| region1 | -.0180596 | .0129489 | -1.39 | 0.163 | -.0434415 | .0073223 |
| region2 | .0095951 | .0114397 | 0.84 | 0.402 | -.0128285 | .0320186 |
| region3 | .0465464 | .0108841 | 4.28 | 0.000 | .0252119 | .067881 |
| msa1 | -.0256183 | .0109856 | -2.33 | 0.020 | -.0471518 | -.0040848 |
| msa2 | .0051885 | .0103385 | 0.50 | 0.616 | -.0150765 | .0254534 |
| q1 | -.0058134 | .0107234 | -0.54 | 0.588 | -.0268329 | .0152061 |
| q2 | -.0061301 | .0109033 | -0.56 | 0.574 | -.0275022 | .0152421 |
| q3 | -.0168673 | .0109023 | -1.55 | 0.122 | -.0382376 | .0045029 |
| _cons | -.588292 | .0782545 | -7.52 | 0.000 | -.741683 | -.434902 |

. reg unemp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Regression with robust standard errors

Number of obs = 12534

F(20, 12513) = 5.99

Prob > F = 0.0000

R-squared = 0.0141

Root MSE = .18409

| unemp | Coef. | Robust Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| hvdrnk90 | .0149286 | .0059782 | 2.50 | 0.013 | .0032104 | .0266468 |
| ue88 | .0038119 | .0013782 | 2.77 | 0.006 | .0011105 | .0065133 |
| age | -.0013974 | .0015439 | -0.91 | 0.365 | -.0044237 | .0016289 |
| agesq | 4.43e-06 | .0000181 | 0.24 | 0.807 | -.0000311 | .00004 |
| educ | -.0011631 | .0006751 | -1.72 | 0.085 | -.0024865 | .0001602 |
| married | -.0066296 | .0058847 | -1.13 | 0.260 | -.0181645 | .0049053 |
| famsize | .0013304 | .0013075 | 1.02 | 0.309 | -.0012325 | .0038933 |
| white | -.0308826 | .0051866 | -5.95 | 0.000 | -.0410493 | -.020716 |
| hlstat1 | .008861 | .0092209 | 0.96 | 0.337 | -.0092135 | .0269354 |
| hlstat2 | .0079536 | .0091305 | 0.87 | 0.384 | -.0099435 | .0258507 |
| hlstat3 | .0224927 | .0093356 | 2.41 | 0.016 | .0041934 | .0407919 |
| hlstat4 | .0193116 | .0106953 | 1.81 | 0.071 | -.0016528 | .040276 |
| region1 | .0020325 | .0055618 | 0.37 | 0.715 | -.0088694 | .0129344 |
| region2 | -.0005405 | .0049211 | -0.11 | 0.913 | -.0101866 | .0091057 |
| region3 | -.0079708 | .0046818 | -1.70 | 0.089 | -.0171479 | .0012063 |
| msa1 | -.002055 | .0049721 | -0.41 | 0.679 | -.0118011 | .007691 |
| msa2 | -.0130041 | .0041938 | -3.10 | 0.002 | -.0212246 | -.0047835 |
| q1 | .0025441 | .0043698 | 0.58 | 0.560 | -.0060214 | .0111095 |
| q2 | .0080984 | .0046198 | 1.75 | 0.080 | -.0009571 | .0171539 |
| q3 | .0102601 | .0046839 | 2.19 | 0.029 | .001079 | .0194413 |
| _cons | .0922081 | .0350856 | 2.63 | 0.009 | .023435 | .1609813 |

The corresponding probit equation for employment for males is given by the following Stata output (this replicates Table 13.6 in the text):

. probit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Probit regression

Number of obs = 9822

Wald chi2(20) = 928.34

Prob > chi2 = 0.0000

Log pseudolikelihood = -2698.1797

Pseudo R2 = 0.1651

| emp | Coef. | Robust Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| hvdrnk90 | -.1049465 | .0589878 | -1.78 | 0.075 | -.2205606 | .0106675 |
| ue88 | -.0532774 | .0142024 | -3.75 | 0.000 | -.0811135 | -.0254413 |
| age | .0996338 | .0171184 | 5.82 | 0.000 | .0660824 | .1331853 |
| agesq | -.0013043 | .0002051 | -6.36 | 0.000 | -.0017062 | -.0009023 |
| educ | .0471834 | .0066738 | 7.07 | 0.000 | .034103 | .0602638 |
| married | .2952921 | .0540855 | 5.46 | 0.000 | .1892866 | .4012976 |
| famsize | .0188906 | .0140462 | 1.34 | 0.179 | -.0086395 | .0464206 |
| white | .3945226 | .0483378 | 8.16 | 0.000 | .2997822 | .489263 |
| hlstat1 | 1.816306 | .0983443 | 18.47 | 0.000 | 1.623554 | 2.009057 |
| hlstat2 | 1.778434 | .0991528 | 17.94 | 0.000 | 1.584098 | 1.97277 |
| hlstat3 | 1.547836 | .0982635 | 15.75 | 0.000 | 1.355244 | 1.740429 |
| hlstat4 | 1.043363 | .1077276 | 9.69 | 0.000 | .8322209 | 1.254505 |
| region1 | .0343123 | .0620016 | 0.55 | 0.580 | -.0872085 | .1558331 |
| region2 | .0604907 | .0537881 | 1.12 | 0.261 | -.044932 | .1659135 |
| region3 | .1821206 | .0542342 | 3.36 | 0.001 | .0758236 | .2884176 |
| msa1 | -.0730529 | .0518715 | -1.41 | 0.159 | -.1747192 | .0286134 |
| msa2 | .0759533 | .0513087 | 1.48 | 0.139 | -.02461 | .1765166 |
| q1 | -.1054844 | .0527723 | -2.00 | 0.046 | -.2089162 | -.0020525 |
| q2 | -.0513229 | .052818 | -0.97 | 0.331 | -.1548444 | .0521985 |
| q3 | -.0293419 | .0543746 | -0.54 | 0.589 | -.1359142 | .0772303 |
| _cons | -3.017454 | .3592294 | -8.40 | 0.000 | -3.72153 | -2.313377 |

We can see how the probit model fits by looking at its predictions.

. estat classification

Probit model for emp

| Classified | True D | True ~D | Total |
|---|---|---|---|
| + | 8743 | 826 | 9569 |
| - | 79 | 174 | 253 |
| Total | 8822 | 1000 | 9822 |

Classified + if predicted Pr(D) >= .5

True D defined as emp != 0

| Measure | Definition | Value |
|---|---|---|
| Sensitivity | Pr(+ \| D) | 99.10% |
| Specificity | Pr(- \| ~D) | 17.40% |
| Positive predictive value | Pr(D \| +) | 91.37% |
| Negative predictive value | Pr(~D \| -) | 68.77% |
| False + rate for true ~D | Pr(+ \| ~D) | 82.60% |
| False - rate for true D | Pr(- \| D) | 0.90% |
| False + rate for classified + | Pr(~D \| +) | 8.63% |
| False - rate for classified - | Pr(D \| -) | 31.23% |
| Correctly classified | | 90.79% |
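The summary measures reported by `estat classification` follow directly from the four cell counts of the classification table. The Python sketch below recomputes them for the probit model; the function and key names are illustrative assumptions.

```python
def classification_summary(true_pos, false_pos, false_neg, true_neg):
    """Summary rates from a 2x2 table of classified (+/-) versus true (D/~D) outcomes."""
    total = true_pos + false_pos + false_neg + true_neg
    return {
        "sensitivity": true_pos / (true_pos + false_neg),                # Pr(+ | D)
        "specificity": true_neg / (true_neg + false_pos),                # Pr(- | ~D)
        "positive_predictive_value": true_pos / (true_pos + false_pos),  # Pr(D | +)
        "negative_predictive_value": true_neg / (true_neg + false_neg),  # Pr(~D | -)
        "correctly_classified": (true_pos + true_neg) / total,
    }


# Probit classification table for emp (males).
probit_summary = classification_summary(
    true_pos=8743, false_pos=826, false_neg=79, true_neg=174
)
for name, value in probit_summary.items():
    print(f"{name}: {100 * value:.2f}%")
```

The high overall rate of correct classification is driven by sensitivity. Specificity is low because almost all observations are predicted to be employed.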

We could have alternatively run a logit regression on employment for males.

. logit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Logistic regression

Number of obs = 9822

Wald chi2(20) = 900.15

Prob > chi2 = 0.0000

Log pseudolikelihood = -2700.0567

Pseudo R2 = 0.1646

| emp | Coef. | Robust Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| hvdrnk90 | -.1960754 | .1114946 | -1.76 | 0.079 | -.4146008 | .02245 |
| ue88 | -.1131074 | .0273316 | -4.14 | 0.000 | -.1666764 | -.0595384 |
| age | .1884486 | .0332284 | 5.67 | 0.000 | .123322 | .2535751 |
| agesq | -.0024584 | .0003965 | -6.20 | 0.000 | -.0032356 | -.0016813 |
| educ | .0913569 | .0127978 | 7.14 | 0.000 | .0662738 | .1164401 |
| married | .5534291 | .1057963 | 5.23 | 0.000 | .3460721 | .760786 |
| famsize | .0365059 | .0276468 | 1.32 | 0.187 | -.0176808 | .0906927 |
| white | .722404 | .0912559 | 7.92 | 0.000 | .543545 | .901262 |
| hlstat1 | 3.14548 | .172192 | 18.27 | 0.000 | 2.80799 | 3.48297 |
| hlstat2 | 3.06728 | .174129 | 17.61 | 0.000 | 2.72599 | 3.40857 |
| hlstat3 | 2.61369 | .170742 | 15.31 | 0.000 | 2.27904 | 2.94834 |
| hlstat4 | 1.72557 | .18449 | 9.35 | 0.000 | 1.36398 | 2.08717 |
| region1 | .0493715 | .122007 | 0.40 | 0.686 | -.189757 | .2885 |
| region2 | .114611 | .105504 | 1.09 | 0.277 | -.0921733 | .321395 |
| region3 | .373827 | .106649 | 3.51 | 0.000 | .164799 | .582856 |
| msa1 | -.16909 | .101646 | -1.66 | 0.096 | -.368313 | .0301319 |
| msa2 | .134597 | .102163 | 1.32 | 0.188 | -.0656374 | .334832 |
| q1 | -.195453 | .10347 | -1.89 | 0.059 | -.398251 | .0073453 |
| q2 | -.105249 | .103301 | -1.02 | 0.308 | -.307716 | .0972176 |
| q3 | -.0418287 | .10749 | -0.39 | 0.697 | -.252505 | .168847 |
| _cons | -5.53827 | .693538 | -7.99 | 0.000 | -6.89758 | -4.17896 |

And the corresponding predictions for the logit model are given by

. estat classification

Logistic model for emp

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |      8740           822  |       9562
     -     |        82           178  |        260
-----------+--------------------------+-----------
   Total   |      8822          1000  |       9822

Classified + if predicted Pr(D) >= .5
True D defined as emp != 0

Sensitivity                     Pr( +| D)   99.07%
Specificity                     Pr( -|~D)   17.80%
Positive predictive value       Pr( D| +)   91.40%
Negative predictive value       Pr(~D| -)   68.46%

False + rate for true ~D        Pr( +|~D)   82.20%
False - rate for true D         Pr( -| D)    0.93%
False + rate for classified +   Pr(~D| +)    8.60%
False - rate for classified -   Pr( D| -)   31.54%

Correctly classified                        90.80%
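Every rate in the classification output is a ratio of cells of the 2x2 table. As a minimal sketch (the function name `classification_summary` and its argument names are illustrative and not part of Stata), the logit figures for emp can be recomputed from the four counts:

```python
def classification_summary(tp, fp, fn, tn):
    """Rates reported in a classification table, in percent.

    tp: classified + and true D      fp: classified + and true ~D
    fn: classified - and true D      tn: classified - and true ~D
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": 100 * tp / (tp + fn),   # Pr(+ | D)
        "specificity": 100 * tn / (tn + fp),   # Pr(- | ~D)
        "ppv": 100 * tp / (tp + fp),           # Pr(D | +)
        "npv": 100 * tn / (tn + fn),           # Pr(~D | -)
        "correct": 100 * (tp + tn) / total,    # correctly classified
    }


# Counts taken from the logit classification table for emp.
logit_emp = classification_summary(tp=8740, fp=822, fn=82, tn=178)
for name, value in logit_emp.items():
    print(f"{name:12s} {value:6.2f}%")
```

The false-positive and false-negative rates in the output are the complements of these four rates, for example Pr(+|~D) = 100% minus the specificity.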

The marginal effects for the probit model can be obtained as follows:

. dprobit emp hvdrnk90 ue88 age agesq educ married famsize white hlstat1 hlstat2 hlstat3 hlstat4 region1 region2 region3 msa1 msa2 q1 q2 q3, robust

Iteration 0:   log pseudolikelihood = -3231.8973
Iteration 1:   log pseudolikelihood = -2707.0435
Iteration 2:   log pseudolikelihood = -2698.2015
Iteration 3:   log pseudolikelihood = -2698.1797

Probit regression, reporting marginal effects     Number of obs   =       9822
                                                  Wald chi2(20)   =     928.34
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -2698.1797                 Pseudo R2       =     0.1651

             |               Robust
         emp |      dF/dx  Std. Err.       z   P>|z|     x-bar   [   95% C.I.   ]
-------------+--------------------------------------------------------------------
   hvdrnk90* |  -.0161704   .0096242   -1.78   0.075   .099165  -.035034  .002693
        ue88 |  -.0077362   .0020463   -3.75   0.000   5.56921  -.011747 -.003725
         age |   .0144674   .0024796    5.82   0.000   39.1757   .009607  .019327
       agesq |  -.0001894   .0000297   -6.36   0.000   1627.61  -.000248 -.000131
        educ |   .0068513   .0009621    7.07   0.000   13.3096   .004966  .008737
    married* |   .0488911    .010088    5.46   0.000   .816432   .029119  .068663
     famsize |    .002743    .002039    1.34   0.179    2.7415  -.001253  .006739
      white* |    .069445   .0100697    8.16   0.000   .853085   .049709  .089181
    hlstat1* |   .2460794   .0148411   18.47   0.000   .415903   .216991  .275167
    hlstat2* |   .1842432   .0099207   17.94   0.000   .301873   .164799  .203687
    hlstat3* |    .130786   .0066051   15.75   0.000   .205254    .11784  .143732
    hlstat4* |   .0779836   .0041542    9.69   0.000   .053451   .069841  .086126
    region1* |   .0049107   .0087468    0.55   0.580   .203014  -.012233  .022054
    region2* |   .0086088   .0075003    1.12   0.261   .265628  -.006092  .023309
    region3* |   .0252543   .0071469    3.36   0.001   .318265   .011247  .039262
       msa1* |  -.0107946   .0077889   -1.41   0.159   .333232  -.026061  .004471
       msa2* |   .0109542   .0073524    1.48   0.139   .434942  -.003456  .025365
         q1* |  -.0158927   .0082451   -2.00   0.046   .254632  -.032053  .000267
         q2* |  -.0075883   .0079484   -0.97   0.331   .252698  -.023167   .00799
         q3* |  -.0043066   .0080689   -0.54   0.589   .242822  -.020121  .011508
-------------+--------------------------------------------------------------------
      obs. P |   .8981877
     pred. P |   .9224487  (at x-bar)
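The Pseudo R2 in the header can be reproduced from the iteration log. The sketch below assumes that iteration 0 reports the log pseudolikelihood of the constant-only model, so that the statistic is McFadden's 1 - lnL/lnL0; the function name `mcfadden_pseudo_r2` is illustrative.

```python
def mcfadden_pseudo_r2(loglik_full, loglik_null):
    """McFadden's pseudo R-squared: 1 - lnL(full model) / lnL(constant only)."""
    return 1.0 - loglik_full / loglik_null


# Iteration 0 (assumed constant-only) and the final iteration of the probit for emp.
pseudo_r2 = mcfadden_pseudo_r2(loglik_full=-2698.1797, loglik_null=-3231.8973)
print(f"Pseudo R2 = {pseudo_r2:.4f}")
```

The same calculation can be applied to any of the other probit, logit, or multinomial logit outputs that list both log-likelihood values.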

13.15 Fertility and Female Labor Supply

a. Carrasco (2001, p. 391) Table 4, column 1, ran a fertility probit equation, which we replicate below using Stata:

. probit f dsex ags26l educ_2 educ_3 age drace inc

Probit regression                                 Number of obs   =       5768
                                                  LR chi2(7)      =     964.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -1561.1312                       Pseudo R2       =     0.2360

           f |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
        dsex |   .3250503   .0602214    5.40   0.000    .2070184    .4430822
      ags26l |  -2.135365   .1614783  -13.22   0.000   -2.451857   -1.818873
      educ_2 |   .0278467   .1145118    0.24   0.808   -.1965922    .2522856
      educ_3 |   .3071582   .1255317    2.45   0.014    .0611207    .5531958
         age |  -.0808522   .0048563  -16.65   0.000   -.0903703    -.071334
       drace |  -.0916409   .0629859   -1.45   0.146    -.215091    .0318093
         inc |    .003161   .0029803    1.06   0.289   -.0026803    .0090022
       _cons |   1.526893   .1856654    8.22   0.000    1.162996    1.890791
For part (b), the predicted probabilities are obtained as follows:

. lstat

Probit model for f

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         2             3  |          5
     -     |       654          5109  |       5763
-----------+--------------------------+-----------
   Total   |       656          5112  |       5768

Classified + if predicted Pr(D) >= .5
True D defined as f != 0

Sensitivity                     Pr( +| D)    0.30%
Specificity                     Pr( -|~D)   99.94%
Positive predictive value       Pr( D| +)   40.00%
Negative predictive value       Pr(~D| -)   88.65%

False + rate for true ~D        Pr( +|~D)    0.06%
False - rate for true D         Pr( -| D)   99.70%
False + rate for classified +   Pr(~D| +)   60.00%
False - rate for classified -   Pr( D| -)   11.35%

Correctly classified                        88.61%
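With a sensitivity of only 0.30%, the 88.61% of correct classifications comes almost entirely from predicting f = 0. As a rough illustration (the "always predict 0" rule is a hypothetical benchmark introduced here, not part of the Stata output), the counts in the table can be compared with that naive rule:

```python
# Counts from the classification table for f.
true_pos, false_pos = 2, 3         # classified +
false_neg, true_neg = 654, 5109    # classified -
n = true_pos + false_pos + false_neg + true_neg

# Share correctly classified by the probit with the 0.5 cutoff.
model_correct = (true_pos + true_neg) / n

# Hypothetical benchmark: predict f = 0 for every observation,
# which is correct for all 5112 observations with true f = 0.
naive_correct = (false_pos + true_neg) / n

print(f"probit, 0.5 cutoff: {100 * model_correct:.2f}%")
print(f"always predict 0  : {100 * naive_correct:.2f}%")
```

When the outcome is rare, as here, a different cutoff than 0.5 may give a more informative classification table.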

The estimates reveal that having children of the same sex has a significant and positive effect on the probability of having an additional child. The marginal effects are given by dprobit in Stata:

. dprobit f dsex ags26l educ_2 educ_3 age drace inc

Probit regression, reporting marginal effects     Number of obs   =       5768
                                                  LR chi2(7)      =     964.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -1561.1312                       Pseudo R2       =     0.2360

           f |      dF/dx  Std. Err.       z   P>|z|     x-bar   [   95% C.I.   ]
-------------+--------------------------------------------------------------------
       dsex* |   .0302835   .0069532    5.40   0.000   .256415   .016655  .043912
     ags26l* |  -.1618148   .0066629  -13.22   0.000   .377601  -.174874 -.148756
     educ_2* |   .0022157   .0090239    0.24   0.808   .717753  -.015471  .019902
     educ_3* |   .0288636   .0140083    2.45   0.014   .223994   .001408  .056319
         age |  -.0065031   .0007644  -16.65   0.000   32.8024  -.008001 -.005005
      drace* |  -.0077119   .0055649   -1.45   0.146   .773232  -.018619  .003195
         inc |   .0002542    .000241    1.06   0.289   12.8582  -.000218  .000727
-------------+--------------------------------------------------------------------
      obs. P |   .1137309
     pred. P |   .0367557  (at x-bar)

(*) dF/dx is for discrete change of dummy variable from 0 to 1
    z and P>|z| correspond to the test of the underlying coefficient being 0
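For a continuous regressor, the marginal effect at the sample means is the standard normal density evaluated at the index, times the probit coefficient. The sketch below recovers the index from the reported "pred. P (at x-bar)" and checks the dF/dx entries for age and inc; the variable names are illustrative, and the dummy variables (marked *) are excluded because their effects are discrete changes rather than derivatives.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

pred_p = 0.0367557                             # pred. P (at x-bar) from dprobit
index_at_mean = std_normal.inv_cdf(pred_p)     # index implied by pred. P
density = std_normal.pdf(index_at_mean)        # phi(index at x-bar)

# Probit coefficients of the two continuous regressors.
probit_coef = {"age": -0.0808522, "inc": 0.003161}

# Marginal effect at the means: phi(x-bar'beta) * beta_k.
marginal_effect = {name: density * b for name, b in probit_coef.items()}
for name, value in marginal_effect.items():
    print(f"{name:4s} {value: .7f}")
```

Up to rounding, these match the dF/dx values of -.0065031 for age and .0002542 for inc.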

If we replace the same-sex variable by its components, the same-sex male and same-sex female dummies, the results do not change, indicating that it does not matter whether the same-sex children are boys or girls; see Carrasco (2001, p. 391), Table 4, column 2.

. probit f dsexm dsexf ags26l educ_2 educ_3 age drace inc

Probit regression                                 Number of obs   =       5768
                                                  LR chi2(8)      =     964.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -1561.1284                       Pseudo R2       =     0.2360

           f |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       dsexm |    .328542   .0764336    4.30   0.000    .1787349    .4783491
       dsexf |   .3209239   .0820417    3.91   0.000    .1601252    .4817226
      ags26l |  -2.135421   .1614518  -13.23   0.000   -2.451861   -1.818981
      educ_2 |    .027657   .1145384    0.24   0.809   -.1968342    .2521482
      educ_3 |   .3068706   .1255904    2.44   0.015    .0607179    .5530233
         age |  -.0808669   .0048605  -16.64   0.000   -.0903934   -.0713404
       drace |  -.0918074   .0630233   -1.46   0.145   -.2153308     .031716
         inc |   .0031709   .0029829    1.06   0.288   -.0026754    .0090173
       _cons |   1.527551   .1858818    8.22   0.000    1.163229    1.891872

Probit model for f

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         2             3  |          5
     -     |       654          5109  |       5763
-----------+--------------------------+-----------
   Total   |       656          5112  |       5768

Classified + if predicted Pr(D) >= .5
True D defined as f != 0

Sensitivity                     Pr( +| D)    0.30%
Specificity                     Pr( -|~D)   99.94%
Positive predictive value       Pr( D| +)   40.00%
Negative predictive value       Pr(~D| -)   88.65%

False + rate for true ~D        Pr( +|~D)    0.06%
False - rate for true D         Pr( -| D)   99.70%
False + rate for classified +   Pr(~D| +)   60.00%
False - rate for classified -   Pr( D| -)   11.35%

Correctly classified                        88.61%

. dprobit f dsexm dsexf ags26l educ_2 educ_3 age drace inc

Probit regression, reporting marginal effects     Number of obs   =       5768
                                                  LR chi2(8)      =     964.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -1561.1284                       Pseudo R2       =     0.2360

           f |      dF/dx  Std. Err.       z   P>|z|     x-bar   [   95% C.I.   ]
-------------+--------------------------------------------------------------------
      dsexm* |   .0325965   .0095475    4.30   0.000   .145111   .013884  .051309
      dsexf* |    .032261   .0103983    3.91   0.000   .111304   .011881  .052641
     ags26l* |    -.16182   .0066634  -13.23   0.000   .377601   -.17488  -.14876
     educ_2* |   .0022008   .0090273    0.24   0.809   .717753  -.015492  .019894
     educ_3* |   .0288323     .01401    2.44   0.015   .223994   .001373  .056291
         age |  -.0065042   .0007645  -16.64   0.000   32.8024  -.008003 -.005006
      drace* |  -.0077266   .0055692   -1.46   0.145   .773232  -.018642  .003189
         inc |    .000255   .0002412    1.06   0.288   12.8582  -.000218  .000728
-------------+--------------------------------------------------------------------
      obs. P |   .1137309
     pred. P |   .0367556  (at x-bar)

c. Carrasco (2001, p. 392) Table 5, column 4, ran a female labor force participation OLS equation, which we replicate below using Stata 10:

. reg dhw f ags26l fxag26l educ_2 educ_3 age drace inc dhwl

                                                  Number of obs   =       5768
                                                  F(9, 5758)      =     445.42
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.4104
                                                  Adj R-squared   =     0.4095
                                                  Root MSE        =     .32361

         dhw |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
           f |  -.0888995   .0144912   -6.13   0.000   -.1173077   -.0604912
      ags26l |  -.0194454   .0093334   -2.08   0.037   -.0377424   -.0011484
     fxag26l |  -.0581458   .1629414   -0.36   0.721   -.3775723    .2612806
      educ_2 |   .0491989   .0186018    2.64   0.008    .0127323    .0856655
      educ_3 |   .0725501   .0207404    3.50   0.000    .0318912    .1132091
         age |   .0014193   .0007854    1.81   0.071   -.0001203     .002959
       drace |  -.0098333    .010379   -0.95   0.343     -.03018    .0105134
         inc |  -.0018149   .0004887   -3.71   0.000    -.002773   -.0008568
        dhwl |   .6253973   .0103188   60.61   0.000    .6051686     .645626
       _cons |   .2373022    .032744    7.25   0.000    .1731117    .3014927

Carrasco (2001, p. 392) Table 5, column 1, ran a female labor force participation probit equation, which we replicate below using Stata:

. probit dhw f ags26l fxag26l educ_2 educ_3 age drace inc dhwl

Probit regression                                 Number of obs   =       5768
                                                  LR chi2(9)      =    2153.17
                                                  Prob > chi2     =     0.0000
Log likelihood = -2036.8086                       Pseudo R2       =     0.3458

         dhw |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
           f |  -.4103849   .0690538   -5.94   0.000   -.5457279   -.2750419
      ags26l |  -.1064159   .0480907   -2.21   0.027    -.200672   -.0121598
     fxag26l |  -.1886427   .7087803   -0.27   0.790   -1.577827    1.200541
      educ_2 |   .2338264   .0858408    2.72   0.006    .0655816    .4020713
      educ_3 |   .3773278   .1001949    3.77   0.000    .1809494    .5737062
         age |   .0091203   .0041132    2.22   0.027    .0010586     .017182
       drace |  -.0577508   .0542972   -1.06   0.288   -.1641714    .0486699
         inc |  -.0088483   .0024217   -3.65   0.000   -.0135948   -.0041019
        dhwl |   1.932025   .0462191   41.80   0.000    1.841438    2.022613
       _cons |  -.8540838   .1638299   -5.21   0.000   -1.175184   -.5329831

Probit model for dhw

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |      4073           378  |       4451
     -     |       366           951  |       1317
-----------+--------------------------+-----------
   Total   |      4439          1329  |       5768

Classified + if predicted Pr(D) >= .5
True D defined as dhw != 0

Sensitivity                     Pr( +| D)   91.75%
Specificity                     Pr( -|~D)   71.56%
Positive predictive value       Pr( D| +)   91.51%
Negative predictive value       Pr(~D| -)   72.21%

False + rate for true ~D        Pr( +|~D)   28.44%
False - rate for true D         Pr( -| D)    8.25%
False + rate for classified +   Pr(~D| +)    8.49%
False - rate for classified -   Pr( D| -)   27.79%

Correctly classified                        87.10%

The marginal effects are given by dprobit in Stata:

. dprobit dhw f ags26l fxag26l educ_2 educ_3 age drace inc dhwl

Probit regression, reporting marginal effects     Number of obs   =       5768
                                                  LR chi2(9)      =    2153.17
                                                  Prob > chi2     =     0.0000
                                                  Pseudo R2       =     0.3458

         dhw |      dF/dx  Std. Err.       z   P>|z|     x-bar   [   95% C.I.   ]
-------------+--------------------------------------------------------------------
          f* |  -.1200392   .0224936   -5.94   0.000   .113731  -.164126 -.075953
     ags26l* |  -.0275503   .0125892   -2.21   0.027   .377601  -.052225 -.002876
    fxag26l* |  -.0524753   .2127127   -0.27   0.790   .000693  -.469385  .364434
     educ_2* |   .0626367   .0239923    2.72   0.006   .717753   .015613  .109661
     educ_3* |   .0870573   .0206089    3.77   0.000   .223994   .046665   .12745
         age |   .0023327   .0010504    2.22   0.027   32.8024   .000274  .004391
      drace* |  -.0145508   .0134701   -1.06   0.288   .773232  -.040952   .01185
         inc |  -.0022631   .0006189   -3.65   0.000   12.8582  -.003476  -.00105
       dhwl* |   .6249756   .0134883   41.80   0.000   .771671   .598539  .651412
-------------+--------------------------------------------------------------------
      obs. P |   .7695908
     pred. P |   .8271351  (at x-bar)

d. The 2SLS estimates in Table 5, column 5, of Carrasco (2001, p. 392), using as instruments the same-sex variables and their interactions with ags26l, are given below, along with the over-identification test and the first-stage diagnostics:

. ivregress 2sls dhw (f fxag26l = dsexm dsexf sexm_26l sexf_26l) ags26l educ_2 educ_3 age drace inc dhwl

Instrumental variables (2SLS) regression          Number of obs   =       5768
                                                  Wald chi2(9)    =    3645.96
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.3565
                                                  Root MSE        =      .3378

         dhw |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
           f |  -.2164685   .2246665   -0.96   0.335   -.6568067    .2238697
     fxag26l |  -3.366305   3.512783   -0.96   0.338   -10.25123    3.518623
      ags26l |  -.0385731   .0467522   -0.83   0.409   -.1302058    .0530596
      educ_2 |   .0331807   .0288653    1.15   0.250   -.0233943    .0897557
      educ_3 |    .064607   .0348694    1.85   0.064   -.0037357    .1329497
         age |  -.0001934   .0030344   -0.06   0.949   -.0061407    .0057539
       drace |  -.0163251    .012366   -1.32   0.187   -.0405621    .0079118
         inc |  -.0017194   .0005162   -3.33   0.001   -.0027312   -.0007076
        dhwl |   .6230639    .017256   36.11   0.000    .5892427    .6568851
       _cons |   .3330965    .141537    2.35   0.019    .0556891     .610504

Instrumented: f fxag26l

Instruments: ags26l educ_2 educ_3 age drace inc dhwl dsexm dsexf sexm_26l sexf_26l

. estat overid

Tests of overidentifying restrictions:

  Sargan (score) chi2(2) =  .332468  (p = 0.8468)
  Basmann chi2(2)        =  .331796  (p = 0.8471)
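With four excluded instruments and two endogenous regressors there are two over-identifying restrictions, hence the chi-square(2) reference distribution. For two degrees of freedom the upper-tail probability has the closed form exp(-x/2), so the reported p-values can be checked with a short sketch (the helper name `chi2_sf_df2` is illustrative, and the formula holds only for 2 degrees of freedom):

```python
import math


def chi2_sf_df2(stat):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


sargan_p = chi2_sf_df2(0.332468)
basmann_p = chi2_sf_df2(0.331796)
print(f"Sargan  p = {sargan_p:.4f}")
print(f"Basmann p = {basmann_p:.4f}")
```

Both p-values are far above conventional significance levels, so the over-identifying restrictions are not rejected.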

. estat firststage

Shea's partial R-squared

             |     Shea's              Shea's
    Variable |  Partial R-sq.    Adj. Partial R-sq.
-------------+-------------------------------------
           f |     0.0045              0.0028
     fxag26l |     0.0023              0.0006

Minimum eigenvalue statistic = 3.36217

Critical Values                        # of endogenous regressors:    2
Ho: Instruments are weak               # of excluded instruments:     4

                                       |     5%      10%      20%      30%
2SLS relative bias                     |  11.04     7.56     5.57     4.73
                                       |    10%      15%      20%      25%
2SLS Size of nominal 5% Wald test      |  16.87     9.93     7.54     6.28
LIML Size of nominal 5% Wald test      |   4.72     3.39     2.99     2.79
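The first-stage diagnostic is read by comparing the minimum eigenvalue statistic with the printed critical values: the null of weak instruments is rejected for a given criterion and tolerance only if the statistic exceeds the corresponding value. A minimal sketch of that comparison follows; the dictionary layout and the function name `thresholds_passed` are illustrative choices, and the numbers are copied from the output.

```python
min_eigenvalue = 3.36217

# Critical values as printed by estat firststage (keys are tolerance levels).
critical_values = {
    "2SLS relative bias": {"5%": 11.04, "10%": 7.56, "20%": 5.57, "30%": 4.73},
    "2SLS size of nominal 5% Wald test": {"10%": 16.87, "15%": 9.93, "20%": 7.54, "25%": 6.28},
    "LIML size of nominal 5% Wald test": {"10%": 4.72, "15%": 3.39, "20%": 2.99, "25%": 2.79},
}


def thresholds_passed(stat, table):
    """For each criterion, list the tolerance levels whose critical value stat exceeds."""
    return {
        criterion: [level for level, cv in levels.items() if stat > cv]
        for criterion, levels in table.items()
    }


passed = thresholds_passed(min_eigenvalue, critical_values)
for criterion, levels in passed.items():
    print(f"{criterion}: {levels if levels else 'none'}")
```

The statistic lies below every 2SLS critical value, which suggests that the same-sex instruments may be weak here, in line with the small Shea partial R-squared values.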

e. So far, heterogeneity across individuals has not been taken into account. Carrasco (2001, p. 393) Table 7, column 4, ran a female labor force participation fixed effects equation with robust standard errors, which we replicate below using Stata:

. xtreg dhw f ags26l fxag26l dhwl, fe r

Fixed-effects (within) regression               Number of obs      =      5768
Group variable: ident                           Number of groups   =      1442

R-sq:  within  = 0.0059                         Obs per group: min =         4
       between = 0.6185                                        avg =       4.0
       overall = 0.2046                                        max =         4

                                                F(4,4322)          =      4.64
corr(u_i, Xb)  = 0.4991                         Prob > F           =    0.0010

                                  (Std. Err. adjusted for clustering on ident)

             |               Robust
         dhw |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
           f |  -.0547777   .0155326   -3.53   0.000   -.0852296   -.0243257
      ags26l |   .0012836   .0126213    0.10   0.919   -.0234607    .0260279
     fxag26l |  -.2204885   .2013721   -1.09   0.274    -.615281    .1743041
        dhwl |   .0356233   .0236582    1.51   0.132   -.0107588    .0820055
       _cons |   .7479995   .0193259   38.70   0.000    .7101108    .7858881
-------------+---------------------------------------------------------------
     sigma_u |  .33260036
     sigma_e |  .27830212
         rho |  .58818535   (fraction of variance due to u_i)

Note that only fertility is significant in this equation.
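The rho reported at the bottom of the fixed effects output is the share of the total error variance attributed to the individual effects, rho = sigma_u^2 / (sigma_u^2 + sigma_e^2). A quick check with the reported values:

```python
sigma_u = 0.33260036   # std. dev. of the individual effects u_i
sigma_e = 0.27830212   # std. dev. of the remainder disturbance

# Fraction of variance due to u_i.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(f"rho = {rho:.8f}")
```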

Fixed effects 2SLS estimates, using as instruments the same-sex variables and their interactions with ags26l, are given below:

. xtivreg dhw (f fxag26l = dsexm dsexf sexm_26l sexf_26l) ags26l age inc dhwl, fe

Fixed-effects (within) IV regression            Number of obs      =      5768
Group variable: ident                           Number of groups   =      1442

R-sq:  within  = .                              Obs per group: min =         4
       between = 0.1125                                        avg =       4.0
       overall = 0.0332                                        max =         4

                                                Wald chi2(6)       =   39710.3
corr(u_i, Xb)  = 0.0882                         Prob > chi2        =    0.0000

         dhw |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
           f |  -.2970225    .156909   -1.89   0.058   -.6045584    .0105134
     fxag26l |    -2.1887   2.433852   -0.90   0.369   -6.958963    2.581562
      ags26l |  -.0584866   .0467667   -1.25   0.211   -.1501476    .0331744
         age |    .000651   .0043265    0.15   0.880   -.0078287    .0091307
         inc |  -.0011213   .0010482   -1.07   0.285   -.0031758    .0009331
        dhwl |   .0362943   .0160524    2.26   0.024    .0048322    .0677565
       _cons |   .7920305   .1623108    4.88   0.000    .4739071    1.110154
-------------+---------------------------------------------------------------
     sigma_u |   .3293446
     sigma_e |  .29336161
         rho |  .55759255   (fraction of variance due to u_i)

F test that all u_i=0:     F(1441,4320) =     2.16       Prob > F    = 0.0000

Instrumented: f fxag26l

Instruments: ags26l age inc dhwl dsexm dsexf sexm_26l sexf_26l

13.16 Multinomial Logit Model

a. Table II of Terza (2002, p. 399), columns 3, 4, 9 and 10, are replicated below for the male data using Stata:

. mlogit y alc90th ue88 age agesq schooling married famsize white excellent verygood good fair northeast midwest south centercity othermsa q1 q2 q3, baseoutcome(1)

           y |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
2            |
     alc90th |   .1270931     .21395    0.59   0.552   -.2922412    .5464274
        ue88 |   .0458099    .051355    0.89   0.372   -.0548441    .1464639
         age |   .1617634   .0663205    2.44   0.015    .0317776    .2917492
       agesq |  -.0024377   .0007991   -3.05   0.002    -.004004   -.0008714
   schooling |  -.0092135   .0245172   -0.38   0.707   -.0572664    .0388393
     married |   .4004928   .1927458    2.08   0.038     .022718    .7782677
     famsize |   .0622453   .0503686    1.24   0.217   -.0364753    .1609659
       white |   .0391309   .1705625    0.23   0.819   -.2951653    .3734272
   excellent |    2.91833   .4486757    6.50   0.000    2.038942    3.797719
    verygood |   2.978336   .4505932    6.61   0.000     2.09519    3.861483
        good |   2.493939   .4446815    5.61   0.000    1.622379    3.365499
        fair |   1.460263   .4817231    3.03   0.002    .5161027    2.404422
   northeast |   .0849125   .2374365    0.36   0.721   -.3804545    .5502796
     midwest |   .0158816   .2037486    0.08   0.938   -.3834583    .4152215
       south |   .1750244   .2027444    0.86   0.388   -.2223474    .5723962
  centercity |  -.2717445   .1911074   -1.42   0.155   -.6463081    .1028192
    othermsa |  -.0921566   .1929076   -0.48   0.633   -.4702486    .2859354
          q1 |    .422405   .1978767    2.13   0.033    .0345738    .8102362
          q2 |  -.0219499   .2056751   -0.11   0.915   -.4250657    .3811659
          q3 |  -.0365295   .2109049   -0.17   0.862   -.4498954    .3768364
       _cons |  -6.113244   1.427325   -4.28   0.000   -8.910749   -3.315739
-------------+---------------------------------------------------------------
3            |
     alc90th |  -.1534987   .1395003   -1.10   0.271   -.4269144    .1199169
        ue88 |  -.0954848    .033631   -2.84   0.005   -.1614004   -.0295693
         age |    .227164   .0409884    5.54   0.000    .1468282    .3074999
       agesq |  -.0030796   .0004813   -6.40   0.000   -.0040228   -.0021363
   schooling |   .0890537   .0152314    5.85   0.000    .0592008    .1189067
     married |   .7085708   .1219565    5.81   0.000    .4695405    .9476012
     famsize |   .0622447   .0332365    1.87   0.061   -.0028975     .127387
       white |   .7380044   .1083131    6.81   0.000    .5257147    .9502941
   excellent |   3.702792   .1852415   19.99   0.000    3.339725    4.065858
    verygood |   3.653313   .1894137   19.29   0.000    3.282069    4.024557
        good |    2.99946   .1786747   16.79   0.000    2.649264    3.349656
        fair |   1.876172   .1885159    9.95   0.000    1.506688    2.245657
   northeast |    .088966   .1491191    0.60   0.551    -.203302    .3812341
     midwest |   .1230169   .1294376    0.95   0.342    -.130676    .3767099
       south |   .4393047   .1298054    3.38   0.001    .1848908    .6937185
  centercity |  -.2689532   .1231083   -2.18   0.029    -.510241   -.0276654
    othermsa |   .0978701   .1257623    0.78   0.436   -.1486195    .3443598
          q1 |  -.0274086   .1286695   -0.21   0.831   -.2795961     .224779
          q2 |   -.110751    .126176   -0.88   0.380   -.3580514    .1365494
          q3 |  -.0530835   .1296053   -0.41   0.682   -.3071052    .2009382
       _cons |  -6.237275   .8886698   -7.02   0.000   -7.979036   -4.495515
(y==1 is the base outcome)
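Each multinomial logit coefficient measures the change in the log of Pr(y = j)/Pr(y = 1) for a one-unit change in the regressor, so exponentiating it gives the factor by which that ratio is multiplied (the relative risk ratio). As an illustration, a few outcome-3 coefficients from the table can be converted as follows; the selection of variables is arbitrary.

```python
import math

# Selected outcome-3 coefficients, relative to the base outcome y == 1.
coef_outcome3 = {"married": 0.7085708, "white": 0.7380044, "ue88": -0.0954848}

# exp(coefficient) is the multiplicative change in Pr(y=3)/Pr(y=1)
# for a one-unit increase in the regressor, other regressors held fixed.
relative_risk_ratio = {name: math.exp(b) for name, b in coef_outcome3.items()}
for name, rrr in relative_risk_ratio.items():
    print(f"{name:8s} {rrr:.3f}")
```

A ratio above one (married, white) raises the odds of outcome 3 relative to outcome 1, while a ratio below one (ue88) lowers them.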

**using bootstrap for the var-cov matrix

. mlogit y alc90th ue88 age agesq schooling married famsize white excellent verygood good fair northeast midwest south centercity othermsa q1 q2 q3, baseoutcome(1) vce(bootstrap)

(running mlogit on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multinomial logistic regression                   Number of obs   =       9822
                                                  Replications    =         50
                                                  Wald chi2(40)   =    7442.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -3217.481                        Pseudo R2       =     0.1655

             |   Observed  Bootstrap                        Normal-based
           y |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
2            |
     alc90th |    .127093    .193302    0.66   0.511    -.251771     .505957
        ue88 |   .0458099   .0566344    0.81   0.419   -.0651914     .156811
         age |    .161763   .0615543    2.63   0.009    .0411192     .282408
       agesq |  -.0024377   .0007266   -3.35   0.001   -.0038619   -.0010135
   schooling |  -.0092135   .0249799   -0.37   0.712   -.0581732    .0397462
     married |    .400493    .206909    1.94   0.053   -.0050409     .806027
     famsize |   .0622453   .0534164    1.17   0.244   -.0424489     .166939
       white |   .0391309    .181705    0.22   0.829    -.317005     .395267
   excellent |    2.91833    .513426    5.68   0.000     1.91203     3.92463
    verygood |    2.97834    .547385    5.44   0.000     1.90548     4.05119
        good |    2.49394    .490497    5.08   0.000     1.53258      3.4553
        fair |    1.46026    .515618    2.83   0.005      .44967     2.47085
   northeast |   .0849125    .205846    0.41   0.680    -.318538     .488363
     midwest |   .0158816    .197601    0.08   0.936    -.371409     .403172
       south |    .175024    .221141    0.79   0.429    -.258403     .608452
  centercity |   -.271744    .170802   -1.59   0.112    -.606511    .0630218
    othermsa |  -.0921566    .191577   -0.48   0.630    -.467641     .283328
          q1 |    .422405    .239231    1.77   0.077   -.0464783     .891288
          q2 |  -.0219499    .240471   -0.09   0.927    -.493265     .449365
          q3 |  -.0365295    .250005   -0.15   0.884    -.526529      .45347
       _cons |   -6.11324    1.25945   -4.85   0.000    -8.58172    -3.64477
-------------+---------------------------------------------------------------
3            |
     alc90th |   -.153499    .112998   -1.36   0.174    -.374971    .0679739
        ue88 |  -.0954848   .0349536   -2.73   0.006    -.163993    -.026977
         age |    .227164      .0431    5.27   0.000      .14269     .311638
       agesq |  -.0030796   .0005224   -5.90   0.000   -.0041034   -.0020558
   schooling |   .0890537   .0173814    5.12   0.000    .0549868     .123121
     married |    .708571    .128608    5.51   0.000     .456503     .960639
     famsize |   .0622447   .0361903    1.72   0.085   -.0086869     .133176
       white |    .738004    .132021    5.59   0.000     .479249      .99676
   excellent |    3.70279    .201961   18.33   0.000     3.30696     4.09863
    verygood |    3.65331    .209009   17.48   0.000     3.24366     4.06296
        good |    2.99946    .205379   14.60   0.000     2.59693       3.402
        fair |    1.87617      .2063    9.09   0.000     1.47183     2.28051
   northeast |    .088966    .162443    0.55   0.584    -.229416     .407348
     midwest |    .123017    .141045    0.87   0.383    -.153427     .399461
       south |    .439305    .134008    3.28   0.001     .176655     .701955
  centercity |   -.268953    .098325   -2.74   0.006    -.461667   -.0762398
    othermsa |   .0978701    .106778    0.92   0.359    -.111412     .307152
          q1 |  -.0274086    .120696   -0.23   0.820    -.263969     .209152
          q2 |   -.110751    .130347   -0.85   0.396    -.366226     .144724
          q3 |  -.0530835    .132973   -0.40   0.690    -.313705     .207538
       _cons |   -6.23728    .822403   -7.58   0.000    -7.84915     -4.6254

(y==1 is the base outcome)

**using robust for the var-cov matrix

. mlogit y alc90th ue88 age agesq schooling married famsize white excellent verygood good fair northeast midwest south centercity othermsa q1 q2 q3, baseoutcome(1) vce(robust)

Iteration 0:   log pseudolikelihood = -3855.7148
Iteration 1:   log pseudolikelihood = -3692.5753
Iteration 2:   log pseudolikelihood = -3526.5092
Iteration 3:   log pseudolikelihood = -3236.3918
Iteration 4:   log pseudolikelihood = -3219.1826
Iteration 5:   log pseudolikelihood = -3217.5569
Iteration 6:   log pseudolikelihood = -3217.4813
Iteration 7:   log pseudolikelihood =  -3217.481

Multinomial logistic regression                   Number of obs   =       9822
                                                  Wald chi2(40)   =    1075.69
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -3217.481                  Pseudo R2       =     0.1655

             |               Robust
           y |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
2            |
     alc90th |   .1270931   .2152878    0.59   0.555   -.2948632    .5490494
        ue88 |   .0458099   .0500181    0.92   0.360   -.0522238    .1438436
         age |   .1617634   .0668732    2.42   0.016    .0306944    .2928324
       agesq |  -.0024377   .0008087   -3.01   0.003   -.0040227   -.0008527
   schooling |  -.0092135   .0234188   -0.39   0.694   -.0551135    .0366864
     married |   .4004928    .204195    1.96   0.050    .0002779    .8007078
     famsize |   .0622453   .0517847    1.20   0.229   -.0392509    .1637416
       white |   .0391309   .1711588    0.23   0.819   -.2963342    .3745961
   excellent |    2.91833   .4548999    6.42   0.000    2.026743    3.809918
    verygood |   2.978336   .4566665    6.52   0.000    2.083286    3.873386
        good |   2.493939   .4507366    5.53   0.000    1.610511    3.377366
        fair |   1.460263     .48807    2.99   0.003    .5036629    2.416862
   northeast |   .0849125     .23845    0.36   0.722    -.382441     .552266
     midwest |   .0158816   .2044175    0.08   0.938   -.3847694    .4165326
       south |   .1750244   .2022599    0.87   0.387   -.2213977    .5714466
  centercity |  -.2717445   .1911311   -1.42   0.155   -.6463546    .1028656
    othermsa |  -.0921566   .1955115   -0.47   0.637    -.475352    .2910389
          q1 |    .422405   .1970871    2.14   0.032    .0361213    .8086887
          q2 |  -.0219499   .2049964   -0.11   0.915   -.4237355    .3798357
          q3 |  -.0365295   .2109886   -0.17   0.863   -.4500595    .3770005
       _cons |  -6.113244   1.412512   -4.33   0.000   -8.881717   -3.344771
-------------+---------------------------------------------------------------
3            |
     alc90th |  -.1534987   .1392906   -1.10   0.270   -.4265033    .1195059
        ue88 |  -.0954848   .0335442   -2.85   0.004   -.1612303   -.0297394
         age |    .227164   .0411389    5.52   0.000    .1465333    .3077948
       agesq |  -.0030796    .000487   -6.32   0.000    -.004034   -.0021251
   schooling |   .0890537   .0160584    5.55   0.000    .0575798    .1205276
     married |   .7085708   .1315325    5.39   0.000    .4507719    .9663698
     famsize |   .0622447    .035511    1.75   0.080   -.0073556    .1318451
       white |   .7380044   .1139831    6.47   0.000    .5146017    .9614071
   excellent |   3.702792    .190178   19.47   0.000     3.33005    4.075534
    verygood |   3.653313   .1929514   18.93   0.000    3.275135     4.03149
        good |    2.99946   .1849776   16.22   0.000    2.636911     3.36201
        fair |   1.876172   .1956878    9.59   0.000    1.492631    2.259713
   northeast |    .088966   .1505301    0.59   0.555   -.2060675    .3839996
     midwest |   .1230169   .1302651    0.94   0.345   -.1322981    .3783319
       south |   .4393047   .1341061    3.28   0.001    .1764616    .7021478
  centercity |  -.2689532   .1266976   -2.12   0.034   -.5172758   -.0206306
    othermsa |   .0978701   .1275274    0.77   0.443    -.152079    .3478193
          q1 |  -.0274086   .1288453   -0.21   0.832   -.2799406    .2251235
          q2 |   -.110751     .12602   -0.88   0.379   -.3577457    .1362437
          q3 |  -.0530835   .1321321   -0.40   0.688   -.3120576    .2058907
       _cons |  -6.237275   .8601993   -7.25   0.000   -7.923235   -4.551316

(y==1 is the base outcome)

b. For the female data, the multinomial logit estimates yield:

. mlogit y alc90th ue88 age agesq schooling married famsize white excellent verygood good fair northeast midwest south centercity othermsa q1 q2 q3,

           y |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
2            |
     alc90th |  -.1241993   .2365754   -0.52   0.600   -.5878785    .3394799
        ue88 |   -.001862   .0514214   -0.04   0.971   -.1026462    .0989221
         age |  -.0392239   .0612728   -0.64   0.522   -.1593164    .0808687
       agesq |   .0004834   .0007411    0.65   0.514   -.0009691    .0019359
   schooling |  -.0121174   .0254645   -0.48   0.634   -.0620269     .037792
     married |   .0117958   .2220045    0.05   0.958    -.423325    .4469167
     famsize |   .0092434   .0495871    0.19   0.852   -.0879456    .1064324
       white |   .2817941   .1935931    1.46   0.146   -.0976414    .6612296
   excellent |   .0420423   .4579618    0.09   0.927   -.8555463     .939631
    verygood |   .0449091   .4574373    0.10   0.922   -.8516516    .9414698
        good |   .0182444   .4544742    0.04   0.968   -.8725086    .9089974
        fair |   .2925131   .4839658    0.60   0.546   -.6560424    1.241069
   northeast |  -.1721726   .2163151   -0.80   0.426   -.5961425    .2517973
     midwest |  -.2643294   .1944624   -1.36   0.174   -.6454687      .11681
       south |  -.0161982   .1814209   -0.09   0.929   -.3717766    .3393803
  centercity |  -.0812978   .1869101   -0.43   0.664    -.447635    .2850393
    othermsa |    .044578   .1738872    0.26   0.798   -.2962347    .3853908
          q1 |   22.30515   1.328553   16.79   0.000    19.70123    24.90906
          q2 |   22.24068    1.32893   16.74   0.000    19.63603    24.84534
          q3 |   18.65596   1.360765   13.71   0.000    15.98891    21.32301
       _cons |  -23.50938
-------------+---------------------------------------------------------------
3            |
     alc90th |   .1288509   .0812475    1.59   0.113   -.0303912    .2880931
        ue88 |   .0148758   .0188237    0.79   0.429    -.022018    .0517696
         age |   .0175243   .0230613    0.76   0.447   -.0276751    .0627236
       agesq |  -.0002381     .00028   -0.85   0.395   -.0007868    .0003106
   schooling |   .0035127   .0095824    0.37   0.714   -.0152685    .0222939
     married |  -.0997914    .078327   -1.27   0.203   -.2533095    .0537266
     famsize |   .0027002   .0184619    0.15   0.884   -.0334844    .0388849
       white |  -.0277798    .066196   -0.42   0.675   -.1575217    .1019621
   excellent |  -.1178398   .1636194   -0.72   0.471   -.4385278    .2028483
    verygood |  -.1170045   .1633395   -0.72   0.474    -.437144     .203135
        good |  -.1144024   .1622966   -0.70   0.481   -.4324979    .2036931
        fair |  -.0344312   .1775054   -0.19   0.846   -.3823353    .3134729
   northeast |  -.0548967   .0819514   -0.67   0.503   -.2155184     .105725
     midwest |   .0572296   .0720545    0.79   0.427   -.0839946    .1984538

(y==1 is the base outcome)

13.17 Tobit Estimation of Married Women Labor Supply

a. A detailed summary of hours of work shows that the mean is 741, the median is 288, the minimum is zero and the maximum is 4950.

. sum hours, detail

                     hours worked, 1975
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 753
25%            0              0       Sum of Wgt.         753

50%          288                      Mean           740.5764
                        Largest       Std. Dev.      871.3142
75%         1516           3640
90%         1984           3686       Variance       759188.5
95%         2100           4210       Skewness       .9225315
99%         3087           4950       Kurtosis       3.193949

b. Using the notation of solution 11.31, OLS on this model yields

. reg hours nwifeinc kidslt6 kidsge6 `control' `E', r

Linear regression                                 Number of obs   =        753
                                                  F(7, 745)       =      45.81
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.2656
                                                  Root MSE        =     750.18

             |               Robust
       hours |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    nwifeinc |  -3.446636   2.240662   -1.54   0.124   -7.845398    .9521268
     kidslt6 |  -442.0899   57.46384   -7.69   0.000   -554.9002   -329.2796
     kidsge6 |  -32.77923   22.80238   -1.44   0.151    -77.5438    11.98535
         age |  -30.51163   4.244791   -7.19   0.000   -38.84481   -22.17846
        educ |   28.76112   13.03905    2.21   0.028    3.163468    54.35878
       exper |   65.67251   10.79419    6.08   0.000    44.48186    86.86316
     expersq |  -.7004939   .3720129   -1.88   0.060   -1.430812    .0298245
       _cons |   1330.482   274.8776    4.84   0.000    790.8556    1870.109
Tobit estimation with left censoring at zero is requested with the option ll(0):

. tobit hours nwifeinc kidslt6 kidsge6 `control' `E', ll(0)

Tobit regression                                  Number of obs   =        753
                                                  LR chi2(7)      =     271.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -3819.0946                       Pseudo R2       =     0.0343

       hours |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    nwifeinc |  -8.814243   4.459096   -1.98   0.048   -17.56811   -.0603724
     kidslt6 |  -894.0217   111.8779   -7.99   0.000   -1113.655   -674.3887
     kidsge6 |    -16.218   38.64136   -0.42   0.675   -92.07675    59.64075
         age |  -54.40501   7.418496   -7.33   0.000   -68.96862    -39.8414
        educ |   80.64561   21.58322    3.74   0.000    38.27453    123.0167
       exper |   131.5643   17.27938    7.61   0.000    97.64231    165.4863
     expersq |  -1.864158   .5376615   -3.47   0.001   -2.919667   -.8086479
       _cons |   965.3053   446.4358    2.16   0.031    88.88528    1841.725
-------------+---------------------------------------------------------------
      /sigma |   1122.022   41.57903                    1040.396    1203.647

  Obs. summary:        325  left-censored observations at hours<=0
                       428     uncensored observations
                         0 right-censored observations
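The Tobit coefficients are roughly twice the OLS ones because they refer to the latent variable rather than to observed hours. A standard result for the Tobit model is that the effect of a regressor on E(hours | x) equals its coefficient times Phi(x'beta/sigma). The sketch below evaluates this scale factor at the sample means; the means are an assumption here, taken from the X column of the mfx output reported for the same 753 observations in part (c), and the variable names are illustrative.

```python
from statistics import NormalDist

# Tobit coefficients, constant and sigma from the output above.
tobit_coef = {"nwifeinc": -8.814243, "kidslt6": -894.0217, "kidsge6": -16.218,
              "age": -54.40501, "educ": 80.64561, "exper": 131.5643,
              "expersq": -1.864158}
constant, sigma = 965.3053, 1122.022

# Assumed sample means (X column of the mfx output in part (c)).
xbar = {"nwifeinc": 20.129, "kidslt6": 0.237716, "kidsge6": 1.35325,
        "age": 42.5378, "educ": 12.2869, "exper": 10.6308,
        "expersq": 178.039}

# Index at the means and the implied probability of positive hours.
index = constant + sum(tobit_coef[k] * xbar[k] for k in tobit_coef)
scale = NormalDist().cdf(index / sigma)      # Phi(x-bar'beta / sigma)

# Effect of one more year of education on expected hours at the means.
educ_effect = scale * tobit_coef["educ"]
print(f"scale factor = {scale:.3f}, educ effect = {educ_effect:.1f} hours")
```

The scaled effect of education is still larger than the OLS estimate of 28.76 hours, but the gap is much smaller than the raw coefficients suggest.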

c. This replicates Table 17.1 of Wooldridge (2009, p. 585) using Stata:

. reg inlf nwifeinc kidslt6 kidsge6 `control' `E', r

Linear regression                                 Number of obs   =        753
                                                  F(7, 745)       =      62.48
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.2642
                                                  Root MSE        =     .42713

             |               Robust
        inlf |      Coef.  Std. Err.       t   P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    nwifeinc |  -.0034052   .0015249   -2.23   0.026   -.0063988   -.0004115
     kidslt6 |  -.2618105   .0317832   -8.24   0.000   -.3242058   -.1994152
     kidsge6 |   .0130122   .0135329    0.96   0.337    -.013555    .0395795
         age |  -.0160908    .002399   -6.71   0.000   -.0208004   -.0113812
        educ |   .0379953    .007266    5.23   0.000     .023731    .0522596
       exper |   .0394924     .00581    6.80   0.000    .0280864    .0508983
     expersq |  -.0005963     .00019   -3.14   0.002   -.0009693   -.0002233
       _cons |   .5855192   .1522599    3.85   0.000    .2866098    .8844287

The Logit estimates yield:

. logit inlf nwifeinc kidslt6 kidsge6 `control' `E', r

Iteration 0:   log pseudolikelihood =  -514.8732
Iteration 1:   log pseudolikelihood = -402.38502
Iteration 2:   log pseudolikelihood = -401.76569
Iteration 3:   log pseudolikelihood = -401.76515
Iteration 4:   log pseudolikelihood = -401.76515

Logistic regression                               Number of obs   =        753
                                                  Wald chi2(7)    =     158.48
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -401.76515                 Pseudo R2       =     0.2197

             |               Robust
        inlf |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    nwifeinc |  -.0213452   .0090782   -2.35   0.019    -.039138   -.0035523
     kidslt6 |  -1.443354   .2031615   -7.10   0.000   -1.841543   -1.045165
     kidsge6 |   .0601122   .0798825    0.75   0.452   -.0964546    .2166791
         age |  -.0880244   .0144393   -6.10   0.000   -.1163248   -.0597239
        educ |   .2211704   .0444509    4.98   0.000    .1340482    .3082925
       exper |   .2058695   .0322914    6.38   0.000    .1425796    .2691594
     expersq |  -.0031541   .0010124   -3.12   0.002   -.0051384   -.0011698
       _cons |   .4254524   .8597308    0.49   0.621   -1.259589    2.110494

. estat classification

Logistic model for inlf

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       347           118  |        465
     -     |        81           207  |        288
-----------+--------------------------+-----------
   Total   |       428           325  |        753

Classified + if predicted Pr(D) >= .5
True D defined as inlf != 0

Sensitivity                     Pr( +| D)   81.07%
Specificity                     Pr( -|~D)   63.69%
Positive predictive value       Pr( D| +)   74.62%
Negative predictive value       Pr(~D| -)   71.88%

False + rate for true ~D        Pr( +|~D)   36.31%
False - rate for true D         Pr( -| D)   18.93%
False + rate for classified +   Pr(~D| +)   25.38%
False - rate for classified -   Pr( D| -)   28.13%

Correctly classified                        73.57%
. mfx

Marginal effects after logit
      y  = Pr(inlf) (predict)
         =  .58277201

    variable |      dy/dx  Std. Err.       z   P>|z|   [    95% C.I.   ]         X
-------------+--------------------------------------------------------------------
    nwifeinc |  -.0051901     .00221   -2.35   0.019  -.009523  -.000857    20.129
     kidslt6 |  -.3509498     .04988   -7.04   0.000  -.448718  -.253182   .237716
     kidsge6 |   .0146162     .01941    0.75   0.451  -.023428    .05266   1.35325
         age |   -.021403     .00353   -6.07   0.000  -.028317  -.014489   42.5378
        educ |   .0537773     .01086    4.95   0.000   .032498   .075057   12.2869
       exper |   .0500569     .00788    6.35   0.000   .034604    .06551   10.6308
     expersq |  -.0007669     .00025   -3.11   0.002  -.001251  -.000283   178.039

. margeff

Average partial effects after logit
      y  = Pr(inlf)

    variable |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    nwifeinc |  -.0038118   .0015923   -2.39   0.017   -.0069327   -.0006909
     kidslt6 |   -.240805   .0262576   -9.17   0.000    -.292269    -.189341
     kidsge6 |   .0107335   .0142337    0.75   0.451    -.017164     .038631
         age |  -.0157153   .0023842   -6.59   0.000   -.0203883   -.0110423
        educ |   .0394323   .0074566    5.29   0.000    .0248176    .0540471
       exper |   .0367123   .0051935    7.07   0.000    .0265332    .0468914
     expersq |  -.0005633   .0001767   -3.19   0.001   -.0009096   -.0002169
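For the logit model, the marginal effect at the sample means is p(1 - p) times the coefficient, where p is the predicted probability at the means reported in the mfx header. The sketch below checks three of the dy/dx entries; the choice of variables and the names used are illustrative.

```python
pred_p = 0.58277201   # Pr(inlf) at the sample means, from the mfx header

# Logit coefficients from the logit output.
logit_coef = {"nwifeinc": -0.0213452, "kidslt6": -1.443354, "educ": 0.2211704}

# Logistic density at the mean index: p * (1 - p).
density = pred_p * (1.0 - pred_p)

marginal_at_mean = {name: density * b for name, b in logit_coef.items()}
for name, value in marginal_at_mean.items():
    print(f"{name:9s} {value: .7f}")
```

The average partial effects from margeff differ from these because they average p_i(1 - p_i) over the observations instead of evaluating it once at the means.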

. probit inlf nwifeinc kidslt6 kidsge6 `control' `E', r

Iteration 0:   log pseudolikelihood =  -514.8732
Iteration 1:   log pseudolikelihood = -402.06651
Iteration 2:   log pseudolikelihood = -401.30273
Iteration 3:   log pseudolikelihood = -401.30219
Iteration 4:   log pseudolikelihood = -401.30219

Probit regression                                 Number of obs   =        753
                                                  Wald chi2(7)    =     185.10
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -401.30219                 Pseudo R2       =     0.2206

             |               Robust
        inlf |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    nwifeinc |  -.0120237   .0053106   -2.26   0.024   -.0224323   -.0016152
     kidslt6 |  -.8683285   .1162037   -7.47   0.000   -1.096084   -.6405735
     kidsge6 |    .036005   .0452958    0.79   0.427   -.0527731     .124783
         age |  -.0528527   .0083532   -6.33   0.000   -.0692246   -.0364807
        educ |   .1309047   .0258192    5.07   0.000       .0803    .1815095
       exper |   .1233476   .0188537    6.54   0.000     .086395    .1603002
     expersq |  -.0018871   .0006007   -3.14   0.002   -.0030645   -.0007097
       _cons |   .2700768    .505175    0.53   0.593   -.7200481    1.260202

. mfx

Marginal effects after probit
      y  = Pr(inlf) (predict)
         =  .58154201

    variable |      dy/dx  Std. Err.       z   P>|z|   [    95% C.I.   ]         X
-------------+--------------------------------------------------------------------
    nwifeinc |  -.0046962     .00208   -2.26   0.024  -.008766  -.000626    20.129
     kidslt6 |  -.3391514     .04565   -7.43   0.000  -.428628  -.249675   .237716
     kidsge6 |   .0140628     .01769    0.80   0.427  -.020603   .048729   1.35325
         age |  -.0206432     .00327   -6.31   0.000  -.027056  -.014231   42.5378
        educ |   .0511287     .01011    5.06   0.000   .031308    .07095   12.2869
       exper |   .0481771     .00739    6.52   0.000   .033694    .06266   10.6308
     expersq |  -.0007371     .00024   -3.14   0.002  -.001198  -.000276   178.039
 margeff

Average partial effects after probit y = Pr(inlf)

variable         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
nwifeinc     -.0036162   .0015759   -2.29   0.022   -.0067049   -.0005275
kidslt6      -.2441788   .0257356   -9.49   0.000   -.2946198   -.1937379
kidsge6       .0108274   .0135967    0.80   0.426   -.0158217    .0374765
age          -.0158917   .0023447   -6.78   0.000   -.0204873    -.011296
educ          .0393088   .0073669    5.34   0.000      .02487    .0537476
exper          .037046   .0051959    7.13   0.000    .0268621    .0472299
expersq      -.0005675   .0001775   -3.20   0.001   -.0009154   -.0002197

. dprobit inlf nwifeinc kidslt6 kidsge6 `control' `E', r

Iteration 0: log pseudolikelihood = -514.8732
Iteration 1: log pseudolikelihood = -405.78215
Iteration 2: log pseudolikelihood = -401.32924
Iteration 3: log pseudolikelihood = -401.30219
Iteration 4: log pseudolikelihood = -401.30219

Probit regression, reporting marginal effects     Number of obs  =     753
                                                  Wald chi2(7)   =  185.10
                                                  Prob > chi2    =  0.0000
Log pseudolikelihood = -401.30219                 Pseudo R2      =  0.2206

                           Robust
inlf             dF/dx   Std. Err.      z    P>|z|     x-bar   [    95% C.I.    ]
nwifeinc     -.0046962   .0020767   -2.26   0.024    20.129   -.008766  -.000626
kidslt6      -.3391514    .045652   -7.47   0.000   .237716   -.428628  -.249675
kidsge6       .0140628   .0176869    0.79   0.427   1.35325   -.020603   .048729
age          -.0206432   .0032717   -6.33   0.000   42.5378   -.027056  -.014231
educ          .0511287    .010113    5.07   0.000   12.2869    .031308    .07095
exper         .0481771   .0073896    6.54   0.000   10.6308    .033694    .06266
expersq      -.0007371    .000235   -3.14   0.002   178.039   -.001198  -.000276

obs. P      .5683931
pred. P      .581542   (at x-bar)
 z and P> |z| correspond to the test of the underlying coefficient being 0
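The marginal effects at the sample means reported by mfx and dprobit can be cross-checked by hand, since for a probit they equal the coefficient times the standard normal density evaluated at the index. The following is a minimal sketch in Python (not part of the original Stata session); it assumes the index at the means can be recovered by inverting the reported predicted probability .58154201, and the variable names are illustrative only.

```python
from statistics import NormalDist

# Values copied from the probit and mfx output above.
pred_prob = 0.58154201  # Pr(inlf) evaluated at the sample means
coefs = {"nwifeinc": -0.0120237, "kidslt6": -0.8683285, "educ": 0.1309047}

std_normal = NormalDist()
# Index at the means: x-bar'beta = Phi^{-1}(predicted probability)
index_at_means = std_normal.inv_cdf(pred_prob)
# Scale factor phi(x-bar'beta) that converts coefficients to marginal effects
density = std_normal.pdf(index_at_means)

marginal_effects = {name: b * density for name, b in coefs.items()}
for name, effect in marginal_effects.items():
    print(f"{name:10s} {effect: .7f}")
```

Up to rounding, the printed values should agree with the dy/dx column of the mfx output for the same three variables.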

. estat classification

Probit model for inlf

                    True
Classified       D       ~D     Total
    +          348      120       468
    -           80      205       285
  Total        428      325       753

Classified + if predicted Pr(D) >= .5
True D defined as inlf != 0

Sensitivity                      Pr( +| D)    81.31%
Specificity                      Pr( -|~D)    63.08%
Positive predictive value        Pr( D| +)    74.36%
Negative predictive value        Pr(~D| -)    71.93%

False + rate for true ~D         Pr( +|~D)    36.92%
False - rate for true D          Pr( -| D)    18.69%
False + rate for classified +    Pr(~D| +)    25.64%
False - rate for classified -    Pr( D| -)    28.07%

Correctly classified                          73.44%
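The summary rates in the classification output follow directly from the four cell counts. As a quick check, here is a small Python sketch (an illustration added here, not part of the Stata session) that recomputes them; the variable names are hypothetical.

```python
# Cell counts copied from the classification table above.
true_pos, false_pos = 348, 120  # classified +, split by true D and ~D
false_neg, true_neg = 80, 205   # classified -, split by true D and ~D

total = true_pos + false_pos + false_neg + true_neg

sensitivity = true_pos / (true_pos + false_neg)  # Pr(+ | D)
specificity = true_neg / (true_neg + false_pos)  # Pr(- | ~D)
ppv = true_pos / (true_pos + false_pos)          # Pr(D | +)
npv = true_neg / (true_neg + false_neg)          # Pr(~D | -)
accuracy = (true_pos + true_neg) / total         # correctly classified

for label, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                     ("Positive predictive value", ppv),
                     ("Negative predictive value", npv),
                     ("Correctly classified", accuracy)]:
    print(f"{label:28s} {100 * value:6.2f}%")
```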

d. Wooldridge (2009, Chapter 17) recommends obtaining estimates of $\beta/\sigma$ from a probit that uses an indicator of labor force participation as the dependent variable, and then comparing them with the Tobit estimates of $\beta$ divided by the Tobit estimate of $\sigma$. If the two sets of estimates differ markedly or have different signs, the Tobit specification may not be appropriate. Part (c) gave such probit estimates; for kidslt6 the estimate was -0.868. From part (b), the Tobit estimate of $\beta$ for kidslt6 was -894 and the estimate of $\sigma$ was 1122, so the implied estimate of $\beta/\sigma$ is -0.797. The two estimates have the same sign but differ in magnitude.
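The comparison for kidslt6 amounts to one division. The short Python sketch below reproduces it from the rounded figures quoted in the text; it assumes that 1122 is the Tobit estimate of the error standard deviation, and the variable names are illustrative.

```python
# Rounded estimates quoted in part (d).
tobit_beta_kidslt6 = -894.0   # Tobit coefficient on kidslt6, part (b)
tobit_sigma = 1122.0          # Tobit scale estimate, part (b) (assumed to be sigma)
probit_coef_kidslt6 = -0.868  # probit coefficient on kidslt6, part (c)

# Tobit-implied counterpart of the probit coefficient
implied_ratio = tobit_beta_kidslt6 / tobit_sigma

same_sign = (implied_ratio < 0) == (probit_coef_kidslt6 < 0)
print(f"Tobit beta/sigma = {implied_ratio:.3f}, probit = {probit_coef_kidslt6:.3f}")
print(f"Same sign: {same_sign}")
```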

13.18 Heckit Estimation of Married Women’s Earnings

a. OLS on this model yields

. reg lwage educ exper expersq

Source            SS        df         MS          Number of obs  =     428
                                                   F(3, 424)      =   26.29
Model       35.0222967       3   11.6740989        Prob > F       =  0.0000
Residual    188.305144     424   .444115906        R-squared      =  0.1568
                                                   Adj R-squared  =  0.1509
Total       223.327441     427   .523015084        Root MSE       =  .66642

lwage            Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
educ          .1074896   .0141465    7.60   0.000    .0796837    .1352956
exper         .0415665   .0131752    3.15   0.002    .0156697    .0674633
expersq      -.0008112   .0003932   -2.06   0.040   -.0015841   -.0000382
_cons        -.5220406   .1986321   -2.63   0.009   -.9124667   -.1316144

Heckman two-step estimates

. heckman lwage educ exper expersq, select(educ exper expersq age kidslt6 kidsge6 nwifeinc) twostep

Heckman selection model - two-step estimates     Number of obs   =     753
(regression model with sample selection)         Censored obs    =     325
                                                 Uncensored obs  =     428
                                                 Wald chi2(3)    =   51.53
                                                 Prob > chi2     =  0.0000

                 Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
lwage
educ          .1090655    .015523    7.03   0.000    .0786411      .13949
exper         .0438873   .0162611    2.70   0.007    .0120163    .0757584
expersq      -.0008591   .0004389   -1.96   0.050   -.0017194    1.15e-06
_cons        -.5781032   .3050062   -1.90   0.058   -1.175904     .019698
select
educ          .1309047   .0252542    5.18   0.000    .0814074     .180402
exper         .1233476   .0187164    6.59   0.000    .0866641    .1600311
expersq      -.0018871      .0006   -3.15   0.002    -.003063   -.0007111
age          -.0528527   .0084772   -6.23   0.000   -.0694678   -.0362376
kidslt6      -.8683285   .1185223   -7.33   0.000   -1.100628    -.636029
kidsge6        .036005   .0434768    0.83   0.408    -.049208    .1212179
nwifeinc     -.0120237   .0048398   -2.48   0.013   -.0215096   -.0025378
_cons         .2700768    .508593    0.53   0.595   -.7267473    1.266901
mills
lambda        .0322619   .1336246    0.24   0.809   -.2296376    .2941613

rho            0.04861
sigma        .66362875
lambda       .03226186   .1336246

b. The coefficient on the inverse Mills ratio, lambda, is estimated to be 0.032 with a standard error of 0.134, which is not significant. Hence, we do not reject the null hypothesis of no sample selection.
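The significance statement in part (b) rests on a simple z-test of lambda = 0. A minimal Python sketch of that calculation is given below; it uses the unrounded estimate and standard error from the two-step output and assumes a two-sided test against the standard normal distribution.

```python
import math

# Inverse Mills ratio coefficient and its standard error (two-step output).
lambda_hat = 0.0322619
lambda_se = 0.1336246

z_stat = lambda_hat / lambda_se
# Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"z = {z_stat:.2f}, two-sided p-value = {p_value:.3f}")
```

The z-statistic is far below conventional critical values, which is why the null of no sample selection is not rejected.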

c. The MLE of this Heckman (1976) sample selection model is obtained as follows:

. heckman lwage educ exper expersq, select(educ exper expersq age kidslt6 kidsge6 nwifeinc)

Iteration 0: log likelihood = -832.89776
Iteration 1: log likelihood = -832.88509
Iteration 2: log likelihood = -832.88508

Heckman selection model                        Number of obs   =     753
(regression model with sample selection)       Censored obs    =     325
                                               Uncensored obs  =     428
                                               Wald chi2(3)    =   59.67
Log likelihood = -832.8851                     Prob > chi2     =  0.0000

                 Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
lwage
educ          .1083502   .0148607     7.29   0.000    .0792238    .1374767
exper         .0428369   .0148785     2.88   0.004    .0136755    .0719983
expersq      -.0008374   .0004175    -2.01   0.045   -.0016556   -.0000192
_cons        -.5526973   .2603784    -2.12   0.034    -1.06303   -.0423651
select
educ          .1313415   .0253823     5.17   0.000    .0815931    .1810899
exper         .1232818   .0187242     6.58   0.000    .0865831    .1599806
expersq      -.0018863   .0006004    -3.14   0.002    -.003063   -.0007095
age          -.0528287   .0084792    -6.23   0.000   -.0694476   -.0362098
kidslt6      -.8673988   .1186509    -7.31   0.000    -1.09995   -.6348472
kidsge6       .0358723   .0434753     0.83   0.409   -.0493377    .1210824
nwifeinc     -.0121321   .0048767    -2.49   0.013   -.0216903    -.002574
_cons         .2664491   .5089578     0.52   0.601   -.7310898    1.263988

/athrho        .026614    .147182     0.18   0.857   -.2618573    .3150854
/lnsigma     -.4103809   .0342291   -11.99   0.000   -.4774687   -.3432931

rho           .0266078   .1470778                    -.2560319    .3050564
sigma         .6633975   .0227075                     .6203517    .7094303
lambda        .0176515   .0976057                    -.1736521    .2089552

LR test of indep. eqns. (rho = 0): chi2(1) = 0.03   Prob > chi2 = 0.8577

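This yields estimates that are very close to those of the two-step Heckman procedure, and the LR test of rho = 0 is not significant (chi2(1) = 0.03, p-value = 0.8577). Again, there is no evidence of sample selection.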
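In the MLE output, rho, sigma and lambda are not estimated directly; they are recovered from the ancillary parameters /athrho and /lnsigma. The Python sketch below illustrates the transformation under the standard parameterization rho = tanh(athrho), sigma = exp(lnsigma) and lambda = rho * sigma; it is an added check, not part of the original Stata session.

```python
import math

# Ancillary parameter estimates copied from the MLE output.
athrho = 0.026614
lnsigma = -0.4103809

rho = math.tanh(athrho)     # correlation between the two error terms
sigma = math.exp(lnsigma)   # standard deviation of the wage equation error
lambda_hat = rho * sigma    # implied coefficient on the inverse Mills ratio

print(f"rho = {rho:.7f}, sigma = {sigma:.7f}, lambda = {lambda_hat:.7f}")
```

Up to rounding of the inputs, these reproduce the reported rho = .0266078, sigma = .6633975 and lambda = .0176515. The MLE of lambda is smaller than the two-step estimate of .0322619, but both are well within one standard error of zero.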

References

Carrasco, R. (2001), “Binary Choice with Binary Endogenous Regressors in Panel Data: Estimating the Effect of Fertility on Female Labor Participation,” Journal of Business & Economic Statistics, 19: 385-394.

Dhillon, U. S., J. D. Shilling and C. F. Sirmans (1987), “Choosing Between Fixed and Adjustable Rate Mortgages,” Journal of Money, Credit and Banking, 19: 260-267.

Heckman, J. (1976), “The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for Such Models,” Annals of Economic and Social Measurement, 5: 475-492.

Mullahy, J. and J. Sindelar (1996), “Employment, Unemployment, and Problem Drinking,” Journal of Health Economics, 15: 409-434.

Terza, J. (2002), “Alcohol Abuse and Employment: A Second Look,” Journal of Applied Econometrics, 17: 393-404.

Wooldridge, J. M. (2009), Introductory Econometrics: A Modern Approach (South-Western: Ohio).

CHAPTER 14