# Individual Data: Probit and Logit

When the number of observations $n_i$ in each group is small, one cannot obtain reliable estimates of the $\pi_i$'s with the $p_i$'s. In this case, one should not group the observations; instead, they should be treated as individual observations and the model estimated by maximum likelihood. The likelihood is obtained from independent random draws from a Bernoulli distribution with probability of success $\pi_i = F(x_i'\beta) = P[y_i = 1]$. Hence

$$\ell = \prod_{i=1}^{n} [F(x_i'\beta)]^{y_i}\,[1 - F(x_i'\beta)]^{1-y_i} \tag{13.15}$$

and the log-likelihood

$$\log \ell = \sum_{i=1}^{n} \left\{ y_i \log F(x_i'\beta) + (1 - y_i)\log[1 - F(x_i'\beta)] \right\} \tag{13.16}$$
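As a minimal sketch of how (13.16) can be evaluated, the following uses only the Python standard library. The function names and the toy data are hypothetical, chosen for illustration; the logistic and standard normal CDFs stand in for $F$ in the logit and probit cases.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(beta, X, y, cdf):
    """Evaluate (13.16): sum of y_i*log(F_i) + (1 - y_i)*log(1 - F_i)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        index = sum(b * x for b, x in zip(beta, x_i))  # x_i' beta
        F_i = cdf(index)
        total += y_i * math.log(F_i) + (1 - y_i) * math.log(1.0 - F_i)
    return total

# Hypothetical toy data: a constant and one regressor.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
y = [1, 0, 1, 0]

# At beta = 0 both models give F_i = 0.5 for every observation.
ll_logit = log_likelihood([0.0, 0.0], X, y, logistic_cdf)
ll_probit = log_likelihood([0.0, 0.0], X, y, normal_cdf)
```

Passing the CDF as an argument keeps the likelihood code identical for the two models, which mirrors the way (13.16) is written in terms of a generic $F$.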

The first-order conditions for maximization require the score $S(\beta) = \partial \log\ell/\partial\beta$ to be zero:

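$$\begin{aligned} S(\beta) = \frac{\partial \log\ell}{\partial\beta} &= \sum_{i=1}^{n} \left\{ \frac{f_i y_i}{F_i} - (1 - y_i)\frac{f_i}{1 - F_i} \right\} x_i \\ &= \sum_{i=1}^{n} \frac{(y_i - F_i) f_i x_i}{F_i(1 - F_i)} = 0 \end{aligned} \tag{13.17}$$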

where the subscript $i$ on $f$ or $F$ denotes that the argument of that function is $x_i'\beta$. For the logit model (13.17) reduces to

$$S(\beta) = \sum_{i=1}^{n} (y_i - \Lambda_i)x_i = 0 \quad \text{since } f_i = \Lambda_i(1 - \Lambda_i) \tag{13.18}$$
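The simplification in (13.18) can be checked directly, assuming the usual logistic form $\Lambda(z) = e^{z}/(1 + e^{z})$: differentiating gives the density, and the denominator in (13.17) then cancels.

$$\frac{d\Lambda(z)}{dz} = \frac{e^{z}}{(1 + e^{z})^{2}} = \Lambda(z)[1 - \Lambda(z)], \qquad \text{so} \qquad \frac{(y_i - \Lambda_i) f_i x_i}{\Lambda_i(1 - \Lambda_i)} = (y_i - \Lambda_i)x_i .$$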

If there is a constant in the model, the equation in (13.18) corresponding to $x_i = 1$ implies that $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{\Lambda}_i$. This means that the number of participants in the sample, i.e., those with $y_i = 1$, will always be equal to the predicted number of participants from the logit model. Similarly, if $x_i$ is a dummy variable which is 1 if the individual is male and zero if the individual is female, then (13.18) states that the predicted frequency is equal to the actual frequency for males and females. Note that (13.18) resembles the OLS normal equations if we interpret $(y_i - \hat{\Lambda}_i)$ as residuals. For the probit model (13.17) reduces to

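$$\begin{aligned} S(\beta) &= \sum_{i=1}^{n} \frac{(y_i - \Phi_i)\phi_i x_i}{\Phi_i(1 - \Phi_i)} \\ &= \sum_{y_i = 0} \lambda_{0i} x_i + \sum_{y_i = 1} \lambda_{1i} x_i = 0 \end{aligned} \tag{13.19}$$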

where $\lambda_{0i} = -\phi_i/[1 - \Phi_i]$ for $y_i = 0$ and $\lambda_{1i} = \phi_i/\Phi_i$ for $y_i = 1$. Also, $\sum_{y_i = 0}$ denotes the sum over all observations with $y_i = 0$. These $\lambda_i$'s are thought of as generalized residuals which are orthogonal to $x_i$. Note that unlike the logit, the probit does not necessarily predict the number of participants to be exactly equal to the number of ones in the sample.
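The generalized residuals can be computed in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the standard normal density and CDF are built from the standard library rather than taken from a statistics package.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalized_residual(y_i, index):
    """lambda_1i = phi/Phi when y_i = 1, lambda_0i = -phi/(1 - Phi) when y_i = 0."""
    if y_i == 1:
        return phi(index) / Phi(index)
    return -phi(index) / (1.0 - Phi(index))

def probit_score(beta, X, y):
    """Probit score: the sum over observations of lambda_i * x_i."""
    score = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        index = sum(b * x for b, x in zip(beta, x_i))  # x_i' beta
        lam = generalized_residual(y_i, index)
        for j, x_ij in enumerate(x_i):
            score[j] += lam * x_ij
    return score

# Hypothetical toy data: a constant and one regressor.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
y = [1, 0, 1, 0]
score_at_zero = probit_score([0.0, 0.0], X, y)
```

At the maximum likelihood estimate every element of this score is zero, which is the sense in which the generalized residuals are orthogonal to the regressors.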

Equations (13.17) are highly nonlinear and may be solved using the scoring method, i.e., starting with some initial value $\beta_0$ we revise this estimate as follows:

$$\beta_1 = \beta_0 + [I^{-1}(\beta_0)]\,S(\beta_0) \tag{13.20}$$

where $S(\beta) = \partial \log\ell/\partial\beta$ and $I(\beta) = E[-\partial^2 \log\ell/\partial\beta\,\partial\beta']$. This process is repeated until convergence. For the logit and probit models, $\log F(x_i'\beta)$ and $\log[1 - F(x_i'\beta)]$ are concave. Hence, the log-likelihood function given by (13.16) is globally concave, see Pratt (1981). It follows that, for both the logit and probit, $[\partial^2 \log\ell/\partial\beta\,\partial\beta']$ is negative definite for all values of $\beta$ and the iterative procedure will converge to the unique maximum likelihood estimate $\hat{\beta}_{MLE}$ no matter what starting values we use. In this case, the asymptotic covariance matrix of $\hat{\beta}_{MLE}$ is estimated by $I^{-1}(\hat{\beta}_{MLE})$ from the last iteration.

Amemiya (1981, p. 1495) derived $I(\beta)$ by differentiating (13.17), multiplying by a negative sign and taking the expected value. The result is given by:

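$$I(\beta) = -E\left[\frac{\partial^2 \log\ell}{\partial\beta\,\partial\beta'}\right] = \sum_{i=1}^{n} \frac{f_i^2\, x_i x_i'}{F_i(1 - F_i)} \tag{13.21}$$

For the logit, (13.21) reduces to

$$I(\beta) = \sum_{i=1}^{n} \Lambda_i(1 - \Lambda_i)\, x_i x_i' \tag{13.22}$$

For the probit, (13.21) reduces to

$$I(\beta) = \sum_{i=1}^{n} \frac{\phi_i^2\, x_i x_i'}{\Phi_i(1 - \Phi_i)} \tag{13.23}$$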
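The scoring update (13.20), combined with the logit score (13.18) and information matrix (13.22), can be sketched as follows. This is a minimal illustration, assuming a model with a constant and a single regressor so that the 2x2 information matrix can be inverted by hand; the function name and the data are hypothetical.

```python
import math

def logit_scoring(X, y, tol=1e-10, max_iter=50):
    """Method of scoring for a logit with a constant and one regressor.

    Returns the estimates and the inverse information matrix computed
    in the last iteration (the estimated asymptotic covariance matrix).
    """
    b0, b1 = 0.0, 0.0  # starting values
    for _ in range(max_iter):
        s0 = s1 = 0.0           # score S(beta), eq. (13.18)
        i00 = i01 = i11 = 0.0   # information I(beta), eq. (13.22)
        for (c, x), y_i in zip(X, y):
            lam = 1.0 / (1.0 + math.exp(-(b0 * c + b1 * x)))
            resid = y_i - lam
            w = lam * (1.0 - lam)
            s0 += resid * c
            s1 += resid * x
            i00 += w * c * c
            i01 += w * c * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Scoring step: I^{-1}(beta) S(beta), using the explicit 2x2 inverse.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return (b0, b1), cov

# Hypothetical data: each row of X is (constant, regressor).
X = [(1.0, -2.0), (1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.3),
     (1.0, 0.8), (1.0, 1.0), (1.0, 1.5), (1.0, 2.0), (1.0, 2.5)]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

(b0, b1), cov = logit_scoring(X, y)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for _, x in X]
predicted_ones = sum(fitted)  # equals sum(y) because the model has a constant
std_errors = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
```

Comparing `predicted_ones` with `sum(y)` illustrates the logit property that the predicted number of participants matches the observed number when a constant is included.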

Alternatively, maximization may use the Newton-Raphson iterative procedure, which uses the Hessian itself rather than its expected value in (13.20), i.e., $I(\beta)$ is replaced by $H(\beta) = [-\partial^2 \log\ell/\partial\beta\,\partial\beta']$. For the logit model, $H(\beta) = I(\beta)$ and is given in (13.22). For the probit model, $H(\beta) = \sum_{i=1}^{n} [\lambda_i^2 + \lambda_i x_i'\beta]\, x_i x_i'$, which is different from (13.23). Note that $\lambda_i = \lambda_{0i}$ if $y_i = 0$, and $\lambda_i = \lambda_{1i}$ if $y_i = 1$. These were defined below (13.19).

A third method, suggested by Berndt, Hall, Hall and Hausman (1974), uses the outer product of the first derivatives in place of $I(\beta)$, i.e., $G(\beta) = \sum_{i=1}^{n} S_i(\beta)S_i'(\beta)$, where $S_i(\beta)$ is the contribution of observation $i$ to the score. For the logit model, this is $G(\beta) = \sum_{i=1}^{n} (y_i - \Lambda_i)^2 x_i x_i'$. For the probit model, $G(\beta) = \sum_{i=1}^{n} \lambda_i^2 x_i x_i'$. As in the method of scoring, one iterates starting from initial estimates $\beta_0$, and the asymptotic variance-covariance matrix is estimated from the inverse of $G(\beta)$, $H(\beta)$ or $I(\beta)$ in the last iteration.
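To make the difference between the three matrices concrete for the probit, the sketch below computes the expected information, the negative Hessian and the outer-product matrix at a given coefficient vector. Each is a weighted sum of $x_i x_i'$, and only the weights differ. The helper names and the data are hypothetical.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def weighted_outer_sum(X, weights):
    """Return sum_i w_i * x_i x_i' as a nested list."""
    k = len(X[0])
    M = [[0.0] * k for _ in range(k)]
    for x_i, w in zip(X, weights):
        for a in range(k):
            for b in range(k):
                M[a][b] += w * x_i[a] * x_i[b]
    return M

def probit_information_estimates(beta, X, y):
    """Return (I, H, G) for the probit, evaluated at beta."""
    w_I, w_H, w_G = [], [], []
    for x_i, y_i in zip(X, y):
        z = sum(b * x for b, x in zip(beta, x_i))  # x_i' beta
        p, d = Phi(z), phi(z)
        lam = d / p if y_i == 1 else -d / (1.0 - p)  # generalized residual
        w_I.append(d * d / (p * (1.0 - p)))  # expected information, (13.23)
        w_H.append(lam * lam + lam * z)      # negative Hessian (Newton-Raphson)
        w_G.append(lam * lam)                # outer product of first derivatives
    return tuple(weighted_outer_sum(X, w) for w in (w_I, w_H, w_G))

# Hypothetical data: a constant and one regressor.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.0], [1.0, 0.3],
     [1.0, 0.8], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5]]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# At beta = 0 all three weights equal 2/pi, so the matrices coincide.
I0, H0, G0 = probit_information_estimates([0.0, 0.0], X, y)
# Away from zero the three matrices generally differ in finite samples.
I1, H1, G1 = probit_information_estimates([0.2, 0.5], X, y)
```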

Tests of hypotheses can be carried out from the asymptotic standard errors using t-statistics. For $R\beta = r$ type restrictions, the usual Wald test $W = (R\hat{\beta} - r)'[RV(\hat{\beta})R']^{-1}(R\hat{\beta} - r)$ can be used with $V(\hat{\beta})$ obtained from the last iteration as described above. Likelihood ratio and Lagrange Multiplier statistics can also be computed: $LR = -2[\log\ell_{restricted} - \log\ell_{unrestricted}]$, whereas the Lagrange Multiplier statistic is $LM = S'(\tilde{\beta})V(\tilde{\beta})S(\tilde{\beta})$, where $\tilde{\beta}$ denotes the restricted estimator, so that $S(\tilde{\beta})$ is the score evaluated at the restricted estimator. Davidson and MacKinnon (1984) suggest that $V(\beta)$ based on $I(\beta)$ is the best of the three estimators to use. In fact, Monte Carlo experiments show that the estimate of $V(\beta)$ based on the outer product of the first derivatives usually performs the worst and is not recommended in practice. All three statistics are asymptotically equivalent and are asymptotically distributed as $\chi^2_q$, where $q$ is the number of restrictions. The next section discusses tests of hypotheses using an artificial regression.
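As an illustration of the three statistics, the sketch below tests the single restriction that the slope is zero in a logit with a constant and one regressor, using $V$ based on $I(\beta)$ throughout. The data and function names are hypothetical, and the restricted model (constant only) is fitted in closed form, since its maximum likelihood estimate of the constant is the log-odds of the sample mean of $y$.

```python
import math

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Scoring iterations for a logit with a constant and one regressor.

    Returns the estimates and V = I^{-1} from the last iteration.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for x_i, y_i in zip(x, y):
            lam = 1.0 / (1.0 + math.exp(-(b0 + b1 * x_i)))
            w = lam * (1.0 - lam)
            s0 += y_i - lam
            s1 += (y_i - lam) * x_i
            i00 += w
            i01 += w * x_i
            i11 += w * x_i * x_i
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    V = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return (b0, b1), V

def loglik(b0, b1, x, y):
    """Logit log-likelihood, eq. (13.16) with F = Lambda."""
    total = 0.0
    for x_i, y_i in zip(x, y):
        lam = 1.0 / (1.0 + math.exp(-(b0 + b1 * x_i)))
        total += y_i * math.log(lam) + (1 - y_i) * math.log(1.0 - lam)
    return total

# Hypothetical data.
x = [-2.0, -1.0, -0.5, 0.0, 0.3, 0.8, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
n = len(y)

# Unrestricted fit.
(b0, b1), V = fit_logit(x, y)

# Wald: R = [0, 1] and r = 0, so W = b1^2 / V[1][1].
wald = b1 ** 2 / V[1][1]

# LR: the restricted model has only a constant, with MLE log(ybar / (1 - ybar)).
ybar = sum(y) / n
b0_r = math.log(ybar / (1.0 - ybar))
lr = -2.0 * (loglik(b0_r, 0.0, x, y) - loglik(b0, b1, x, y))

# LM: score and I^{-1} evaluated at the restricted estimate (b0_r, 0).
# The first element of the restricted score is zero, so only s1 matters.
w = ybar * (1.0 - ybar)
s1 = sum((y_i - ybar) * x_i for x_i, y_i in zip(x, y))
i00, i01, i11 = w * n, w * sum(x), w * sum(x_i * x_i for x_i in x)
lm = s1 ** 2 * i00 / (i00 * i11 - i01 ** 2)

# Each statistic is compared with a chi-squared(1) critical value,
# e.g. 3.841 at the 5% level.
```

The three statistics need not agree in a small sample like this one; their equivalence is asymptotic.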