# Multinomial Probit Model

Let $U_j$, $j = 0, 1, 2, \ldots, m$, be the stochastic utility associated with the $j$th alternative for a particular individual. By the multinomial probit model, we mean the model in which $\{U_j\}$ are jointly normally distributed. Such a model was first proposed by Aitchison and Bennett (1970). Until recently this model was rarely used in practice because of its computational difficulty (except, of course, when $m = 1$, in which case the model reduces to a binary probit model, which is discussed in Section 9.2). To illustrate the complexity of the problem, consider the case of $m = 2$. Then, to evaluate $P(y = 2)$, for example, we must calculate the multiple integral

$$
P(y = 2) = P(U_2 > U_1,\ U_2 > U_0) = \int_{-\infty}^{\infty} \int_{-\infty}^{U_2} \int_{-\infty}^{U_2} f(U_0, U_1, U_2)\, dU_0\, dU_1\, dU_2, \tag{9.3.72}
$$

where $f$ is a trivariate normal density. The direct computation of such an integral is involved even for the case of $m = 3$. Moreover, (9.3.72) must be evaluated at each step of an iterative method to maximize the likelihood function.
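One standard way to see the structure of $P(y = 2)$ is to work with utility differences. The symbols $V_0$, $V_1$, $g$, and $\rho$ below are introduced here for illustration only and do not appear in the text. Because $V_0 = U_0 - U_2$ and $V_1 = U_1 - U_2$ are jointly normal with some bivariate density $g$, the trivariate integral can be written as a bivariate orthant probability. The second display checks the special case of independent standard normal utilities, where symmetry requires the answer $1/3$.

```latex
\begin{align*}
P(y = 2) &= P(V_0 < 0,\ V_1 < 0)
          = \int_{-\infty}^{0} \int_{-\infty}^{0} g(v_0, v_1)\, dv_0\, dv_1 .
\end{align*}

% Special case: U_0, U_1, U_2 independent N(0, 1).
\begin{align*}
\operatorname{Var}(V_0) &= \operatorname{Var}(V_1) = 2, \qquad
\operatorname{Cov}(V_0, V_1) = \operatorname{Var}(U_2) = 1, \qquad
\rho = \tfrac{1}{2}, \\
P(y = 2) &= \frac{1}{4} + \frac{\arcsin \rho}{2\pi}
          = \frac{1}{4} + \frac{\pi / 6}{2\pi}
          = \frac{1}{3} .
\end{align*}
```

The closed form in the last line holds only for a zero-mean bivariate normal; with nonzero means the orthant probability generally has to be evaluated numerically.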

Hausman and Wise (1978) estimated a trichotomous probit model to explain the modal choice among driving own car, sharing rides, and riding a bus, for 557 workers in Washington, D.C. They specified the utilities by

$$
U_{ij} = \beta_{i1} \log x_{1ij} + \beta_{i2} \log x_{2ij} + \beta_{i3}\,[\ldots] + \epsilon_{ij}, \tag{9.3.73}
$$

where $x_1$, $x_2$, $x_3$, and $x_4$ represent in-vehicle time, out-of-vehicle time, income, and cost, respectively. They assumed that $(\beta_{i1}, \beta_{i2}, \beta_{i3}, \epsilon_{ij})$ are independently (of each other) normally distributed with means $(\beta_1, \beta_2, \beta_3, 0)$ and variances $(\sigma_1^2, \sigma_2^2, \sigma_3^2, 1)$. It is assumed that the $\epsilon_{ij}$ are independent across both $i$ and $j$ and that the $\beta$s are independent across $i$; therefore correlation between $U_{ij}$ and $U_{ik}$ occurs because the same $\beta$s appear for all the alternatives. Hausman and Wise evaluated integrals of the form (9.3.72) using series expansion and noted that the method is feasible for a model with up to five alternatives.
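The way shared random coefficients induce correlation across alternatives can be sketched numerically. The following is a minimal illustration, not the authors' code: the function name `utility_covariance` and the generic regressor array `z` are hypothetical, and the sketch assumes only what the paragraph states (independent normal coefficients with given variances, and independent unit-variance errors). Under those assumptions the covariance between the utilities of alternatives $j$ and $k$ is the variance-weighted sum of regressor products, plus 1 on the diagonal.

```python
def utility_covariance(z, sigma2):
    """Covariance matrix of one individual's utilities across alternatives.

    z      : list of rows; z[j][r] is the r-th regressor of alternative j
             (hypothetical input layout chosen for this sketch)
    sigma2 : variances of the independent random coefficients
    Errors are assumed independent across alternatives with variance 1.
    """
    n_alt = len(z)
    cov = [[0.0] * n_alt for _ in range(n_alt)]
    for j in range(n_alt):
        for k in range(n_alt):
            # Part of the covariance that comes from the shared coefficients.
            shared = sum(s2 * z[j][r] * z[k][r] for r, s2 in enumerate(sigma2))
            # The independent error adds its unit variance on the diagonal only.
            cov[j][k] = shared + (1.0 if j == k else 0.0)
    return cov
```

Setting every entry of `sigma2` to zero returns the identity matrix, so the utilities become independent and only the unit-variance errors remain.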

Hausman and Wise also used two other models to analyze the same data: Multinomial Logit (derived from Eq. 9.3.73 by assuming that the $\beta$s are nonstochastic and the $\epsilon_{ij}$ are independently distributed according to the Type I extreme-value distribution) and Independent Probit (derived from the model defined in the preceding paragraph by putting $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = 0$). We shall call the original model Nonindependent Probit. The conclusions of Hausman and Wise are that (1) Logit and Independent Probit give similar results both in estimation and in the forecast of the probability of using a new mode; (2) Nonindependent Probit differs significantly from the other two models both in estimation and in the forecast about the new mode; and (3) Nonindependent Probit fits best. The likelihood ratio test rejects Independent Probit in favor of Nonindependent Probit at the 7% significance level. These results appear promising for further development of the model.
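For contrast with the probit integral, the logit case has closed-form choice probabilities: with nonstochastic coefficients and independent Type I extreme-value errors, the probability of alternative $j$ is $\exp(v_j) / \sum_k \exp(v_k)$, where $v_j$ is the nonstochastic part of the utility. The sketch below assumes that standard result; the function name `logit_probabilities` and the argument `v` are hypothetical.

```python
import math


def logit_probabilities(v):
    """Multinomial logit choice probabilities from systematic utilities v."""
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    v_max = max(v)
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [w / total for w in weights]
```

No numerical integration is needed here, which is the source of the large difference in cost per iteration between logit and probit estimation.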

Albright, Lerman, and Manski (1977) (see also Lerman and Manski, 1981) developed a computer program for calculating the ML estimator (by a gradient iterative method) of a multinomial probit model similar to, but slightly more general than, the model of Hausman and Wise. Their specification is

$$
U_{ij} = x_{ij}' \beta_i + \epsilon_{ij}, \tag{9.3.74}
$$

where $\beta_i \sim N(\beta, \Sigma_\beta)$ and $\epsilon_i = (\epsilon_{i0}, \epsilon_{i1}, \ldots, \epsilon_{im})' \sim N(0, \Sigma_\epsilon)$. As in Hausman and Wise, $\beta_i$ and $\epsilon_i$ are assumed to be independent of each other and independent across $i$. A certain normalization is employed on the parameters to make them identifiable.
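Stacking the $m + 1$ utilities of individual $i$ shows what this specification implies for their joint distribution, given $\beta_i \sim N(\beta, \Sigma_\beta)$, $\epsilon_i \sim N(0, \Sigma_\epsilon)$, and independence of the two. The matrix $X_i$, whose rows are the $x_{ij}'$, is notation assumed here for the sketch.

```latex
\begin{align*}
U_i &= X_i \beta_i + \epsilon_i, \qquad X_i = (x_{i0}, x_{i1}, \ldots, x_{im})', \\
E(U_i) &= X_i \beta, \\
\operatorname{Var}(U_i) &= X_i \Sigma_\beta X_i' + \Sigma_\epsilon .
\end{align*}
```

In this notation, the assumptions of Hausman and Wise (mutually independent coefficients and independent unit-variance errors) correspond to a diagonal $\Sigma_\beta$ and $\Sigma_\epsilon = I$.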

An interesting feature of their computer program is that it gives the user the option of calculating the probability in the form (9.3.72) at each step of the iteration either (1) by simulation or (2) by Clark’s approximation (Clark, 1961), rather than by series expansion. The authors claim that their program can handle as many as ten alternatives. The simulation works as follows: Consider evaluating (9.3.72), for example. We artificially generate many observations on $U_0$, $U_1$, and $U_2$ according to $f$ evaluated at particular parameter values and simply estimate the probability by the observed frequency. Clark’s method is based on a normal approximation of the distribution of $\max(X, Y)$ when both $X$ and $Y$ are normally distributed. The exact mean and variance of $\max(X, Y)$, which can be evaluated easily, are used in the approximation. Albright, Lerman, and Manski performed a Monte Carlo study, which showed Clark’s method to be quite accurate. However, several other authors have contradicted this conclusion, saying that Clark’s approximation can be quite erratic in some cases (see, for example, Horowitz, Sparmann, and Daganzo, 1982).
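Both options can be sketched for the three-alternative case. This is a minimal standard-library illustration under stated assumptions, not the Albright, Lerman, and Manski program: all function names are hypothetical, the utilities are ordered $(U_0, U_1, U_2)$, and the covariance matrix is assumed positive definite with $U_0 - U_1$ nondegenerate. The Clark step uses the exact first two moments of $\max(U_0, U_1)$ and its covariance with $U_2$ as given by Clark (1961), then treats the maximum as if it were normal.

```python
import math
import random


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cholesky(a):
    """Lower-triangular factor L with L L' = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulated_prob(mean, cov, n_draws=20000, seed=0):
    """Frequency estimate of P(U2 > U1, U2 > U0) from simulated utilities."""
    rng = random.Random(seed)
    low = cholesky(cov)
    hits = 0
    for _ in range(n_draws):
        e = [rng.gauss(0.0, 1.0) for _ in mean]
        # Correlated normal draw: u = mean + L e.
        u = [mean[i] + sum(low[i][k] * e[k] for k in range(i + 1))
             for i in range(len(mean))]
        if u[2] > u[1] and u[2] > u[0]:
            hits += 1
    return hits / n_draws


def clark_prob(mean, cov):
    """Clark-type approximation to P(U2 > max(U0, U1))."""
    a = math.sqrt(cov[0][0] + cov[1][1] - 2.0 * cov[0][1])  # sd of U0 - U1
    alpha = (mean[0] - mean[1]) / a
    p, q, d = norm_cdf(alpha), norm_cdf(-alpha), norm_pdf(alpha)
    # Exact first and second moments of M = max(U0, U1).
    m1 = mean[0] * p + mean[1] * q + a * d
    m2 = ((mean[0] ** 2 + cov[0][0]) * p + (mean[1] ** 2 + cov[1][1]) * q
          + (mean[0] + mean[1]) * a * d)
    var_max = m2 - m1 * m1
    # Exact covariance between M and U2.
    cov_max_u2 = cov[0][2] * p + cov[1][2] * q
    # Approximation step: treat (U2, M) as bivariate normal.
    sd_diff = math.sqrt(cov[2][2] + var_max - 2.0 * cov_max_u2)
    return norm_cdf((mean[2] - m1) / sd_diff)
```

For three independent standard normal utilities the exact probability is 1/3 by symmetry, which gives a convenient check on both functions. With more alternatives the Clark step is applied recursively, so approximation errors can accumulate.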

Albright et al. applied their probit model to the same data that Hausman and Wise used, and they estimated the parameters of their model by Clark’s method. Their model is more general than the model of Hausman and Wise, and their independent variables contained additional variables such as mode-specific dummies and the number of automobiles in the household. They also estimated an independent logit model. Their conclusions were as follows:

1. Their probit and logit estimates did not differ by much. (They compared the raw estimates rather than comparing dP/dx for each independent variable.)

2. They could not obtain accurate estimates of in their probit model.

3. Based on the Akaike Information Criterion (Section 4.5.2), an increase in log L in their probit model as compared to their logit model was not large enough to compensate for the loss in degrees of freedom.

4. The logit model took 4.5 seconds per iteration, whereas the probit model took 60-75 CPU seconds per iteration on an IBM 370 Model 158. Thus, though they demonstrated the feasibility of their probit model, the gain of using the probit model over the independent logit did not seem to justify the added cost for this particular data set.
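The trade-off described in conclusion 3 can be sketched as follows. The form $-2 \log L + 2k$ used here is the common statement of the criterion and is an assumption about scaling; Section 4.5.2 may normalize it differently, which would not change the ranking of models fitted to the same sample. The function names are hypothetical, and no figures from the study are used.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion in the common form -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def prefers_larger_model(loglik_small, k_small, loglik_large, k_large):
    """True if the larger model attains the lower (better) AIC."""
    return aic(loglik_large, k_large) < aic(loglik_small, k_small)
```

Under this form, each extra parameter must raise $\log L$ by more than 1 for the larger model to be preferred. For example, with purely hypothetical values, four additional parameters that raise $\log L$ by only 2 would not be enough.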

The discrepancy between the conclusions of Hausman and Wise and those of Albright et al. is probably due to the fact that Hausman and Wise imposed certain zero specifications on the covariance matrix, an observation that suggests that covariance specification plays a crucial role in this type of model.