# Multinomial Model

We illustrate the multinomial model by considering the case of three alternatives, which for convenience we associate with the integers 1, 2, and 3. One example of the three-response model is the commuter’s choice of mode of transportation, where the three alternatives are private car, bus, and train. Another example is the worker’s choice among three types of employment: being fully employed, partially employed, and self-employed.

We extend (13.5.2) to the case of three alternatives as

$$
\begin{aligned}
U_{1i} &= x_{1i}'\beta + u_{1i} \\
U_{2i} &= x_{2i}'\beta + u_{2i} \\
U_{3i} &= x_{3i}'\beta + u_{3i},
\end{aligned}
\tag{13.5.8}
$$

where $(u_{1i}, u_{2i}, u_{3i})$ are i.i.d. It is assumed that the individual chooses the alternative with the largest utility. Therefore, if we represent the $i$th person’s discrete choice by the variable $y_i$, our model is defined by

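$$
\begin{aligned}
P(y_i = 1) &= P(U_{1i} > U_{2i},\ U_{1i} > U_{3i}) \\
P(y_i = 2) &= P(U_{2i} > U_{1i},\ U_{2i} > U_{3i}), \qquad i = 1, 2, \ldots, n.
\end{aligned}
\tag{13.5.9}
$$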

If we specify the joint distribution of $(u_{1i}, u_{2i}, u_{3i})$ up to an unknown parameter vector $\theta$, we can express the above probabilities as a function of $\beta$ and $\theta$. If we define binary variables $y_{ji}$ by $y_{ji} = 1$ if $y_i = j$ (and $y_{ji} = 0$ otherwise), $j = 1, 2$, the likelihood function of the model is given by

$$
L = \prod_{i=1}^{n} P_{1i}^{\,y_{1i}}\, P_{2i}^{\,y_{2i}}\, (1 - P_{1i} - P_{2i})^{1 - y_{1i} - y_{2i}},
\tag{13.5.10}
$$

where $P_{1i} = P(y_i = 1)$ and $P_{2i} = P(y_i = 2)$. An iterative method must be used for maximizing the above with respect to $\beta$ and $\theta$.
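Iterative maximization is usually carried out on the logarithm of the likelihood rather than on the product itself. As a worked step, taking logs of (13.5.10) in the notation used here ($y_{ji}$ for the binary indicators, $P_{ji}$ for the choice probabilities) gives

$$
\log L = \sum_{i=1}^{n} \Bigl[\, y_{1i} \log P_{1i} + y_{2i} \log P_{2i} + (1 - y_{1i} - y_{2i}) \log (1 - P_{1i} - P_{2i}) \Bigr].
$$

Each observation contributes only the log probability of the alternative actually chosen, because exactly one of the three exponents in (13.5.10) equals one.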

One way to specify the distribution of the $u$’s would be to assume them to be jointly normal. We can assume without loss of generality that their means are zero and that one of the variances is unity. The former assumption is possible because the nonstochastic part can absorb nonzero means, and the latter because multiplication of the three utilities by an identical positive constant does not change their ranking. We should generally allow for nonzero correlation among the three error terms. An analogous model based on the normality assumption was estimated by Hausman and Wise (1978). In the normal model we must evaluate the probabilities as definite integrals of a joint normal density. This is cumbersome if the number of alternatives is larger than five, although an advance in the simulation method (see McFadden, 1989) has made the problem more manageable than formerly.
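To give a feel for what simulation means here, the following is a minimal sketch of a crude frequency simulator for the three-alternative normal model: draw the errors, record which utility is largest, and average. It is not McFadden's (1989) estimator, only an illustration of approximating the integrals by sampling. The function name `simulate_probit_choice_probs` and the inputs `v` (the nonstochastic parts $x_{ji}'\beta$ for one individual) and `cov` (the error covariance matrix) are assumptions made for this example.

```python
import numpy as np

def simulate_probit_choice_probs(v, cov, n_draws=200_000, seed=0):
    """Approximate P(y = 1), P(y = 2), P(y = 3) for one individual.

    v   : length-3 sequence of nonstochastic utilities (hypothetical input)
    cov : 3x3 covariance matrix of the jointly normal errors (hypothetical input)
    """
    rng = np.random.default_rng(seed)
    # Draw jointly normal errors with zero means.
    u = rng.multivariate_normal(np.zeros(3), cov, size=n_draws)
    # Each row of utilities is one simulated person; pick the largest utility.
    choice = np.argmax(np.asarray(v) + u, axis=1)
    # Relative frequencies of alternatives 1, 2, 3 (stored at indices 0, 1, 2).
    return np.bincount(choice, minlength=3) / n_draws

# Example with correlated errors for alternatives 1 and 2.
cov = np.array([[1.0, 0.6, 0.0],
                [0.6, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
probs = simulate_probit_choice_probs([0.2, 0.0, -0.1], cov)
```

A simulator of this kind is not smooth in the parameters, which is one reason more refined simulation methods are used in practice.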

McFadden (1974) proposed a joint distribution of the errors that makes possible an explicit representation of the probabilities. He assumed that the errors are mutually independent (in addition to being independent across $i$) and that each is distributed as

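$$
F(u) = \exp(-e^{-u}).
\tag{13.5.11}
$$

This was called the Type I extreme-value distribution by Johnson and Kotz (1970, p. 272). The probabilities are explicitly given by

$$
P(y_i = j) = \frac{\exp(x_{ji}'\beta)}{\exp(x_{1i}'\beta) + \exp(x_{2i}'\beta) + \exp(x_{3i}'\beta)},
\qquad j = 1, 2, 3;\ i = 1, 2, \ldots, n.
\tag{13.5.12}
$$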

This model is called the multinomial logit model. Besides the advantage of having explicit formulae for the probabilities, this model has the computational advantage of a globally concave likelihood function.
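As an illustration, here is a minimal sketch of the multinomial logit probabilities and the corresponding log likelihood, assuming the standard logit form in which $P(y_i = j)$ is proportional to $\exp(x_{ji}'\beta)$. The function names and the array layout (`X` of shape `(n, 3, k)` holding the vectors $x_{ji}$, and `y` coded 1, 2, 3) are assumptions made for this example.

```python
import numpy as np

def mnl_probabilities(X, beta):
    """Return an (n, 3) array whose (i, j-1) entry is P(y_i = j).

    X    : (n, 3, k) array of regressors x_{ji} (hypothetical layout)
    beta : (k,) coefficient vector
    """
    v = X @ beta                               # (n, 3) nonstochastic utilities
    v = v - v.max(axis=1, keepdims=True)       # subtract row maximum for numerical stability
    ev = np.exp(v)
    return ev / ev.sum(axis=1, keepdims=True)  # normalize so each row sums to one

def mnl_log_likelihood(beta, X, y):
    """Log likelihood: sum over i of the log probability of the chosen alternative."""
    P = mnl_probabilities(X, beta)
    chosen = P[np.arange(len(y)), np.asarray(y) - 1]  # y is coded 1, 2, 3
    return np.log(chosen).sum()
```

Because the log likelihood is globally concave in $\beta$, a general-purpose optimizer applied to the negative of `mnl_log_likelihood` should not be trapped in a local maximum, whatever the starting value.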

It is easy to criticize the multinomial logit model from a theoretical point of view. First, no economist is ready to argue that the utility should be distributed according to the Type I extreme-value distribution. Second, the model implies independence from irrelevant alternatives, which can be mathematically stated as

$$
P(U_3 > U_1) = P(U_3 > U_1 \mid U_3 > U_2 \text{ or } U_1 > U_2)
\tag{13.5.13}
$$

and similar equalities involving the two other possible pairs of utilities. (We have suppressed the subscript $i$ above to simplify the notation.) The equality (13.5.13) means that the information that a person has not chosen alternative 2 does not alter the probability that the person prefers 3 to 1. Let us consider whether or not this assumption is reasonable in the two examples we mentioned at the beginning of this section.

In the first example, suppose that alternatives 1, 2, and 3 correspond to bus, train, and private car, respectively, and suppose that a person is known to have chosen either bus or car. It is perhaps reasonable to surmise that the nonselection of train indicates the person’s dislike of public transportation. Given this information, we might expect her to be more likely to choose car over bus. If this reasoning is correct, we should expect inequality < to hold in the place of equality in (13.5.13). This argument would be more convincing if alternatives 1 and 2 corresponded to blue bus and red bus, instead of bus and train, to cite McFadden’s well-known example. Given that a person has not chosen red bus, it is likely that she will also prefer car to blue bus (unless she happens to abhor the color red).

In the second example, suppose that alternatives 1, 2, and 3 correspond to fully employed, partially employed, and self-employed. Again, we would expect inequality < in (13.5.13), to the extent that the nonselection of “partially employed” can be taken to mean an aversion to working for others.
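The independence property itself can be checked numerically. The sketch below draws independent Type I extreme-value errors and compares the unconditional frequency of $U_3 > U_1$ with the frequency among draws in which alternative 2 is not the best. The function name `iia_check` and the utility values are assumptions chosen for illustration.

```python
import numpy as np

def iia_check(v, n_draws=400_000, seed=0):
    """Compare P(U3 > U1) with P(U3 > U1 | U3 > U2 or U1 > U2) by simulation.

    v : length-3 sequence of nonstochastic utilities (hypothetical values)
    """
    rng = np.random.default_rng(seed)
    # Independent standard Gumbel draws, i.e. F(u) = exp(-exp(-u)).
    u = rng.gumbel(size=(n_draws, 3))
    U = np.asarray(v) + u
    prefers_3_to_1 = U[:, 2] > U[:, 0]
    # Alternative 2 is not chosen when at least one other utility exceeds U2.
    not_2 = (U[:, 2] > U[:, 1]) | (U[:, 0] > U[:, 1])
    return prefers_3_to_1.mean(), prefers_3_to_1[not_2].mean()

unconditional, conditional = iia_check([0.5, 1.0, 0.0])
```

With independent extreme-value errors the two frequencies should agree up to simulation noise. Replacing the draws with errors in which $u_1$ and $u_2$ are positively correlated would be expected to produce the inequality discussed in the two examples.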

If, however, we view (13.5.12) as a purely statistical model, not necessarily derived from utility maximization, it is much more general than it appears, precisely for the same reason that the choice of $F$ does not matter much in (13.5.1) as long as the researcher experiments with various transformations of the independent variables. Any multinomial model can be approximated by a multinomial logit model if the researcher is allowed to manipulate the nonstochastic parts of the utilities.

It is possible to generalize the multinomial logit model in such a way that the assumption of independence from irrelevant alternatives is removed, yet the probabilities can be explicitly derived. We shall explain the nested logit model proposed by McFadden (1977) in the model of three alternatives. Suppose that $u_3$ is distributed as (13.5.11) and independent of $u_1$ and $u_2$, but $u_1$ and $u_2$ follow the joint distribution

$$
F(u_1, u_2) = \exp\bigl\{-\bigl[e^{-u_1/\rho} + e^{-u_2/\rho}\bigr]^{\rho}\bigr\}, \qquad 0 < \rho \le 1.
\tag{13.5.14}
$$

The joint distribution was named Gumbel’s Type B bivariate extreme-value distribution by Johnson and Kotz (1972, p. 256). By taking either $u_1$ or $u_2$ to infinity, we can readily see that each marginal distribution is the same as (13.5.11). The parameter $\rho$ measures the (inverse) degree of association between $u_1$ and $u_2$ such that $\rho = 1$ implies independence. Clearly, if $\rho = 1$ the model is reduced to the multinomial logit model. Therefore it is useful to estimate this model and test the hypothesis $\rho = 1$.
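The two claims about the marginal distribution and independence can be verified in one line each, reading (13.5.14) as $F(u_1, u_2) = \exp\{-[e^{-u_1/\rho} + e^{-u_2/\rho}]^{\rho}\}$. Letting $u_2 \to \infty$ makes the second exponential term vanish:

$$
F(u_1, \infty) = \exp\bigl\{-\bigl[e^{-u_1/\rho}\bigr]^{\rho}\bigr\} = \exp(-e^{-u_1}),
$$

which is (13.5.11), and the same argument applies to $u_2$. Setting $\rho = 1$ gives

$$
F(u_1, u_2) = \exp\bigl\{-e^{-u_1} - e^{-u_2}\bigr\} = \exp(-e^{-u_1})\,\exp(-e^{-u_2}),
$$

the product of the two marginals, so $u_1$ and $u_2$ are independent in that case.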

In a given practical problem the researcher must choose a priori which two alternatives should be paired in the nested logit model. In the aforementioned examples, it is natural to pair bus and train or fully employed and partially employed.

For generalization of the nested logit model to the case of more than three alternatives and to the case of higher-level nesting, see McFadden (1981) or Amemiya (1985, sections 9.3.5 and 9.3.6).

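The probabilities of the above three-response nested logit model are specified by

$$
P(y_i = 1 \mid y_i = 1 \text{ or } 2) = \Lambda\bigl[(x_{1i} - x_{2i})'\beta/\rho\bigr]
\tag{13.5.15}
$$

and

$$
P(y_i = 1 \text{ or } 2) = \Lambda\bigl[(x_{2i} - x_{3i})'\beta + \rho \log z_i\bigr],
\tag{13.5.16}
$$

where

$$
z_i = \exp\bigl[(x_{1i} - x_{2i})'\beta/\rho\bigr] + 1.
$$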
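To make the structure of (13.5.15) and (13.5.16) concrete, the following sketch computes the three choice probabilities for one individual, taking $\Lambda$ to be the logistic function $\Lambda(t) = 1/(1 + e^{-t})$. The function names and the example vectors are assumptions made for illustration.

```python
import numpy as np

def logistic(t):
    """Logistic function, Lambda(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def nested_logit_probs(x1, x2, x3, beta, rho):
    """Return [P(y=1), P(y=2), P(y=3)] when alternatives 1 and 2 are nested.

    x1, x2, x3 : (k,) regressor vectors for one individual (hypothetical inputs)
    beta       : (k,) coefficient vector
    rho        : association parameter, 0 < rho <= 1
    """
    d12 = (x1 - x2) @ beta
    d23 = (x2 - x3) @ beta
    z = np.exp(d12 / rho) + 1.0
    p1_given_12 = logistic(d12 / rho)          # conditional probability, as in (13.5.15)
    p12 = logistic(d23 + rho * np.log(z))      # probability of the nest, as in (13.5.16)
    return np.array([p1_given_12 * p12,
                     (1.0 - p1_given_12) * p12,
                     1.0 - p12])

beta = np.array([0.4, -0.2])
x1, x2, x3 = np.array([1.0, 0.5]), np.array([0.2, 1.0]), np.array([-0.3, 0.1])
p_nested = nested_logit_probs(x1, x2, x3, beta, rho=0.5)
p_logit = nested_logit_probs(x1, x2, x3, beta, rho=1.0)
```

With `rho=1.0` the result should coincide with the multinomial logit probabilities, each proportional to $\exp(x_j'\beta)$, which is the reduction noted in the text.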