# Multinomial Logit Model

In this and the subsequent subsections we shall present various types of unordered multinomial QR models. The multinomial logit model is defined by

$$P_{ij} = \left[\sum_{k=0}^{m_i} \exp(x_{ik}'\beta)\right]^{-1} \exp(x_{ij}'\beta), \tag{9.3.34}$$

$$i = 1, 2, \ldots, n \quad \text{and} \quad j = 0, 1, \ldots, m_i,$$

where we can assume $x_{i0} = 0$ without loss of generality. The log likelihood function is given by

$$\log L = \sum_{i=1}^{n} \sum_{j=0}^{m_i} y_{ij} \log P_{ij}. \tag{9.3.35}$$
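As a minimal sketch of the two definitions above, the following Python code evaluates the choice probabilities and the log likelihood for one value of $\beta$. The function names, the data layout (a list of attribute vectors per individual, with the zero vector first to reflect $x_{i0} = 0$), and the numbers are illustrative assumptions, not part of the original text. Because $y_{ij} = 1$ only for the chosen alternative, the inner sum reduces to the log probability of the observed choice.

```python
import math

def choice_probs(x_i, beta):
    """Return [P_i0, ..., P_im] for one individual.

    x_i holds one attribute vector per alternative; x_i[0] is the zero
    vector under the normalization x_i0 = 0.
    """
    v = [sum(a * b for a, b in zip(x, beta)) for x in x_i]
    v_max = max(v)  # subtract the maximum before exponentiating for stability
    e = [math.exp(s - v_max) for s in v]
    total = sum(e)
    return [t / total for t in e]

def log_likelihood(X, y, beta):
    """X[i]: attribute vectors of individual i; y[i]: index of the chosen alternative."""
    return sum(math.log(choice_probs(x_i, beta)[j]) for x_i, j in zip(X, y))

# Hypothetical data: 2 individuals, 3 alternatives, 2 attributes each.
X = [
    [[0.0, 0.0], [1.0, 0.5], [0.2, -1.0]],
    [[0.0, 0.0], [-0.3, 0.8], [0.7, 0.1]],
]
y = [1, 2]
beta = [0.5, -0.25]

print(choice_probs(X[0], beta))
print(log_likelihood(X, y, beta))
```

At $\beta = 0$ every alternative receives probability $1/(m_i + 1)$, which gives a quick sanity check on any implementation.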

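Following McFadden (1974), we shall show the global concavity of (9.3.35). Differentiating (9.3.35) with respect to $\beta$, we obtain

$$\frac{\partial \log L}{\partial \beta} = \sum_i \sum_j \frac{y_{ij}}{P_{ij}} \frac{\partial P_{ij}}{\partial \beta}, \tag{9.3.36}$$

where $\sum_i$ and $\sum_j$ denote $\sum_{i=1}^{n}$ and $\sum_{j=0}^{m_i}$, respectively. Differentiating (9.3.36) further yields

$$\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} = \sum_i \sum_j \frac{y_{ij}}{P_{ij}} \left[\frac{\partial^2 P_{ij}}{\partial \beta\, \partial \beta'} - \frac{1}{P_{ij}} \frac{\partial P_{ij}}{\partial \beta} \frac{\partial P_{ij}}{\partial \beta'}\right]. \tag{9.3.37}$$

Next, differentiating (9.3.34), we obtain after some manipulation

$$\frac{\partial P_{ij}}{\partial \beta} = P_{ij}(x_{ij} - \bar{x}_i), \tag{9.3.38}$$

where $\bar{x}_i = \sum_{k=0}^{m_i} P_{ik} x_{ik}$, and

$$\frac{\partial^2 P_{ij}}{\partial \beta\, \partial \beta'} = P_{ij}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)' - P_{ij} \sum_{k=0}^{m_i} P_{ik}(x_{ik} - \bar{x}_i)(x_{ik} - \bar{x}_i)'. \tag{9.3.39}$$

Inserting (9.3.38) and (9.3.39) into (9.3.37) and using $\sum_j y_{ij} = 1$ yields

$$\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} = -\sum_i \sum_j P_{ij}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)', \tag{9.3.40}$$

which, interestingly, does not depend on $y_{ij}$. Because $P_{ij} > 0$ in this model, the matrix (9.3.40) is negative definite unless $(x_{ij} - \bar{x}_i)'\alpha = 0$ for every $i$ and $j$ for some $\alpha \neq 0$. Because such an event is extremely unlikely, we can conclude for all practical purposes that the log likelihood function is globally concave in the multinomial logit model.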
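The concavity claim can be checked numerically. The sketch below builds the Hessian as $-\sum_i \sum_j P_{ij}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)'$, where $\bar{x}_i$ is the probability-weighted mean of the $x_{ij}$, and tests negative definiteness for a two-parameter case. The data and function names are hypothetical, and the $2 \times 2$ determinant test is a shortcut chosen for this illustration only.

```python
import math

def probs(x_i, beta):
    """Multinomial logit probabilities for one individual."""
    v = [sum(a * b for a, b in zip(x, beta)) for x in x_i]
    m = max(v)
    e = [math.exp(s - m) for s in v]
    t = sum(e)
    return [u / t for u in e]

def hessian(X, beta):
    """Hessian of the log likelihood; note that the choices y do not enter."""
    K = len(beta)
    H = [[0.0] * K for _ in range(K)]
    for x_i in X:
        P = probs(x_i, beta)
        # Probability-weighted mean of the attribute vectors.
        xbar = [sum(p * x[k] for p, x in zip(P, x_i)) for k in range(K)]
        for p, x in zip(P, x_i):
            d = [x[k] - xbar[k] for k in range(K)]
            for r in range(K):
                for c in range(K):
                    H[r][c] -= p * d[r] * d[c]
    return H

# Hypothetical data: 2 individuals, 3 alternatives, 2 attributes each.
X = [
    [[0.0, 0.0], [1.0, 0.5], [0.2, -1.0]],
    [[0.0, 0.0], [-0.3, 0.8], [0.7, 0.1]],
]
H = hessian(X, [0.5, -0.25])

# A symmetric 2x2 matrix is negative definite iff H[0][0] < 0 and det(H) > 0.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
is_negative_definite = H[0][0] < 0 and det > 0
print(H, is_negative_definite)
```

Negative definiteness fails only when the deviations $x_{ij} - \bar{x}_i$ all lie in a common lower-dimensional subspace, which is the degenerate case described in the text.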

We shall now discuss an important result of McFadden (1974), which shows how the multinomial logit model can be derived from utility maximization. Consider for simplicity an individual $i$ whose utilities associated with three alternatives are given by

$$U_{ij} = \mu_{ij} + \epsilon_{ij}, \quad j = 0, 1, \text{ and } 2, \tag{9.3.41}$$

where $\mu_{ij}$ is a nonstochastic function of explanatory variables and unknown parameters and $\epsilon_{ij}$ is an unobservable random variable. (In the following discussion, we shall write $\epsilon_j$ for $\epsilon_{ij}$ to simplify the notation.) Thus (9.3.41) is analogous to (9.2.4) and (9.2.5). As in Example 9.2.2, it is assumed that the individual chooses the alternative for which the associated utility is highest. McFadden proved that the multinomial logit model is derived from utility maximization if and only if $\{\epsilon_j\}$ are independent and the distribution function of $\epsilon_j$ is given by $\exp[-\exp(-\epsilon_j)]$. This is called the Type I extreme-value distribution, or log Weibull distribution, by Johnson and Kotz (1970, p. 272), who have given many more results about the distribution than are given here. Its density is given by $\exp(-\epsilon_j)\exp[-\exp(-\epsilon_j)]$, which has a unique mode at 0 and a mean of approximately 0.577. We shall give only a proof of the *if* part.

Denoting the density given in the preceding paragraph by $f(\cdot)$, we can write the probability that the $i$th person chooses alternative 2 as (suppressing the subscript $i$ from $\mu_{ij}$ as well as from $\epsilon_{ij}$)

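$$
\begin{aligned}
P(y_i = 2) &= P(U_{i2} > U_{i1},\; U_{i2} > U_{i0}) \\
&= P(\epsilon_2 + \mu_2 - \mu_1 > \epsilon_1,\; \epsilon_2 + \mu_2 - \mu_0 > \epsilon_0) \\
&= \int_{-\infty}^{\infty} f(\epsilon_2) \left[\int_{-\infty}^{\epsilon_2 + \mu_2 - \mu_1} f(\epsilon_1)\, d\epsilon_1 \cdot \int_{-\infty}^{\epsilon_2 + \mu_2 - \mu_0} f(\epsilon_0)\, d\epsilon_0\right] d\epsilon_2 \\
&= \int_{-\infty}^{\infty} \exp(-\epsilon_2) \exp[-\exp(-\epsilon_2)] \\
&\qquad \times \exp[-\exp(-\epsilon_2 - \mu_2 + \mu_1)] \\
&\qquad \times \exp[-\exp(-\epsilon_2 - \mu_2 + \mu_0)]\, d\epsilon_2 \\
&= \frac{\exp(\mu_{i2})}{\exp(\mu_{i0}) + \exp(\mu_{i1}) + \exp(\mu_{i2})}.
\end{aligned}
\tag{9.3.42}
$$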

Expression (9.3.42) is equal to $P_{i2}$ given in (9.3.34) if we put $\mu_{i2} - \mu_{i0} = x_{i2}'\beta$ and $\mu_{i1} - \mu_{i0} = x_{i1}'\beta$. The expressions for $P_{i0}$ and $P_{i1}$ can be similarly derived.
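The *if* part can also be illustrated by simulation. The sketch below draws independent Type I extreme-value errors by inverting the distribution function $\exp[-\exp(-\epsilon)]$, picks the alternative with the highest utility, and compares the simulated choice shares with $\exp(\mu_j) / \sum_k \exp(\mu_k)$. The values of $\mu_j$, the number of draws, and the seed are arbitrary assumptions made for illustration.

```python
import math
import random

def gumbel(rng):
    """Draw from F(e) = exp(-exp(-e)) by inversion: e = -log(-log(u))."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where the log is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_choice_shares(mu, draws=100_000, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(mu)
    for _ in range(draws):
        utils = [m + gumbel(rng) for m in mu]
        counts[utils.index(max(utils))] += 1
    return [c / draws for c in counts]

mu = [0.0, 0.5, 1.0]  # hypothetical nonstochastic utilities mu_0, mu_1, mu_2
simulated = simulate_choice_shares(mu)
denom = sum(math.exp(m) for m in mu)
logit = [math.exp(m) / denom for m in mu]
print(simulated)
print(logit)
```

The two lists should agree up to Monte Carlo error, which shrinks at the usual rate of one over the square root of the number of draws.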

Example 9.3.2. As an application of the multinomial logit model, consider the following hypothetical model of transport modal choice. We assume that the utilities associated with three alternatives, car, bus, and train (corresponding to the subscripts 0, 1, and 2, respectively), are given by (9.3.41). As in (9.2.4) and (9.2.5), we assume

$$\mu_{ij} = \alpha + z_{ij}'\beta + w_i'\gamma, \tag{9.3.43}$$

where $z_{ij}$ is a vector of the mode characteristics and $w_i$ is a vector of the $i$th person's socioeconomic characteristics. It is assumed that $\alpha$, $\beta$, and $\gamma$ are constant for all $i$ and $j$. Then we obtain the multinomial logit model (9.3.34) with $m = 2$ if we put $x_{i2} = z_{i2} - z_{i0}$ and $x_{i1} = z_{i1} - z_{i0}$.

The fact that $\beta$ is constant for all the modes makes this model useful in predicting the demand for a certain new mode that comes into existence. Suppose that an estimate $\hat{\beta}$ of $\beta$ has been obtained in the model with three modes (Example 9.3.2) and that the characteristics $z_{i3}$ of a new mode (designated by subscript 3) have been ascertained from engineering calculations and a sample survey. Then the probability that the $i$th person will use the new mode (assuming that the new mode is accessible to the person) can be estimated by

$$\hat{P}_{i3} = \frac{\exp(x_{i3}'\hat{\beta})}{1 + \exp(x_{i1}'\hat{\beta}) + \exp(x_{i2}'\hat{\beta}) + \exp(x_{i3}'\hat{\beta})}, \tag{9.3.44}$$

where $x_{i3} = z_{i3} - z_{i0}$.
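A short sketch of this prediction step follows. It forms the differences from the car characteristics, so that car contributes $\exp(0) = 1$ to the denominator, and returns the estimated probability of the new mode. The two characteristics (say, cost and travel time), their values, and the coefficient estimates are invented for illustration and carry no empirical meaning.

```python
import math

def new_mode_probability(z, beta_hat):
    """Estimated probability that a person uses the new mode.

    z = [z_i0, z_i1, z_i2, z_i3]: mode characteristics for car, bus,
    train, and the new mode, in that order.
    """
    # x_ij = z_ij - z_i0, so that x_i0 = 0 and exp(x_i0' beta) = 1.
    x = [[a - b for a, b in zip(z_j, z[0])] for z_j in z]
    e = [math.exp(sum(a * b for a, b in zip(x_j, beta_hat))) for x_j in x]
    return e[3] / sum(e)

# Hypothetical inputs: each z_ij is (cost, travel time).
beta_hat = [-0.1, -0.05]
z = [[5.0, 20.0], [2.0, 40.0], [3.0, 30.0], [4.0, 25.0]]
p_new = new_mode_probability(z, beta_hat)
print(p_new)
```

No re-estimation is needed because the same $\hat{\beta}$ applies to every mode; only the characteristics of the new mode are required.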

We should point out a restrictive property of the multinomial logit model: The assumption of independence of $\{\epsilon_j\}$ implies that the alternatives are dissimilar. Using McFadden's famous example, suppose that the three alternatives in Example 9.3.2 consist of car, red bus, and blue bus, instead of car, bus, and train. In such a case, the independence between $\epsilon_1$ and $\epsilon_2$ is a clearly unreasonable assumption because a high (low) utility for red bus should generally imply a high (low) utility for a blue bus. The probability $P_0 = P(U_0 > U_1, U_0 > U_2)$ calculated under the independence assumption would underestimate the true probability in this case because the assumption ignores the fact that the event $U_0 > U_1$ makes the event $U_0 > U_2$ more likely.

Alternatively, note that in the multinomial logit model the relative probabilities between a pair of alternatives are specified ignoring the third alternative. For example, the relative probabilities between car and red bus are specified the same way regardless of whether the third alternative is blue bus or train. Mathematically, this fact is demonstrated by noting that (9.3.34) implies

$$P(y_i = j \mid y_i = j \text{ or } k) = [\exp(x_{ij}'\beta) + \exp(x_{ik}'\beta)]^{-1} \exp(x_{ij}'\beta). \tag{9.3.45}$$

McFadden has called this characteristic of the model independence from irrelevant alternatives (IIA).
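The IIA property is easy to see numerically. In the sketch below, the odds of car against red bus are computed twice, once with train and once with blue bus as the third alternative, and come out identical. The values of $x'\beta$ are hypothetical.

```python
import math

def probs(v):
    """Multinomial logit probabilities from the values x_ij' beta."""
    e = [math.exp(s) for s in v]
    t = sum(e)
    return [u / t for u in e]

v_car, v_red_bus = 0.0, -0.4  # hypothetical values of x' beta

with_train = probs([v_car, v_red_bus, 0.3])      # third alternative: train
with_blue_bus = probs([v_car, v_red_bus, -0.4])  # third alternative: blue bus

# Odds of car against red bus under each choice set.
odds_train = with_train[0] / with_train[1]
odds_blue_bus = with_blue_bus[0] / with_blue_bus[1]

# Conditional probability of car given that car or red bus is chosen.
cond_car = with_train[0] / (with_train[0] + with_train[1])

print(odds_train, odds_blue_bus, math.exp(v_car - v_red_bus))
print(cond_car)
```

Both odds equal $\exp(x_{ij}'\beta - x_{ik}'\beta)$. In the same hypothetical setup, adding a blue bus identical to the red bus lowers the car probability from $1/(1 + e^{-0.4})$ to $1/(1 + 2e^{-0.4})$, which is the understatement of the car share described in the red bus and blue bus example.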

The following is another example of the model.

Example 9.3.3 (McFadden, 1976b). McFadden (1976b) used a multinomial logit model to analyze the selection of highway routes by the California Division of Highways in the San Francisco and Los Angeles Districts during the years 1958-1966. The $i$th project among $n = 65$ projects chooses one from $m_i$ routes and the selection probability is hypothesized precisely as (9.3.34), where $x_{ij}$ is interpreted as a vector of the attributes of route $j$ in project $i$.

There is a subtle conceptual difference between this model and the model of Example 9.3.2. In the latter model, $j$ signifies a certain common type of transport mode for all the individuals $i$. For example, $j = 0$ means car for all $i$. In the McFadden model, the $j$th route of the first project and the $j$th route of the second project have nothing substantial in common except that both are number $j$ routes. However, this difference is not essential because in this type of model each alternative is completely characterized by its characteristics vector $x$, and a common name such as car is just as meaningless as a number $j$ in the operation of the model.

McFadden tested the IIA hypothesis by reestimating one of his models using the choice set that consists of the chosen route and one additional route randomly selected from the $m_i$ routes. The idea is that if this hypothesis is true, estimates obtained from a full set of alternatives should be close to estimates obtained by randomly eliminating some nonchosen alternatives. For each coefficient the difference between the two estimates was found to be less than its standard deviation, a finding indicating that the hypothesis is likely to be accepted. However, to be exact, we must test the equality of all the coefficients simultaneously.

Such a test, an application of Hausman's test (see Section 4.5.1), is developed with examples in the article by Hausman and McFadden (1984). They tested the IIA hypothesis in a trichotomous logit model for which the three alternatives are owning an electric dryer, a gas dryer, or no dryer. In one experiment, data on the households without a dryer were discarded to obtain a consistent but inefficient estimator, and in the other experiment, data on those owning electric dryers were discarded. In both experiments Hausman's test rejected the IIA hypothesis at a significance level below 1%. Alternative tests of the IIA hypothesis will be discussed in Section 9.3.5.