# Nested Logit Model

In Section 9.3.3 we defined the multinomial logit model and pointed out its weakness when some of the alternatives are similar. In this section we shall discuss the nested (or nonindependent) logit model, which alleviates that weakness to a certain extent. This model is attributed to McFadden (1977) and is developed in greater detail in a later article by McFadden (1981). We shall analyze a trichotomous model in detail and then generalize the results obtained for this trichotomous model to a general multinomial case.

Let us consider the red bus-blue bus model once more for the purpose of illustration. Let $U_j = \mu_j + \epsilon_j$, $j = 0, 1,$ and $2$, be the utilities associated with car, red bus, and blue bus. (To avoid unnecessary complication in notation, we have suppressed the subscript $i$.) We pointed out earlier that it is unreasonable to assume independence between $\epsilon_1$ and $\epsilon_2$, although $\epsilon_0$ may be assumed independent of the other two. McFadden suggested the following bivariate distribution as a convenient way to take account of a correlation between $\epsilon_1$ and $\epsilon_2$:

$$
F(\epsilon_1, \epsilon_2) = \exp\left\{-\left[\exp(-\rho^{-1}\epsilon_1) + \exp(-\rho^{-1}\epsilon_2)\right]^{\rho}\right\}, \qquad 0 < \rho \leq 1. \tag{9.3.50}
$$

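Johnson and Kotz (1972, p. 256) called this distribution Gumbel's Type B bivariate extreme-value distribution. The correlation coefficient can be shown to be $1 - \rho^2$. If $\rho = 1$ (the case of independence), $F(\epsilon_1, \epsilon_2)$ becomes the product of two Type I extreme-value distributions, which yields the multinomial logit model. As for $\epsilon_0$, we assume $F(\epsilon_0) = \exp[-\exp(-\epsilon_0)]$ as in the multinomial logit model. Under these assumptions we can show

$$
P(y = 0) = \frac{\exp(\mu_0)}{\exp(\mu_0) + \left[\exp(\rho^{-1}\mu_1) + \exp(\rho^{-1}\mu_2)\right]^{\rho}} \tag{9.3.51}
$$

and

$$
P(y = 1 \mid y \neq 0) = \frac{\exp(\rho^{-1}\mu_1)}{\exp(\rho^{-1}\mu_1) + \exp(\rho^{-1}\mu_2)}. \tag{9.3.52}
$$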

The other probabilities can be deduced from (9.3.51) and (9.3.52). Therefore these two equations define a nested logit model in the trichotomous case. By dividing the numerator and the denominator of (9.3.51) by $\exp(\mu_0)$ and those of (9.3.52) by $\exp(\rho^{-1}\mu_1)$, we note that the probabilities depend on $\mu_2 - \mu_0$, $\mu_1 - \mu_0$, and $\rho$. We would normally specify $\mu_j = x_j'\beta$, $j = 0, 1, 2$. The estimation of $\beta$ and $\rho$ will be discussed for more general nested logit models later.

The form of these two probabilities is intuitively attractive. Equation (9.3.52) shows that the choice between the two similar alternatives is made according to a binary logit model, whereas (9.3.51) suggests that the choice between car and noncar is also like a logit model except that a certain kind of weighted average of $\exp(\mu_1)$ and $\exp(\mu_2)$ is used.
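As a minimal numerical sketch of these two probabilities, the following Python code takes (9.3.51) to be $\exp(\mu_0)/\{\exp(\mu_0) + [\exp(\rho^{-1}\mu_1) + \exp(\rho^{-1}\mu_2)]^{\rho}\}$ and (9.3.52) to be $\exp(\rho^{-1}\mu_1)/[\exp(\rho^{-1}\mu_1) + \exp(\rho^{-1}\mu_2)]$, as the derivations in this section yield. The function names and the numerical inputs are illustrative assumptions, not part of the original text.

```python
import numpy as np

def trichotomous_nested_logit(mu0, mu1, mu2, rho):
    """Return (P(y=0), P(y=1), P(y=2)) built from (9.3.51) and (9.3.52)."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    # Term for the bus nest: [exp(mu1/rho) + exp(mu2/rho)]^rho.
    nest = (np.exp(mu1 / rho) + np.exp(mu2 / rho)) ** rho
    p0 = np.exp(mu0) / (np.exp(mu0) + nest)                  # (9.3.51)
    p1_given_bus = 1.0 / (1.0 + np.exp((mu2 - mu1) / rho))   # (9.3.52)
    p1 = (1.0 - p0) * p1_given_bus
    p2 = (1.0 - p0) * (1.0 - p1_given_bus)
    return p0, p1, p2

def multinomial_logit(mu0, mu1, mu2):
    """Ordinary multinomial logit probabilities for comparison."""
    w = np.exp(np.array([mu0, mu1, mu2]))
    return tuple(w / w.sum())

# Hypothetical case: car and two identical buses, all with mean utility 0.
print(trichotomous_nested_logit(0.0, 0.0, 0.0, 1.0))   # rho = 1: same as multinomial logit
print(trichotomous_nested_logit(0.0, 0.0, 0.0, 0.05))  # rho near 0: buses close to one alternative
```

With $\rho = 1$ each alternative receives probability 1/3, the familiar red bus-blue bus result of the multinomial logit model. As $\rho$ approaches 0 the car share moves toward 1/2, which is what one would expect when the two buses are nearly the same alternative.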

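To obtain (9.3.51), note that

$$
\begin{aligned}
P(y = 0) &= P(U_0 > U_1,\ U_0 > U_2) \\
&= P(\mu_0 + \epsilon_0 > \mu_1 + \epsilon_1,\ \mu_0 + \epsilon_0 > \mu_2 + \epsilon_2) \\
&= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\epsilon_0 + \mu_0 - \mu_1} \left[ \int_{-\infty}^{\epsilon_0 + \mu_0 - \mu_2} \exp(-\epsilon_0) \exp[-\exp(-\epsilon_0)]\, f(\epsilon_1, \epsilon_2)\, d\epsilon_2 \right] d\epsilon_1 \right\} d\epsilon_0 \\
&= \int_{-\infty}^{\infty} \exp(-\epsilon_0) \exp[-\exp(-\epsilon_0)] \\
&\qquad \times \exp\left(-\left\{\exp[-\rho^{-1}(\epsilon_0 + \mu_0 - \mu_1)] + \exp[-\rho^{-1}(\epsilon_0 + \mu_0 - \mu_2)]\right\}^{\rho}\right) d\epsilon_0 \\
&= \int_{-\infty}^{\infty} \exp(-\epsilon_0) \exp[-\alpha \exp(-\epsilon_0)]\, d\epsilon_0 = \alpha^{-1},
\end{aligned}
\tag{9.3.53}
$$

where $\alpha = 1 + \exp(-\mu_0)\left[\exp(\rho^{-1}\mu_1) + \exp(\rho^{-1}\mu_2)\right]^{\rho}$.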
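The last two equalities compress two steps that can be spelled out; the following is a worked sketch using only the definitions above. Since $\exp[-\rho^{-1}(\epsilon_0 + \mu_0 - \mu_j)] = \exp(-\rho^{-1}\epsilon_0)\exp(-\rho^{-1}\mu_0)\exp(\rho^{-1}\mu_j)$ for $j = 1, 2$, raising the sum to the power $\rho$ gives

$$
\left\{\exp[-\rho^{-1}(\epsilon_0 + \mu_0 - \mu_1)] + \exp[-\rho^{-1}(\epsilon_0 + \mu_0 - \mu_2)]\right\}^{\rho}
= \exp(-\epsilon_0)\exp(-\mu_0)\left[\exp(\rho^{-1}\mu_1) + \exp(\rho^{-1}\mu_2)\right]^{\rho},
$$

so the two exponential factors in the integrand combine into $\exp[-\alpha\exp(-\epsilon_0)]$. The substitution $t = \exp(-\epsilon_0)$, with $dt = -t\, d\epsilon_0$, then yields

$$
\int_{-\infty}^{\infty} \exp(-\epsilon_0)\exp[-\alpha\exp(-\epsilon_0)]\, d\epsilon_0 = \int_{0}^{\infty} \exp(-\alpha t)\, dt = \alpha^{-1}.
$$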

To obtain (9.3.52), we first observe that

$$
\begin{aligned}
P(y = 1 \mid y \neq 0) &= P(U_1 > U_2 \mid U_1 > U_0 \text{ or } U_2 > U_0) \\
&= P(U_1 > U_2),
\end{aligned}
\tag{9.3.54}
$$

where the last equality follows from our particular distributional assumptions. Next we obtain

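$$
\begin{aligned}
P(U_1 > U_2) &= P(\mu_1 + \epsilon_1 > \mu_2 + \epsilon_2) \\
&= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\epsilon_1 + \mu_1 - \mu_2} f(\epsilon_1, \epsilon_2)\, d\epsilon_2 \right] d\epsilon_1 \\
&= \int_{-\infty}^{\infty} \exp(-\epsilon_1)\left\{1 + \exp[-\rho^{-1}(\mu_1 - \mu_2)]\right\}^{\rho - 1} \\
&\qquad \times \exp\left(-\exp(-\epsilon_1)\left\{1 + \exp[-\rho^{-1}(\mu_1 - \mu_2)]\right\}^{\rho}\right) d\epsilon_1 \\
&= \left\{1 + \exp[-\rho^{-1}(\mu_1 - \mu_2)]\right\}^{-1},
\end{aligned}
\tag{9.3.55}
$$

where in the third equality we used

$$
\frac{\partial F(\epsilon_1, \epsilon_2)}{\partial \epsilon_1} = \left[\exp(-\rho^{-1}\epsilon_1) + \exp(-\rho^{-1}\epsilon_2)\right]^{\rho - 1} \exp(-\rho^{-1}\epsilon_1)\, F(\epsilon_1, \epsilon_2). \tag{9.3.56}
$$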
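The final equality in (9.3.55) can be checked numerically. The sketch below integrates the integrand of the third equality with a hand-written trapezoid rule and compares the result with the closed form $\{1 + \exp[-\rho^{-1}(\mu_1 - \mu_2)]\}^{-1}$. The function names, the integration limits, and the grid size are assumptions chosen for illustration.

```python
import numpy as np

def p_u1_beats_u2_closed_form(mu1, mu2, rho):
    """Closed form on the last line of (9.3.55)."""
    return 1.0 / (1.0 + np.exp(-(mu1 - mu2) / rho))

def p_u1_beats_u2_numeric(mu1, mu2, rho, lo=-30.0, hi=60.0, n=200_001):
    """Integrate the third-equality integrand of (9.3.55) over [lo, hi]."""
    c = 1.0 + np.exp(-(mu1 - mu2) / rho)
    e1 = np.linspace(lo, hi, n)
    integrand = np.exp(-e1) * c ** (rho - 1.0) * np.exp(-np.exp(-e1) * c ** rho)
    # Composite trapezoid rule written out by hand.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(e1)))

# Hypothetical values of mu1, mu2, and rho.
print(p_u1_beats_u2_closed_form(0.4, -0.1, 0.6))
print(p_u1_beats_u2_numeric(0.4, -0.1, 0.6))
```

The two printed values should agree to many decimal places, since the integrand decays rapidly in both tails and the truncation at the assumed limits loses a negligible amount of mass.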

The model defined by (9.3.51) and (9.3.52) reduces to the multinomial logit model if $\rho = 1$. Therefore the IIA hypothesis, which is equivalent to the hypothesis $\rho = 1$, can be tested against the alternative hypothesis of the nested logit model by any one of the three asymptotic tests described in Section 4.5.1. Hausman and McFadden (1984) performed the three tests using the data on households' holdings of dryers, which we discussed in Section 9.3.3. The utilities of owning the two types of dryers were assumed to be correlated with each other, and the utility of owning no dryer was assumed to be independent of the other utilities. Like Hausman's test, all three tests rejected the IIA hypothesis at a significance level below 1%. They also conducted a Monte Carlo analysis of Hausman's test and the three asymptotic tests in a hypothetical trichotomous nested logit model. Their findings were as follows: (1) Even with $n = 1000$, the observed frequency of rejecting the null hypothesis using the Wald and Rao tests differed greatly from the nominal size, whereas it was better for the LRT. (2) The power of the Wald test after correcting for the size was best, with Hausman's test a close second.
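The equivalence between IIA and $\rho = 1$ can be illustrated numerically. The sketch below computes the odds $P(y = 0)/P(y = 1)$ of car against red bus in the trichotomous model while varying the blue bus utility $\mu_2$. Under IIA these odds should not depend on $\mu_2$. The function name and the input values are illustrative assumptions.

```python
import numpy as np

def odds_car_vs_red_bus(mu0, mu1, mu2, rho):
    """P(y=0) / P(y=1) under the trichotomous nested logit model."""
    g = np.exp(mu1 / rho) + np.exp(mu2 / rho)
    p0 = np.exp(mu0) / (np.exp(mu0) + g ** rho)   # car, as in (9.3.51)
    p1 = (1.0 - p0) * np.exp(mu1 / rho) / g       # red bus, using (9.3.52)
    return p0 / p1

# Vary the blue bus utility and watch the car/red-bus odds.
for rho in (1.0, 0.5):
    odds = [odds_car_vs_red_bus(0.0, 0.0, mu2, rho) for mu2 in (-1.0, 0.0, 1.0)]
    print(rho, np.round(odds, 4))
```

With $\rho = 1$ the odds equal $\exp(\mu_0 - \mu_1)$ for every value of $\mu_2$. With $\rho = 0.5$ they change as $\mu_2$ changes, so the presence of the blue bus alters the relative attractiveness of car and red bus, which is exactly the departure from IIA that the tests are designed to detect.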

Next, we shall generalize the trichotomous nested logit model defined by (9.3.51) and (9.3.52) to the general case of $m + 1$ responses. Suppose that the $m + 1$ integers $0, 1, \ldots, m$ can be naturally partitioned into $S$ groups so that each group consists of similar alternatives. Write the partition as

$$
(0, 1, 2, \ldots, m) = B_1 \cup B_2 \cup \cdots \cup B_S, \tag{9.3.57}
$$

where $\cup$ denotes the union. Then McFadden suggested the joint distribution

$$
F(\epsilon_0, \epsilon_1, \ldots, \epsilon_m) = \exp\left\{-\sum_{s=1}^{S} a_s \left[\sum_{j \in B_s} \exp(-\rho_s^{-1}\epsilon_j)\right]^{\rho_s}\right\}. \tag{9.3.58}
$$

Then it can be shown that

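$$
\sum_{j \in B_s} P_j = \frac{a_s \left[\sum_{j \in B_s} \exp(\rho_s^{-1}\mu_j)\right]^{\rho_s}}{\sum_{t=1}^{S} a_t \left[\sum_{j \in B_t} \exp(\rho_t^{-1}\mu_j)\right]^{\rho_t}}, \qquad s = 1, 2, \ldots, S, \tag{9.3.59}
$$

and

$$
P(y = j \mid y \in B_s) = \frac{\exp(\rho_s^{-1}\mu_j)}{\sum_{k \in B_s} \exp(\rho_s^{-1}\mu_k)}, \qquad j \in B_s. \tag{9.3.60}
$$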

Note that (9.3.59) and (9.3.60) are generalizations of (9.3.51) and (9.3.52), respectively. Clearly, these probabilities define the model completely. As before, we can interpret

$$
\left[\sum_{j \in B_s} \exp(\rho_s^{-1}\mu_j)\right]^{\rho_s}
$$

as a kind of weighted average of $\exp(\mu_j)$ for $j \in B_s$.
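The general model can be sketched in a few lines of Python. The function below multiplies the nest probability, read as $a_s[\sum_{j \in B_s}\exp(\rho_s^{-1}\mu_j)]^{\rho_s}$ divided by the sum of the same terms over all nests (9.3.59), by the within-nest logit probability (9.3.60). The function name, argument layout, and example values are assumptions made for illustration.

```python
import numpy as np

def nested_logit_probs(mu, nests, a, rho):
    """Choice probabilities P_j for alternatives 0..m.

    mu    : length m+1 sequence of mean utilities mu_j
    nests : list of index lists B_1, ..., B_S partitioning 0..m
    a, rho: length S sequences of a_s and rho_s
    """
    mu = np.asarray(mu, dtype=float)
    probs = np.zeros_like(mu)
    # exp(mu_j / rho_s) for each alternative, grouped by nest.
    inner = [np.exp(mu[b] / r) for b, r in zip(nests, rho)]
    # Nest-level terms a_s * [sum_{j in B_s} exp(mu_j / rho_s)]^{rho_s}.
    weights = np.array([a_s * w.sum() ** r for a_s, w, r in zip(a, inner, rho)])
    nest_probs = weights / weights.sum()           # (9.3.59)
    for b, w, p_s in zip(nests, inner, nest_probs):
        probs[b] = p_s * w / w.sum()               # (9.3.59) times (9.3.60)
    return probs

# Hypothetical example: car alone in one nest, two buses in another.
print(nested_logit_probs([0.0, 0.0, 0.0], [[0], [1, 2]], [1.0, 1.0], [1.0, 0.5]))
```

Setting every $a_s$ and $\rho_s$ to 1 recovers the multinomial logit probabilities, and the partition $\{0\}, \{1, 2\}$ with $a_1 = a_2 = 1$ reproduces the trichotomous model.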

The nested logit model defined by (9.3.59) and (9.3.60) can be estimated by MLE, but it also can be consistently estimated by a natural two-step method, which is computationally simpler. Suppose we specify $\mu_j = x_j'\beta$. First, the part of the likelihood function that is the product of the conditional probabilities of the form (9.3.60) is maximized to yield an estimate of $\rho_s^{-1}\beta$. Second, this estimate is inserted into the right-hand side of (9.3.59), and the product of (9.3.59) over $s$ and $i$ (which is suppressed) is maximized to yield estimates of the $\rho$'s and $a$'s (one of the $a$'s can be arbitrarily set). The asymptotic covariance matrix of these elements is given in McFadden (1981).

In a large-scale nested logit model, the two-step method is especially useful because of the near infeasibility of MLE. Another merit of the two-step method is that it yields consistent estimates that can be used to start a Newton-Raphson iteration to compute MLE, as done by Hausman and McFadden.
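The two-step method rests on the fact that the log-likelihood splits into a conditional part built from (9.3.60) and a nest-level part built from (9.3.59). The sketch below computes the two parts separately; it is not an estimator, only an illustration of the decomposition. The function name, the data layout, and the simulated inputs are hypothetical.

```python
import numpy as np

def loglik_parts(y, mu, nests, a, rho):
    """Return (conditional part, nest-level part) of the log-likelihood.

    y  : length n integer array of chosen alternatives
    mu : (n, m+1) array of mean utilities, one row per observation
    """
    mu = np.asarray(mu, dtype=float)
    rho = np.asarray(rho, dtype=float)
    n = mu.shape[0]
    nest_of = {j: s for s, b in enumerate(nests) for j in b}
    # log sum_{k in B_s} exp(mu_k / rho_s), per observation and nest.
    log_inner = np.column_stack(
        [np.log(np.exp(mu[:, b] / r).sum(axis=1)) for b, r in zip(nests, rho)]
    )
    # Log numerator of (9.3.59): log a_s + rho_s * log_inner.
    log_w = np.log(a) + rho * log_inner
    log_nest = log_w - np.log(np.exp(log_w).sum(axis=1, keepdims=True))
    s = np.array([nest_of[j] for j in y])
    rows = np.arange(n)
    cond = mu[rows, y] / rho[s] - log_inner[rows, s]   # log of (9.3.60)
    marg = log_nest[rows, s]                           # log of (9.3.59)
    return cond.sum(), marg.sum()

# Hypothetical data: six observations, three alternatives.
rng = np.random.default_rng(0)
mu = rng.normal(size=(6, 3))
y = np.array([0, 1, 2, 1, 0, 2])
cond, marg = loglik_parts(y, mu, [[0], [1, 2]], [1.0, 1.0], [1.0, 0.5])
print(cond, marg, cond + marg)
```

The conditional part involves $\mu_j$ only through $\mu_j/\rho_s$, which is why the first step identifies $\rho_s^{-1}\beta$ rather than $\beta$ itself. Once the inner sums are fixed at their first-step values, the nest-level part depends only on the $\rho$'s and $a$'s, which the second step estimates.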

We shall present two applications of the nested logit model.

Example 9.3.4 (Small and Brownstone, 1982). Small and Brownstone applied a nested logit model to analyze trip timing. The dependent variable takes 12 values corresponding to different arrival times, and the authors experimented with various ways of "nesting" the 12 responses, for example, $B_1 = \ldots$, $B_3 = (10, 11, 12)$. All $a$'s were assumed to be equal to 1, and various specifications of the $\rho$'s were tried. Small and Brownstone found that the two-step estimator had much larger variances than the MLE and often yielded unreasonable values. Also, the computation of the asymptotic covariance matrix of the two-step estimator took as much time as the second-round estimator obtained in the Newton-Raphson iteration, even though the fully iterated Newton-Raphson iteration took six times as much time.

Example 9.3.5 (McFadden, 1978). A person chooses a community to live in and a type of dwelling to live in. There are $S$ communities; integers in $B_s$ signify the types of dwellings available in community $s$. Set $a_s$ = a constant, $\rho_s$ = a constant $\rho$, and $\mu_{cd} = \beta' x_{cd} + \alpha' z_c$. Then, substituting into (9.3.59) and (9.3.60),

$$
P(\text{community } c \text{ is chosen}) = \frac{\left\{\sum_{d \in B_c} \exp[\rho^{-1}(\beta' x_{cd} + \alpha' z_c)]\right\}^{\rho}}{\sum_{c'=1}^{S} \left\{\sum_{d \in B_{c'}} \exp[\rho^{-1}(\beta' x_{c'd} + \alpha' z_{c'})]\right\}^{\rho}} \tag{9.3.61}
$$

and

$$
P(\text{dwelling } d \text{ is chosen} \mid c \text{ is chosen}) = \frac{\exp(\rho^{-1}\beta' x_{cd})}{\sum_{d' \in B_c} \exp(\rho^{-1}\beta' x_{cd'})}. \tag{9.3.62}
$$

As in this example, a nested logit is useful when a set of utilities can be naturally classified into independent classes while nonzero correlation is allowed among utilities within each class.