# Polychotomous choice sample selection models

A polychotomous choice model with $m$ alternatives specifies the latent utility values $U_j = z_j\gamma_j + v_j$, $j = 1, \ldots, m$, and alternative $j$ is chosen if and only if $U_j > \max\{U_k : k = 1, \ldots, m;\ k \neq j\}$. A specified joint distribution for $v = (v_1, \ldots, v_m)$ implies the choice probability $G_j$ for the $j$th alternative, where $G_j(x\gamma) = P(z_j\gamma_j - z_k\gamma_k > v_k - v_j,\ k \neq j,\ k = 1, \ldots, m \mid x)$. Familiar parametric polychotomous choice models are the conditional logit model and the nested logit model of McFadden (1973, 1978) and the multinomial probit model. For a recent survey on qualitative response models, see Chapter 17 by Maddala and Flores-Lagunes in this volume.

For a sample selection model, say, $y_1 = x_1\beta_1 + u_1$ for alternative 1, a desirable specification will allow $u_1$ to correlate with the disturbances in the utility equations. A traditional approach may specify a joint distribution for $v = (v_1, \ldots, v_m)$ and $u_1$. In such an approach, the marginal distributions for $v$ and $u_1$ are determined by the joint distribution. If they are jointly normally distributed, the implied choice model will be the multinomial probit model. However, it is less obvious how to incorporate selected outcome equations with other familiar choice models. Dubin and McFadden (1984) and Lee (1983) suggest alternative specification approaches.

Dubin and McFadden (1984) suggest a linear conditional expectation specification in which $E(u_1 \mid v, x)$ is a linear function of $v$. The distribution of $v$ can be the one that generates the logit or nested logit choice component. They suggest two-stage estimation methods based on bias-corrected outcome equations.

Lee (1983) suggests an approach based on order statistics and distributional transformations. The marginal distributions of $v$ and $u_1$ are first specified, and the model is then completed with a joint distribution that has the specified margins. From the choice equations, define a random variable $\varepsilon_1$ as $\varepsilon_1 = \max\{U_k : k = 2, \ldots, m\} - v_1$. The first alternative is chosen if and only if $z_1\gamma_1 > \varepsilon_1$.
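
The equivalence between the utility-maximization rule and the single inequality for alternative 1 can be checked numerically. The following is a minimal simulation sketch, not part of the original text. It assumes independent type I extreme value disturbances (the case that generates the conditional logit model); the index values, the function names, and the seed are hypothetical choices made for illustration only.

```python
import math
import random


def draw_gumbel(rng):
    """One standard type I extreme value draw via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice_of_first(index_values, n_draws=20000, seed=0):
    """Compare the utility-maximization rule with the epsilon_1 rule.

    index_values holds hypothetical indices z_k * gamma_k for k = 1..m.
    Returns (share choosing alternative 1 by utility maximization,
             share with z_1 * gamma_1 > epsilon_1,
             number of draws on which the two rules disagree).
    """
    rng = random.Random(seed)
    by_argmax = by_epsilon = disagreements = 0
    for _ in range(n_draws):
        v = [draw_gumbel(rng) for _ in index_values]
        utilities = [idx + vk for idx, vk in zip(index_values, v)]
        best_other = max(utilities[1:])
        first_is_best = utilities[0] > best_other
        epsilon_1 = best_other - v[0]  # max of the other utilities minus v_1
        epsilon_rule = index_values[0] > epsilon_1
        by_argmax += first_is_best
        by_epsilon += epsilon_rule
        disagreements += first_is_best != epsilon_rule
    return by_argmax / n_draws, by_epsilon / n_draws, disagreements


def logit_probability_of_first(index_values):
    """Closed-form conditional logit probability of alternative 1."""
    return math.exp(index_values[0]) / sum(math.exp(a) for a in index_values)


indices = [0.5, 0.0, -0.3]  # illustrative values only
share_argmax, share_epsilon, disagreements = simulate_choice_of_first(indices)
print(share_argmax, share_epsilon, logit_probability_of_first(indices))
```

Under these assumptions both simulated shares should be close to the closed-form logit probability, and the two rules should select alternative 1 on the same draws up to floating-point rounding.
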
In terms of $\varepsilon_1$, the choice inequality looks like a binary choice criterion for alternative 1. Given the distribution of $v$ (and hence $G_1$), the implied distribution of $\varepsilon_1$ is $F_1(c \mid x) = G_1(c - z_2\gamma_2, \ldots, c - z_m\gamma_m)$. When $u_1$ is normally distributed, a normal-distribution transformation is suggested to transform $\varepsilon_1$ into a standard normal variable $\varepsilon_1^*$ as $\varepsilon_1^* = \Phi^{-1}(F_1(\varepsilon_1 \mid x))$. The disturbances $u_1$ and $\varepsilon_1^*$ are then assumed to be jointly normally distributed with zero means and covariance $\sigma_{u_1\varepsilon_1^*}$. Under such a specification, the bias-corrected outcome equation is similar to the familiar one: $E(y_1 \mid x, I_1 = 1) = x_1\beta_1 - \sigma_{u_1\varepsilon_1^*}\,\phi(\Phi^{-1}(G_1(x\gamma)))/G_1(x\gamma)$, where $I_1 = 1$ indicates that alternative 1 is chosen. If $u_1$ were not normally distributed, transformations other than the normal one might be desirable. If the marginal distribution of $u_1$ were unknown, flexible functional specifications such as the bivariate Edgeworth expansion might be used. Under this approach, both a simple two-stage method and the method of maximum likelihood can be used.

Schmertmann (1994) compares the advantages and disadvantages of the Dubin and McFadden approach and the Lee approach. His conclusion is that the Dubin and McFadden specification is likely to be affected by multicollinearity, since several bias-correction terms may be introduced, while Lee's specification imposes restrictive covariance structures on $u_1$ and $v$ and may be sensitive to misspecification. On the latter, Lee (1995) derives some implications for comparative advantage measures.
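
To make the bias-correction term in Lee's approach concrete, the sketch below computes the regressor $\phi(\Phi^{-1}(G_1))/G_1$ that would enter a second-stage outcome regression. It is an illustration under stated assumptions rather than the estimator of any particular paper: the first stage is assumed to be a conditional logit, the index values stand in for already estimated $z_k\gamma_k$, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def logit_choice_probs(index_values):
    """Conditional logit probabilities from indices z_k * gamma_k (assumed first stage)."""
    top = max(index_values)  # subtract the maximum for numerical stability
    weights = [math.exp(a - top) for a in index_values]
    total = sum(weights)
    return [w / total for w in weights]


def lee_correction_term(g1):
    """Return phi(Phi^{-1}(G_1)) / G_1 for a choice probability 0 < G_1 < 1.

    In the bias-corrected outcome equation this term enters with
    coefficient minus the covariance between u_1 and epsilon_1^*.
    """
    if not 0.0 < g1 < 1.0:
        raise ValueError("choice probability must lie strictly between 0 and 1")
    cutoff = _STD_NORMAL.inv_cdf(g1)  # Phi^{-1}(G_1)
    return _STD_NORMAL.pdf(cutoff) / g1


# Illustrative index values for one observation with three alternatives.
probs = logit_choice_probs([0.5, 0.0, -0.3])
print(probs[0], lee_correction_term(probs[0]))
```

In a two-stage procedure along these lines, the term would be computed for every observation that selects alternative 1 and added as a regressor alongside $x_1$. The term shrinks as $G_1$ approaches one, which matches the intuition that selection matters less when alternative 1 is almost always chosen.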