Multinomial Generalizations
In all the models we have considered so far in Section 10.10, the sign of $y_{1i}^*$ determined two basic categories of observations, such as union members versus nonunion members, states with an antidiscrimination law versus those without, or college-goers versus non-college-goers. By a multinomial generalization of Type 5, we mean a model in which observations are classified into more than two categories. We shall devote most of this subsection to a discussion of the article by Duncan (1980).
Duncan presented a model of the joint determination of the location of a firm and its input-output vectors. A firm chooses the location for which profit is maximized, and only the input-output vector for the chosen location is observed. Let $s_i(k)$ be the profit of the $i$th firm when it chooses location $k$, $i = 1, 2,\ldots, n$ and $k = 1, 2,\ldots, K$, and let $y_i(k)$ be the input-output vector for the $i$th firm at the $k$th location. To simplify the analysis, we shall subsequently assume $y_i(k)$ is a scalar, for a generalization to the vector case is straightforward. It is assumed that

$$s_i(k) = x_{ik}^{(1)\prime}\beta + u_{ik} \tag{10.10.26}$$

and

$$y_i(k) = x_{ik}^{(2)\prime}\beta + v_{ik}, \tag{10.10.27}$$

where $x_{ik}^{(1)}$ and $x_{ik}^{(2)}$ are vector functions of the input-output prices, and economic theory dictates that the same $\beta$ appears in both equations.¹⁹ It is assumed that $(u_{i1}, u_{i2},\ldots, u_{iK}, v_{i1}, v_{i2},\ldots, v_{iK})$ are i.i.d. drawings from a $2K$-variate normal distribution. Suppose $s_i(k_i) > s_i(j)$ for any $j \neq k_i$. Then a researcher observes $y_i(k_i)$ but does not observe $y_i(j)$ for $j \neq k_i$.
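The observation mechanism just described can be sketched in a short simulation. The dimensions, parameter values, and independent standard-normal errors below are illustrative assumptions, not Duncan's specification; the point is only that the chosen location maximizes profit and masks the outputs at all other locations.

```python
import numpy as np

# Illustrative simulation of the selection mechanism in Duncan's model
# (all dimensions and parameter values are hypothetical).
rng = np.random.default_rng(0)
n, K, p = 500, 3, 2                      # firms, locations, dim(beta)
beta = np.array([1.0, -0.5])
x1 = rng.normal(size=(n, K, p))          # regressors of the profit equation
x2 = rng.normal(size=(n, K, p))          # regressors of the output equation
u = rng.normal(size=(n, K))              # profit-equation errors
v = rng.normal(size=(n, K))              # output-equation errors

s = x1 @ beta + u                        # s_i(k): latent profit at each location
y = x2 @ beta + v                        # y_i(k): output at each location
k_star = s.argmax(axis=1)                # chosen location k_i maximizes profit
y_obs = y[np.arange(n), k_star]          # only y_i(k_i) is observed
```

Note that `y_obs` is a selected sample: its distribution differs from that of any single column of `y`, which is why the conditional moments derived below are needed.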
For the following discussion, it is useful to define $K$ binary variables for each $i$ by

$$w_i(k) = 1 \quad \text{if the } i\text{th firm chooses the } k\text{th location} \tag{10.10.28}$$
$$w_i(k) = 0 \quad \text{otherwise},$$

and define the vector $\mathbf{w}_i = [w_i(1), w_i(2),\ldots, w_i(K)]^\prime$. Also define $P_{ik} = P[w_i(k) = 1]$ and the vector $\mathbf{P}_i = (P_{i1}, P_{i2},\ldots, P_{iK})^\prime$.
There are many ways to write the likelihood function of the model, but perhaps the most illuminating way is to write it as
$$L = \prod_i f[\,y_i(k_i) \mid w_i(k_i) = 1\,]\, P_{ik_i}, \tag{10.10.29}$$

where $k_i$ is the actual location the $i$th firm was observed to choose.
The estimation method proposed by Duncan can be outlined as follows:
Step 1. Estimate the $\beta$ that characterize $f$ in (10.10.29) by nonlinear WLS.
Step 2. Estimate the $\beta$ that characterize $P$ in (10.10.29) by the multinomial probit MLE using the nonlinear WLS iteration.
Step 3. Choose the optimum linear combination of the two estimates of P obtained in steps 1 and 2.
To describe step 1 explicitly, we must evaluate $\mu_i = E[\,y_i(k_i) \mid w_i(k_i) = 1\,]$ and $\sigma_i^2 = V[\,y_i(k_i) \mid w_i(k_i) = 1\,]$ as functions of $\beta$ and the variances and covariances of the error terms of Eqs. (10.10.26) and (10.10.27). These conditional moments can be obtained as follows. Define $z_i(j) = s_i(k_i) - s_i(j)$ and the $(K-1)$-vector $\mathbf{z}_i = [z_i(1),\ldots, z_i(k_i - 1), z_i(k_i + 1),\ldots, z_i(K)]^\prime$. To simplify the notation, write $\mathbf{z}_i$ as $z$, omitting the subscript. Similarly, write $y_i(k_i)$ as $y$. Also define $R = E(y - Ey)(z - Ez)^\prime [E(z - Ez)(z - Ez)^\prime]^{-1}$ and $Q = Vy - R\,E(z - Ez)(y - Ey)$. Then we obtain²⁰
$$\mu_i = E(y \mid z > 0) = Ey + R\,E(z \mid z > 0) - R\,Ez \tag{10.10.30}$$

and

$$\sigma_i^2 = V(y \mid z > 0) = R\,V(z \mid z > 0)\,R^\prime + Q. \tag{10.10.31}$$
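In the simplest case $K = 2$, $z$ is a scalar and these two formulas can be checked by Monte Carlo. The means, variances, and covariance below are arbitrary, and the moments of $z$ truncated to $z > 0$ are computed from the standard truncated-normal formulas.

```python
import math
import numpy as np

# Monte Carlo check of the conditional-moment formulas (10.10.30)-(10.10.31)
# in the scalar-z case (K = 2); the joint-normal parameters are arbitrary.
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # standard normal cdf
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

my, mz, vy, vz, c = 1.0, 0.5, 2.0, 1.5, 0.8   # means, variances, covariance of (y, z)
R = c / vz                                     # R = Cov(y,z) Var(z)^{-1}
Q = vy - R * c                                 # Q = Vy - R Cov(z,y)

# Moments of z truncated to z > 0 (standard truncated-normal results).
s = math.sqrt(vz)
a = -mz / s
h = phi(a) / (1.0 - Phi(a))                    # hazard of the standard normal
Ez_t = mz + s * h                              # E(z | z > 0)
Vz_t = vz * (1.0 + a * h - h * h)              # V(z | z > 0)

mu = my + R * (Ez_t - mz)                      # (10.10.30)
sig2 = R * Vz_t * R + Q                        # (10.10.31)

# Simulation for comparison.
rng = np.random.default_rng(1)
cov = [[vy, c], [c, vz]]
draws = rng.multivariate_normal([my, mz], cov, size=400_000)
sel = draws[draws[:, 1] > 0, 0]                # y-draws retained when z > 0
print(mu, sel.mean(), sig2, sel.var())
```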
The conditional moments of $z$ appearing in (10.10.30) and (10.10.31) can be found in the articles by Amemiya (1974b, p. 1002) and Duncan (1980, p. 850). Finally, we can describe the nonlinear WLS iteration of step 1 as follows: Estimate $\sigma_i^2$ by inserting the initial estimates of the parameters (for example, those obtained by minimizing $\sum_i [\,y_i(k_i) - \mu_i\,]^2$) into the right-hand side of (10.10.31); call the result $\hat{\sigma}_i^2$. Minimize

$$\sum_i \hat{\sigma}_i^{-2} [\,y_i(k_i) - \mu_i\,]^2 \tag{10.10.32}$$

with respect to the parameters that appear in the right-hand side of (10.10.30). Use these estimates to evaluate the right-hand side of (10.10.31) again to get another estimate of $\sigma_i^2$. Repeat the process to yield new estimates of $\beta$.
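The flavor of this iteration can be seen in a stripped-down example. The moment functions below, $\mu_i = \theta x_i$ and $\sigma_i^2 = 1 + \mu_i^2$, are hypothetical stand-ins for Duncan's, chosen so that each weighted minimization has a closed form; the structure (minimize, re-evaluate the variances, repeat) is the same.

```python
import numpy as np

# Sketch of the nonlinear WLS iteration of step 1 in a deliberately
# simplified model: mu_i = theta * x_i and sigma_i^2 = 1 + mu_i^2
# (hypothetical moment functions, not Duncan's).
rng = np.random.default_rng(2)
theta_true = 1.5
x = rng.uniform(0.5, 2.0, size=2000)
mu_true = theta_true * x
y = mu_true + rng.normal(scale=np.sqrt(1 + mu_true**2))

theta = (x @ y) / (x @ x)                    # initial estimate: unweighted LS
for _ in range(20):
    sig2 = 1 + (theta * x) ** 2              # evaluate sigma_i^2 at current theta
    w = 1.0 / sig2                           # WLS weights
    theta_new = (w * x) @ y / ((w * x) @ x)  # argmin of sum w_i (y_i - theta x_i)^2
    if abs(theta_new - theta) < 1e-10:       # stop when the estimates converge
        break
    theta = theta_new
print(theta)
```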
Now consider step 2. Define
$$\Sigma_i \equiv E(\mathbf{w}_i - \mathbf{P}_i)(\mathbf{w}_i - \mathbf{P}_i)^\prime = D_i - \mathbf{P}_i \mathbf{P}_i^\prime, \tag{10.10.33}$$

where $D_i$ is the $K \times K$ diagonal matrix the $k$th diagonal element of which is $P_{ik}$. To perform the nonlinear WLS iteration, first, estimate $\Sigma_i$ by inserting the initial estimates of the parameters into the right-hand side of (10.10.33) (denote the estimate thus obtained by $\hat{\Sigma}_i$); second, minimize

$$\sum_i (\mathbf{w}_i - \mathbf{P}_i)^\prime \hat{\Sigma}_i^- (\mathbf{w}_i - \mathbf{P}_i), \tag{10.10.34}$$

where the minus sign in the superscript denotes a generalized inverse, with respect to the parameters that characterize $\mathbf{P}_i$; and repeat the process until the estimates converge. A generalized inverse $A^-$ of $A$ is any matrix that satisfies $AA^-A = A$ (Rao, 1973, p. 24). A generalized inverse $\Sigma_i^-$ is obtained from the matrix $D_i^{-1} + P_{ik}^{-1}\mathbf{1}\mathbf{1}^\prime$, where $\mathbf{1}$ is a vector of ones, by replacing its $k$th column and row by a zero vector. It is not unique because we may choose any $k$.
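This generalized inverse is easy to verify numerically: build $D_i^{-1} + P_{ik}^{-1}\mathbf{1}\mathbf{1}^\prime$, zero out the $k$th row and column, and check $AA^-A = A$ for every choice of $k$. The probability vector below is arbitrary.

```python
import numpy as np

# Numerical check of the generalized inverse of Sigma_i = D_i - P_i P_i'
# for an arbitrary (hypothetical) probability vector.
P = np.array([0.2, 0.3, 0.5])
D = np.diag(P)
Sigma = D - np.outer(P, P)                 # (10.10.33); singular, since Sigma 1 = 0

K = len(P)
ones = np.ones((K, K))
for k in range(K):                         # the choice of k is arbitrary
    G = np.linalg.inv(D) + ones / P[k]     # D^{-1} + P_k^{-1} 1 1'
    G[k, :] = 0.0                          # replace the kth row ...
    G[:, k] = 0.0                          # ... and column by zeros
    assert np.allclose(Sigma @ G @ Sigma, Sigma)   # A A^- A = A
print("ok")
```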
Finally, regarding step 3, if we denote the two estimates of $\beta$ obtained by steps 1 and 2 by $\hat{\beta}_1$ and $\hat{\beta}_2$, respectively, and their respective asymptotic variance-covariance matrices by $V_1$ and $V_2$, the optimal linear combination of the two estimates is given by $(V_1^{-1} + V_2^{-1})^{-1} V_1^{-1} \hat{\beta}_1 + (V_1^{-1} + V_2^{-1})^{-1} V_2^{-1} \hat{\beta}_2$. This final estimator is not fully asymptotically efficient, however. To see this, suppose the regression coefficients of (10.10.26) and (10.10.27) differ: call them $\beta_1$ and $\beta_2$, say. Then, by a result of Amemiya (1976b), we know that $\hat{\beta}_2$ is an asymptotically efficient estimator of $\beta_2$. However, as we have indicated in Section 10.4.4, $\hat{\beta}_1$ is not asymptotically efficient. So a weighted average of the two could not be asymptotically efficient.
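For concreteness, the matrix-weighted combination of step 3 can be computed as follows; the estimates and variance matrices below are made-up numbers.

```python
import numpy as np

# Matrix-weighted combination of two estimators, as in step 3;
# the values of b1, b2, V1, V2 are purely illustrative.
b1 = np.array([1.1, 0.4])
b2 = np.array([0.9, 0.6])
V1 = np.array([[0.10, 0.02], [0.02, 0.08]])
V2 = np.array([[0.05, 0.00], [0.00, 0.20]])

W1, W2 = np.linalg.inv(V1), np.linalg.inv(V2)
# (V1^{-1} + V2^{-1})^{-1} (V1^{-1} b1 + V2^{-1} b2)
b = np.linalg.solve(W1 + W2, W1 @ b1 + W2 @ b2)
V = np.linalg.inv(W1 + W2)                 # asymptotic variance of the combination
print(b, V)
```

In the scalar case this reduces to the familiar precision-weighted average, and the combined variance $V$ is smaller than each of $V_1$ and $V_2$ in the matrix sense.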
Dubin and McFadden (1984) used a model similar to Duncan's in their study of the joint determination of the choice of electric appliances and the consumption of electricity. In their model, $s_i(k)$ may be interpreted as the utility of the $i$th family when it uses the $k$th portfolio of appliances, and $y_i(k)$ as the consumption of electricity of the $i$th family holding the $k$th portfolio. The estimation method is essentially similar to Duncan's. The main difference is that Dubin and McFadden assumed that the error terms of (10.10.26) and (10.10.27) are distributed with a Type I extreme-value distribution, and hence that the $P$ part of (10.10.29) is multinomial logit.
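The logit form of the $P$ part can be illustrated by simulation: adding i.i.d. Type I extreme-value (Gumbel) errors to arbitrary systematic utilities and choosing the maximum reproduces the multinomial logit frequencies. The utility values below are arbitrary.

```python
import numpy as np

# With i.i.d. Type I extreme-value (Gumbel) errors added to systematic
# utilities, utility-maximizing choice frequencies follow the multinomial
# logit formula; the utilities are made-up numbers.
rng = np.random.default_rng(3)
util = np.array([0.2, 1.0, -0.5])              # systematic utilities of K = 3 choices
n = 200_000
eps = rng.gumbel(size=(n, 3))                  # Type I extreme-value draws
choice = (util + eps).argmax(axis=1)           # utility-maximizing choice
freq = np.bincount(choice, minlength=3) / n    # empirical choice frequencies

logit = np.exp(util) / np.exp(util).sum()      # multinomial logit probabilities
print(freq, logit)
```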
Exercises
1. (Section 10.4.3)
Verify (10.4.19).
2. (Section 10.4.3)
Verify (10.4.28).
3. (Section 10.4.3)
Consider the two variance-covariance matrices given in (10.4.32) and (10.4.33). As stated in the text, the difference of the two matrices is neither positive definite nor negative definite. Show that the first part of the matrix in (10.4.32), namely, $\sigma^2 (Z^\prime \Sigma^{-1} Z)^{-1}$, is smaller than the matrix in (10.4.33) in the matrix sense.
4. (Section 10.4.5)
In the standard Tobit model (10.2.3), assume that $\sigma^2 = 1$, $\beta$ is a scalar and the only unknown parameter, and $\{x_i\}$ are i.i.d. binary random variables taking 1 with probability $p$ and 0 with probability $1 - p$. Derive the formulae of $p \cdot AV[\sqrt{n}(\hat{\beta} - \beta)]$ for $\hat{\beta} =$ probit MLE, Tobit MLE, Heckman's LS, and NLLS. Evaluate them at $\beta = 0$, 1, and 2.
5. (Section 10.4.6)
Consider the following model:
$$y_i = 1 \quad \text{if } y_i^* \geq 0$$
$$y_i = 0 \quad \text{if } y_i^* < 0, \qquad i = 1, 2,\ldots, n,$$

where $\{y_i^*\}$ are independent $N(\mathbf{x}_i^\prime\beta, 1)$. It is assumed that $\{y_i\}$ are observed but $\{y_i^*\}$ are not. Write step-by-step instructions for the EM algorithm to obtain the MLE of $\beta$ and show that the MLE is an equilibrium solution of the iteration.
6. (Section 10.6)
Consider the following model:
$$y_{1i} = \mathbf{x}_{1i}^\prime\beta_1 + u_{1i}$$
$$y_{2i} = \mathbf{x}_{2i}^\prime\beta_2 + u_{2i},$$

where $(u_{1i}, u_{2i})$ are i.i.d. with continuous joint density $f(\cdot, \cdot)$. Denote the marginal density of $u_{1i}$ by $f_1(\cdot)$ and that of $u_{2i}$ by $f_2(\cdot)$.

a. Assuming that $y_{1i}$, $y_{2i}$, $\mathbf{x}_{1i}$, and $\mathbf{x}_{2i}$ are observed for $i = 1, 2,\ldots, n$, express the likelihood function in terms of $f$, $f_1$, and $f_2$.

b. Assuming that $y_{1i}$, $y_{2i}$, $\mathbf{x}_{1i}$, and $\mathbf{x}_{2i}$ are observed for all $i$, express the likelihood function in terms of $f_1$ and $f_2$.
7. (Section 10.6)
Consider the following model:
$$y_i^* = \alpha z_i + u_i$$
$$z_i^* = \beta y_i + v_i$$
$$y_i = 1 \ \text{if } y_i^* \geq 0, \qquad y_i = 0 \ \text{if } y_i^* < 0$$
$$z_i = 1 \ \text{if } z_i^* \geq 0, \qquad z_i = 0 \ \text{if } z_i^* < 0,$$

where $u_i$ and $v_i$ are jointly normal with zero means and nonzero covariance. Assume that $y^*$, $z^*$, $u$, and $v$ are unobservable and $y$ and $z$ are observable. Show that the model makes sense (that is, $y$ and $z$ are uniquely determined as functions of $u$ and $v$) if and only if $\alpha\beta = 0$.
8. (Section 10.6)
In the model of Exercise 7, assume that $\beta = 0$ and that we have $n$ i.i.d. observations on $(y_i, z_i)$, $i = 1, 2,\ldots, n$. Write the likelihood function of $\alpha$. You may write the joint density of $(u, v)$ simply as $f(u, v)$ without explicitly writing the bivariate normal density.
9. (Section 10.6)
Suppose $y_i^*$ and $z_i^*$, $i = 1, 2,\ldots, n$, are i.i.d. and jointly normally distributed with nonzero correlation. For each $i$, we observe (1) only $y_i^*$, (2) only $z_i^*$, or (3) neither, according to the following scheme:

(1) Observe $y_i = y_i^*$ and do not observe $z_i^*$ if $y_i^* \geq z_i^* \geq 0$.
(2) Observe $z_i = z_i^*$ and do not observe $y_i^*$ if $z_i^* > y_i^* \geq 0$.
(3) Do not observe either if $y_i^* < 0$ or $z_i^* < 0$.

Write down the likelihood function of the model. You may write the joint normal density simply as $f(\cdot, \cdot)$.
10. (Section 10.7.1)
Write the likelihood function of the following two models (cf. Cragg, 1971).
a. $(y_1^*, y_2^*) \sim$ Bivariate $N(\mathbf{x}_1^\prime\beta_1, \mathbf{x}_2^\prime\beta_2, 1, \sigma^2, \sigma_{12})$
$y_2 = y_2^*$ if $y_1^* > 0$ and $y_2^* > 0$
$y_2 = 0$ otherwise.
We observe only $y_2$.

b. $(y_1^*, y_2^*) \sim$ Bivariate $N(\mathbf{x}_1^\prime\beta_1, \mathbf{x}_2^\prime\beta_2, 1, \sigma^2, \sigma_{12})$ with $y_2^*$ truncated so that $y_2^* > 0$
$y_2 = y_2^*$ if $y_1^* > 0$
$y_2 = 0$ if $y_1^* \leq 0$.
We observe only $y_2$.
11. (Section 10.9.4)
In Tomes' model defined by (10.9.7) through (10.9.9), consider the following estimation method: Step 1. Regress $y_{2i}$ on $\mathbf{x}_{1i}$ and $\mathbf{x}_{2i}$ and obtain the least squares predictor $\hat{y}_{2i}$. Step 2. Substitute $\hat{y}_{2i}$ for $y_{2i}$ in (10.9.7) and apply the Tobit MLE to Eqs. (10.9.7) and (10.9.9). Will this method yield consistent estimates of the parameters?
12. (Section 10.10)
Suppose the joint distribution of a binary variable $w$ and a continuous variable $y$ is determined by $P(w = 1 \mid y) = \Lambda(\gamma_1 y)$ and $f(y \mid w) = N(\gamma_2 w, \sigma^2)$. Show that we must assume $\sigma^2\gamma_1 = \gamma_2$ for logical consistency.
13. (Section 10.10.1)
In model (10.10.1), Type 5 Tobit, define an observed variable $y_i$ by

$$y_i = y_{2i}^* \quad \text{if } y_{1i}^* > 0$$
$$y_i = y_{3i}^* \quad \text{if } y_{1i}^* \leq 0,$$

and assume that a researcher does not observe whether $y_{1i}^* > 0$ or $\leq 0$; that is, the sample separation is unknown. Write the likelihood function of this model.
14. (Section 10.10.4)
Let $(y_{1i}^*, y_{2i}^*, y_{3i}^*)$ be a three-dimensional vector of continuous random variables that are independent across $i = 1, 2,\ldots, n$ but may be correlated among themselves for each $i$. These random variables are unobserved; instead, we observe $z_i$ and $y_i$ defined as follows:

$$z_i = y_{2i}^* \quad \text{if } y_{1i}^* > 0$$
$$z_i = y_{3i}^* \quad \text{if } y_{1i}^* \leq 0,$$

$$y_i = 1 \text{ with probability } \lambda, \ = 0 \text{ with probability } 1 - \lambda, \quad \text{if } y_{1i}^* > 0$$
$$y_i = 0 \quad \text{if } y_{1i}^* \leq 0.$$

Write down the likelihood function. Use the following symbols:
$f_{21}(y_{1i}, y_{2i})$: joint density of $y_{1i}^*$ and $y_{2i}^*$
$f_{31}(y_{1i}, y_{3i})$: joint density of $y_{1i}^*$ and $y_{3i}^*$.
15. (Section 10.10.4)
Consider a regression model: $y_{1i}^* = \mathbf{x}_{1i}^\prime\beta_1 + u_{1i}$ and $y_{2i}^* = \mathbf{x}_{2i}^\prime\beta_2 + u_{2i}$, where the observable random variable $y_i$ is defined by $y_i = y_{1i}^*$ with probability $\lambda$ and $y_i = y_{2i}^*$ with probability $1 - \lambda$. This is called a switching regression model. Write down the likelihood function of the model, assuming that $(u_{1i}, u_{2i})$ are i.i.d. with joint density $f(\cdot, \cdot)$.
16. (Section 10.10.6)
Show $\Sigma_i \Sigma_i^- \Sigma_i = \Sigma_i$, where $\Sigma_i$ is given in (10.10.33) and $\Sigma_i^-$ is given after (10.10.34). Let $\mathbf{w}_i^*$ and $\mathbf{P}_i^*$ be the vectors obtained by eliminating the $k$th element from $\mathbf{w}_i$ and $\mathbf{P}_i$, where $k$ can be arbitrary, and let $\Sigma_i^*$ be the variance-covariance matrix of $\mathbf{w}_i^*$. Then show $(\mathbf{w}_i - \mathbf{P}_i)^\prime \Sigma_i^- (\mathbf{w}_i - \mathbf{P}_i) = (\mathbf{w}_i^* - \mathbf{P}_i^*)^\prime (\Sigma_i^*)^{-1} (\mathbf{w}_i^* - \mathbf{P}_i^*)$.