# Maximum Score Estimator—A Multinomial Case

The multinomial QR model considered by Manski has the following structure. The utility of the $i$th person when he or she chooses the $j$th alternative is given by

$$U_{ij} = x_{ij}'\beta_0 + \epsilon_{ij}, \qquad i = 1, 2, \ldots, n, \quad j = 0, 1, \ldots, m, \tag{9.6.23}$$

where we assume

Assumption 9.6.1. $\{\epsilon_{ij}\}$ are i.i.d. for both $i$ and $j$.

Assumption 9.6.2. $(x_{i0}', x_{i1}', \ldots, x_{im}')' \equiv x_i$ is a sequence of $(m+1)K$-dimensional i.i.d. random vectors, distributed independently of $\{\epsilon_{ij}\}$, with a joint density $g(x)$ such that $g(x) > 0$ for all $x$.

Assumption 9.6.3. The parameter space $B$ is defined by $B = \{\beta \mid \beta'\beta = 1\}$.

Each person chooses the alternative for which the utility is maximized. Therefore, if we represent the event of the $i$th person choosing the $j$th alternative by a binary random variable $y_{ij}$, we have

$$y_{ij} = \begin{cases} 1 & \text{if } U_{ij} > U_{ik} \text{ for all } k \neq j \\ 0 & \text{otherwise.} \end{cases} \tag{9.6.24}$$

We need not worry about the possibility of a tie because of Assumption 9.6.2.

The maximum score estimator $\hat{\beta}$ is defined as the value of $\beta$ that maximizes the score

$$S_n(\beta) = \sum_{i=1}^{n} \sum_{j=0}^{m} y_{ij}\, \chi(x_{ij}'\beta \geq x_{ik}'\beta \text{ for all } k \neq j) \tag{9.6.25}$$

subject to $\beta'\beta = 1$, where $\chi$ is defined in (9.6.3). We shall indicate how to generalize the consistency theorem to the present multinomial model.
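As an illustration of how the score in (9.6.25) could be evaluated, here is a minimal NumPy sketch. The function name `max_score`, the array layout (`X[i, j]` holding $x_{ij}$), and the coding of the choice as an integer index `y[i]` rather than as the indicators $y_{ij}$ are assumptions made for this example, not part of the original exposition.

```python
import numpy as np

def max_score(beta, X, y):
    """Evaluate the score S_n(beta) of (9.6.25).

    X : array of shape (n, m + 1, K); X[i, j] is x_ij (assumed layout).
    y : integer array of shape (n,); y[i] is the alternative chosen by
        person i, so y_ij = 1 exactly when j == y[i].
    """
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)      # impose beta'beta = 1
    index = X @ beta                        # index[i, j] = x_ij' beta
    chosen = index[np.arange(len(y)), y]    # index of the chosen alternative
    # chi(x_ij'beta >= x_ik'beta for all k != j), evaluated at the chosen j
    return int(np.sum(chosen >= index.max(axis=1)))

# Tiny hand-checkable example: n = 2 persons, m + 1 = 3 alternatives, K = 2.
X = np.array([[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
              [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]])
y = np.array([0, 1])
print(max_score([1.0, 0.0], X, y))
```

Because only the ordering of the indices $x_{ij}'\beta$ matters, rescaling $\beta$ by a positive constant leaves the score unchanged, which is why the normalization $\beta'\beta = 1$ is needed.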

As in the proof of Theorem 9.6.2, we must verify the assumptions of Theorem 9.6.1. Again, assumptions A and B are clearly satisfied. Assumption C can be verified in a manner very similar to the proof of Theorem 9.6.2. We merely note whatever changes are needed for the multinomial model.

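We can generalize (9.6.7) as

$$S_{\lambda n}(\beta) = \sum_{i=1}^{n} \sum_{j=0}^{m} y_{ij}\, \psi_\lambda\bigl[(x_{ij} - x_{i0})'\beta,\ (x_{ij} - x_{i1})'\beta,\ \ldots,\ (x_{ij} - x_{i,j-1})'\beta,\ (x_{ij} - x_{i,j+1})'\beta,\ \ldots,\ (x_{ij} - x_{im})'\beta\bigr], \tag{9.6.26}$$

where

$$\psi_\lambda(z_1, z_2, \ldots, z_m) = \begin{cases} 0 & \text{if } \min_i z_i \leq 0 \\ 1 & \text{if } \min_i z_i > \lambda^{-1} \\ \lambda \min_i z_i & \text{otherwise.} \end{cases} \tag{9.6.27}$$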

Then the first four steps of the proof of Theorem 9.6.2 generalize straightforwardly to the present case.
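The smoothing function in (9.6.27) and the smoothed score in (9.6.26) can be sketched as follows. This is only an illustration: the names `psi` and `smoothed_score`, the array layout (`X[i, j]` holding $x_{ij}$), and the integer coding of the chosen alternative in `y` are assumptions introduced here, and the smoothing constant `lam` is taken to be positive.

```python
import numpy as np

def psi(z, lam):
    """psi_lambda of (9.6.27), applied along the last axis of z.

    Returns 0 if min(z) <= 0, 1 if min(z) > 1/lam, and lam * min(z)
    otherwise; for lam > 0 this equals clipping lam * min(z) to [0, 1].
    """
    return np.clip(lam * np.min(z, axis=-1), 0.0, 1.0)

def smoothed_score(beta, X, y, lam):
    """Smoothed score of (9.6.26) for X of shape (n, m + 1, K)."""
    n = len(y)
    index = X @ np.asarray(beta, dtype=float)   # index[i, j] = x_ij' beta
    chosen = index[np.arange(n), y]
    diffs = chosen[:, None] - index             # (x_ij - x_ik)' beta for all k
    keep = np.ones_like(index, dtype=bool)
    keep[np.arange(n), y] = False               # drop the k = j term
    diffs = diffs[keep].reshape(n, -1)          # m differences per person
    return float(np.sum(psi(diffs, lam)))

X = np.array([[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
              [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]])
y = np.array([0, 1])
print(smoothed_score([1.0, 0.0], X, y, lam=0.25))
```

As `lam` grows, the linear ramp between 0 and 1 narrows and the smoothed score approaches the step-function score of (9.6.25) wherever no index difference is exactly zero.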

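The fifth step is similar but involves a somewhat different approach. From (9.6.25) we obtain

$$\begin{aligned} Q(\beta) &\equiv \operatorname*{plim}_{n \to \infty} n^{-1} S_n(\beta) \\ &= E \sum_{j=0}^{m} P(y_j = 1 \mid x, \beta_0)\, \chi(x_j'\beta \geq x_k'\beta \text{ for all } k \neq j) \\ &= E\,h(x, \beta), \end{aligned} \tag{9.6.28}$$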

where $\{y_j\}$ and $\{x_j\}$ are random variables from which i.i.d. observations $\{y_{ij}\}$ and $\{x_{ij}\}$ are drawn and $x = (x_0', x_1', \ldots, x_m')'$. First, we want to show that $h(x, \beta)$ is uniquely maximized at $\beta_0$. For this purpose consider maximizing

$$h^*(x, \{A_j\}) = \sum_{j=0}^{m} P(y_j = 1 \mid x, \beta_0)\, \chi(A_j) \tag{9.6.29}$$

for every $x$ with respect to a nonoverlapping partition $\{A_j\}$ of the space of $x$.

This is equivalent to the following question: Given a particular $x_0$, to which region should we assign this point? Suppose

$$P(y_{j_0} = 1 \mid x_0, \beta_0) > P(y_j = 1 \mid x_0, \beta_0) \quad \text{for all } j \neq j_0. \tag{9.6.30}$$

Then it is clearly best to assign $x_0$ to the region defined by

$$P(y_{j_0} = 1 \mid x, \beta_0) > P(y_j = 1 \mid x, \beta_0) \quad \text{for all } j \neq j_0. \tag{9.6.31}$$

Thus (9.6.29) is maximized by the partition $\{A_j\}$ defined by

$$A_j = \{x \mid x_j'\beta_0 \geq x_k'\beta_0 \text{ for } k \neq j\}. \tag{9.6.32}$$

Clearly, this is also a solution to the more restricted problem of maximizing $h(x, \beta)$. This maximum is unique because we always have a strict inequality in (9.6.30) because of our assumptions. Also our assumptions are such that if $h(x, \beta)$ is uniquely maximized for every $x$ at $\beta_0$, then $E\,h(x, \beta)$ is uniquely maximized at $\beta_0$. This completes the proof of the consistency of the maximum score estimator in the multinomial case. (Figure 9.2 illustrates the maximization of Eq. 9.6.29 for the case where m = 3 and x is a scalar.)

[Figure 9.2 An optimal partition of the space of an independent variable. The plot shows the curves $P(y_0 = 1 \mid x, \beta_0)$, $P(y_1 = 1 \mid x, \beta_0)$, and $P(y_2 = 1 \mid x, \beta_0)$ together with the regions of the optimal partition.]

The asymptotic distribution of the maximum score estimator has not yet been obtained. A major difficulty lies in the fact that the score function is not differentiable and hence Theorem 4.1.3, which is based on a Taylor expansion of the derivative of a maximand, cannot be used. The degree of difficulty for the maximum score estimator seems greater than that for the LAD estimator discussed in Section 4.6; the method of proving asymptotic normality for LAD does not work in the present case. In the binary case, maximizing (9.6.2) is equivalent to minimizing $\sum_{i=1}^{n} |y_i - \chi(x_i'\beta \geq 0)|$. This shows both a similarity of the maximum score estimator to the LAD estimator and the additional difficulty brought about by the discontinuity of the $\chi$ function.
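The binary-case equivalence can be checked numerically. The sketch below assumes that the binary score (9.6.2) has the form $\sum_i [y_i \chi(x_i'\beta \geq 0) + (1 - y_i) \chi(x_i'\beta < 0)]$; under that assumption each term equals $1 - |y_i - \chi(x_i'\beta \geq 0)|$, so the score and the sum of absolute deviations add up to $n$ for every $\beta$. The function names and the simulated data are hypothetical and serve only to illustrate the identity.

```python
import numpy as np

def binary_score(beta, X, y):
    """Binary maximum score objective (assumed form of (9.6.2))."""
    chi = (X @ beta >= 0).astype(int)       # chi(x_i' beta >= 0)
    return int(np.sum(y * chi + (1 - y) * (1 - chi)))

def abs_deviation(beta, X, y):
    """LAD-type criterion: sum of |y_i - chi(x_i' beta >= 0)|."""
    chi = (X @ beta >= 0).astype(int)
    return int(np.sum(np.abs(y - chi)))

# Arbitrary simulated data; the identity holds for any 0/1 outcomes.
rng = np.random.default_rng(0)
n, K = 200, 3
X = rng.normal(size=(n, K))
y = rng.integers(0, 2, size=n)

for _ in range(5):
    beta = rng.normal(size=K)
    beta /= np.linalg.norm(beta)            # beta'beta = 1
    # score + absolute deviations = n, so maximizing one minimizes the other
    assert binary_score(beta, X, y) + abs_deviation(beta, X, y) == n
```

Both criteria are step functions of $\beta$, which is the discontinuity that separates this problem from the LAD case.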

Manski (1975) reported results of a Monte Carlo study in which he compared the maximum score estimator with the logit MLE in a five-response multinomial logit model with four unknown parameters and 400 observations. The study demonstrated a fairly good performance by the maximum score estimator as well as its computational feasibility. The bias of the estimator is very small, whereas its root mean squared error is roughly double that of the MLE. A comparison of the maximum score estimator and the MLE under a model different from the one from which the MLE is derived would be interesting.