# Maximum Score Estimator—A Binary Case

Manski (1975) considered a multinomial QR model, but here we shall define his estimator for a binary QR model and shall prove its consistency. Our proof will be different from Manski’s proof.21 We shall then indicate how to extend the proof to the case of a multinomial QR model in the next subsection. Consider a binary QR model

$$P(y_i = 1) = F(x_i'\beta_0), \qquad i = 1, 2, \ldots, n, \tag{9.6.1}$$

and define the score function

$$S_n(\beta) = \sum_{i=1}^{n} \left[ y_i\,\chi(x_i'\beta \ge 0) + (1 - y_i)\,\chi(x_i'\beta < 0) \right], \tag{9.6.2}$$

where

$$\chi(E) = \begin{cases} 1 & \text{if event } E \text{ occurs} \\ 0 & \text{otherwise.} \end{cases} \tag{9.6.3}$$

Note that the score is the number of correct predictions we would make if we predicted $y_i$ to be 1 whenever $x_i'\beta \ge 0$. Manski’s maximum score estimator $\hat{\beta}_n$ is defined by

$$S_n(\hat{\beta}_n) = \sup_{\beta \in B} S_n(\beta), \tag{9.6.4}$$

where the parameter space $B$ is taken as

$$B = \{\beta \mid \beta'\beta = 1\}. \tag{9.6.5}$$

Clearly, (9.6.5) implies no loss of generality because $S_n(c\beta) = S_n(\beta)$ for any positive scalar $c$.
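The score function and the estimator can be illustrated with a short simulation. The following is a minimal sketch, not Manski's algorithm: it assumes two regressors, a logistic $F$, standard normal regressors, and a crude grid search over angles on the unit circle in place of the exact supremum. The function names `score`, `max_score_on_circle`, and `simulate` are hypothetical and chosen only for this illustration.

```python
import math
import random


def score(beta, xs, ys):
    """Score S_n(beta) of (9.6.2): the number of correct predictions when
    y_i is predicted to be 1 whenever x_i'beta >= 0."""
    total = 0
    for x, y in zip(xs, ys):
        index = sum(b * xj for b, xj in zip(beta, x))
        total += y if index >= 0 else 1 - y
    return total


def max_score_on_circle(xs, ys, n_angles=360):
    """Approximate the sup over B = {beta : beta'beta = 1} for two
    regressors by a grid of angles (an illustrative simplification)."""
    best_beta, best_score = None, -1
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        beta = (math.cos(theta), math.sin(theta))
        s = score(beta, xs, ys)
        if s > best_score:
            best_beta, best_score = beta, s
    return best_beta, best_score


def simulate(n, beta0, seed=0):
    """Draw (x_i, y_i) with P(y_i = 1) = F(x_i'beta0), where F is taken
    to be logistic and x_i standard normal (both are assumptions)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        p = 1 / (1 + math.exp(-(beta0[0] * x[0] + beta0[1] * x[1])))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


beta0 = (0.6, 0.8)  # a point of B, since 0.36 + 0.64 = 1
xs, ys = simulate(1000, beta0)
beta_hat, s_hat = max_score_on_circle(xs, ys)
print("beta_hat =", beta_hat, "score =", s_hat)

# Scale invariance: S_n(c * beta) = S_n(beta) for a positive scalar c.
print(score(beta0, xs, ys) == score((2 * beta0[0], 2 * beta0[1]), xs, ys))
```

Because the score is a step function of $\beta$, the maximizer is typically a set rather than a point; the grid search simply returns the first grid point attaining the largest score.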

Because $S_n(\beta)$ is not continuous in $\beta$, we cannot use Theorem 4.1.1 without a modification. However, an appropriate modification is possible by generalizing the concept of convergence in probability as follows:

Definition 9.6.1. Let $(\Omega, \mathcal{A}, P)$ be a probability space. A sequence of not necessarily measurable functions $g_T(\omega)$ for $\omega \in \Omega$ is said to converge to 0 in probability in the generalized sense if for any $\epsilon > 0$ there exists $A_T \in \mathcal{A}$ such that

$$A_T \subseteq \{\omega \mid |g_T(\omega)| < \epsilon\} \quad \text{and} \quad \lim_{T \to \infty} P(A_T) = 1.$$

Using this definition, we can modify Theorem 4.1.1 as follows:

Theorem 9.6.1. Make the following assumptions:

(A) The parameter space $\Theta$ is a compact subset of the Euclidean $K$-space ($R^K$).

(B) $Q_T(y, \theta)$ is a measurable function of $y$ for all $\theta \in \Theta$.

(C) $T^{-1}Q_T(\theta)$ converges to a nonstochastic continuous function $Q(\theta)$ in probability uniformly in $\theta \in \Theta$ as $T$ goes to $\infty$, and $Q(\theta)$ attains a unique global maximum at $\theta_0$.

Define $\hat{\theta}_T$ as a value that satisfies

$$Q_T(\hat{\theta}_T) = \sup_{\theta \in \Theta} Q_T(\theta). \tag{9.6.6}$$

Then $\hat{\theta}_T$ converges to $\theta_0$ in probability in the generalized sense.

We shall now prove the consistency of the maximum score estimator with the convergence understood to be in the sense of Definition 9.6.1.

Theorem 9.6.2. Assume the following:

(A) F is a distribution function such that F(x) = 0.5 if and only if x = 0.

(B) $\{x_i\}$ are i.i.d. vector random variables with a joint density function $g(x)$ such that $g(x) > 0$ for all $x$.

Then any sequence $\{\hat{\beta}_n\}$ satisfying (9.6.4) converges to $\beta_0$ in probability in the generalized sense.

Proof. Let us verify the assumptions of Theorem 9.6.1. Assumptions A and B are clearly satisfied. The verification of assumption C will be done in five steps.

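First, define for $\lambda > 0$

$$S_{\lambda n}(\beta) = \sum_{i=1}^{n} \left[ y_i\,\psi_\lambda(x_i'\beta) + (1 - y_i)\,\psi_\lambda(-x_i'\beta) \right], \tag{9.6.7}$$

where

$$\psi_\lambda(x) = \begin{cases} 0 & \text{if } x \le 0 \\ \lambda x & \text{if } 0 < x < \lambda^{-1} \\ 1 & \text{if } \lambda^{-1} \le x. \end{cases} \tag{9.6.8}$$

Because each term of the summation in (9.6.7) minus its expected value satisfies all the conditions of Theorem 4.2.1 for a fixed positive $\lambda$, we can conclude that for any $\epsilon, \delta > 0$ there exists $n_1(\lambda)$, which may depend on $\lambda$, such that for all $n \ge n_1(\lambda)$

$$P\left[ \sup_\beta \left| n^{-1}S_{\lambda n}(\beta) - Q_\lambda(\beta) \right| > \frac{\epsilon}{3} \right] < \frac{\delta}{2}, \tag{9.6.9}$$

where

$$Q_\lambda(\beta) = E\,F(x'\beta_0)\,\psi_\lambda(x'\beta) + E\,[1 - F(x'\beta_0)]\,\psi_\lambda(-x'\beta). \tag{9.6.10}$$

Second, we have

$$\begin{aligned} \sup_\beta \left| n^{-1}S_n(\beta) - n^{-1}S_{\lambda n}(\beta) \right| &\le \sup_\beta \left| n^{-1}\sum_{i=1}^{n} \eta_\lambda(x_i'\beta) \right| \\ &\le \sup_\beta \left| n^{-1}\sum_{i=1}^{n} \eta_\lambda(x_i'\beta) - E\,\eta_\lambda(x'\beta) \right| + \sup_\beta E\,\eta_\lambda(x'\beta) \\ &\equiv A_1 + A_2, \end{aligned} \tag{9.6.11}$$

where

$$\eta_\lambda(x) = \begin{cases} 0 & \text{if } \lambda^{-1} \le |x| \\ 1 + \lambda x & \text{if } -\lambda^{-1} < x < 0 \\ 1 - \lambda x & \text{if } 0 \le x < \lambda^{-1}. \end{cases} \tag{9.6.12}$$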
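The role of the two auxiliary functions can be checked numerically. The sketch below compares each summand of the step-function score with the corresponding summand of the smoothed score and confirms that the gap lies between 0 and $\eta_\lambda$, which is the termwise fact behind the first inequality of the second step. The sampling scheme (a uniform index on $[-1, 1]$ standing in for $x_i'\beta$, and $\lambda = 5$) and the helper names are assumptions made only for this illustration.

```python
import random


def psi(x, lam):
    """psi_lambda of (9.6.8): 0 for x <= 0, a ramp lam * x on
    (0, 1/lam), and 1 from 1/lam onward."""
    if x <= 0:
        return 0.0
    return min(lam * x, 1.0)


def eta(x, lam):
    """eta_lambda of (9.6.12): a tent of height 1 at x = 0 that
    vanishes for |x| >= 1/lam."""
    return max(0.0, 1.0 - lam * abs(x))


def step_term(y, u):
    """Summand of S_n in (9.6.2) as a function of the index u = x_i'beta."""
    return y if u >= 0 else 1 - y


def smooth_term(y, u, lam):
    """Summand of the smoothed score in (9.6.7)."""
    return y * psi(u, lam) + (1 - y) * psi(-u, lam)


rng = random.Random(1)
lam = 5.0
gaps, etas = [], []
for _ in range(10_000):
    u = rng.uniform(-1, 1)  # stand-in for x_i'beta (illustrative only)
    y = rng.randint(0, 1)
    gaps.append(step_term(y, u) - smooth_term(y, u, lam))
    etas.append(eta(u, lam))

# Termwise 0 <= gap <= eta_lambda; averaging over i gives the bound.
print(all(0.0 <= g <= e + 1e-12 for g, e in zip(gaps, etas)))
print(sum(gaps) / len(gaps), "<=", sum(etas) / len(etas))
```

As $\lambda$ grows, the tent $\eta_\lambda$ narrows, so the average of $\eta_\lambda$ shrinks; this is the mechanism that lets the smoothed score approximate the original one uniformly.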

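Applying Theorem 4.2.1 to $A_1$, we conclude that for any $\epsilon, \delta > 0$ there exists $n_2(\lambda)$, which may depend on $\lambda$, such that for all $n \ge n_2(\lambda)$

$$P\left( A_1 > \frac{\epsilon}{6} \right) < \frac{\delta}{2}. \tag{9.6.13}$$

We have

$$A_2 \le \sup_\beta P\left[ (x'\beta)^2 < \lambda^{-2} \right]. \tag{9.6.14}$$

But the right-hand side of (9.6.14) converges to 0 as $\lambda \to \infty$ by assumption B, so we have for all $\lambda \ge \lambda_1$

$$P\left( A_2 > \frac{\epsilon}{6} \right) = 0. \tag{9.6.15}$$

Therefore, from (9.6.11), (9.6.13), and (9.6.15), we conclude that for all $n \ge n_2(\lambda)$ and for all $\lambda \ge \lambda_1$

$$P\left[ \sup_\beta \left| n^{-1}S_n(\beta) - n^{-1}S_{\lambda n}(\beta) \right| > \frac{\epsilon}{3} \right] < \frac{\delta}{2}. \tag{9.6.16}$$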

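Third, define

$$Q(\beta) = E\,F(x'\beta_0) + \int_{x'\beta < 0} [1 - 2F(x'\beta_0)]\,g(x)\,dx. \tag{9.6.17}$$

Then we have

$$\sup_\beta \left| Q(\beta) - Q_\lambda(\beta) \right| \le \sup_\beta E\,\eta_\lambda(x'\beta). \tag{9.6.18}$$

Therefore, using the same argument that led to (9.6.15), we conclude that for all $\lambda \ge \lambda_1$

$$P\left[ \sup_\beta \left| Q_\lambda(\beta) - Q(\beta) \right| > \frac{\epsilon}{3} \right] = 0. \tag{9.6.19}$$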

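Fourth, because

$$\begin{aligned} \sup_\beta \left| n^{-1}S_n(\beta) - Q(\beta) \right| \le{} & \sup_\beta \left| n^{-1}S_n(\beta) - n^{-1}S_{\lambda n}(\beta) \right| \\ & + \sup_\beta \left| n^{-1}S_{\lambda n}(\beta) - Q_\lambda(\beta) \right| \\ & + \sup_\beta \left| Q_\lambda(\beta) - Q(\beta) \right|, \end{aligned} \tag{9.6.20}$$

we conclude from (9.6.9), (9.6.16), and (9.6.19) that for any $\epsilon, \delta > 0$ we have for all $n \ge \max[n_1(\lambda_1), n_2(\lambda_1)]$

$$P\left[ \sup_\beta \left| n^{-1}S_n(\beta) - Q(\beta) \right| > \epsilon \right] < \delta. \tag{9.6.21}$$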

Fifth and finally, it remains to show that $Q(\beta)$ defined in (9.6.17) attains a unique global maximum at $\beta_0$. This is equivalent to showing

$$\int_{x'\beta_0 < 0} [1 - 2F(x'\beta_0)]\,g(x)\,dx > \int_{x'\beta < 0} [1 - 2F(x'\beta_0)]\,g(x)\,dx \quad \text{if } \beta \ne \beta_0. \tag{9.6.22}$$

But, because $1 - 2F(x'\beta_0) > 0$ in the region $\{x \mid x'\beta_0 < 0\}$ and $1 - 2F(x'\beta_0) < 0$ in the region $\{x \mid x'\beta_0 > 0\}$ by assumption A, (9.6.22) follows immediately from assumption B.
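The final step can be spelled out by splitting the two half-spaces into the regions where they disagree. The following is a sketch of that reasoning; the sets $D_1$ and $D_2$ are notation introduced here for illustration and do not appear in the original argument.

```latex
\begin{aligned}
D_1 &= \{x \mid x'\beta_0 < 0,\ x'\beta \ge 0\}, \qquad
D_2 = \{x \mid x'\beta_0 \ge 0,\ x'\beta < 0\}, \\[4pt]
&\int_{x'\beta_0 < 0} [1 - 2F(x'\beta_0)]\,g(x)\,dx
 - \int_{x'\beta < 0} [1 - 2F(x'\beta_0)]\,g(x)\,dx \\
&\qquad = \int_{D_1} [1 - 2F(x'\beta_0)]\,g(x)\,dx
 - \int_{D_2} [1 - 2F(x'\beta_0)]\,g(x)\,dx .
\end{aligned}
```

The common region $\{x \mid x'\beta_0 < 0,\ x'\beta < 0\}$ cancels. On $D_1$ the integrand $1 - 2F(x'\beta_0)$ is positive, and on $D_2$ it is nonpositive, so both terms on the right contribute nonnegatively. For $\beta \ne \beta_0$ with both vectors in $B$, the set $D_1$ has positive Lebesgue measure, and $g(x) > 0$ everywhere makes the first term strictly positive, which gives the strict inequality.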