# Generalized Maximum Likelihood Estimator

Cosslett (1983) proposed maximizing the log likelihood function (9.2.7) of a binary QR model with respect to $\beta$ and $F$, subject to the condition that $F$ is a distribution function. The log likelihood function, denoted here as $\psi$, is

$$
\psi(\beta, F) = \sum_{i=1}^{n} \left\{ y_i \log F(x_i'\beta) + (1 - y_i) \log\left[1 - F(x_i'\beta)\right] \right\}. \tag{9.6.33}
$$
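For experimentation it helps to evaluate (9.6.33) numerically. The Python sketch below assumes NumPy; the function name `log_likelihood` and its arguments are our own, and `F` may be any vectorized distribution function (a logistic CDF, say, or a step function like the first-stage solution described later). It uses the convention $0 \log 0 = 0$, which matters because the nonparametric solution takes the values 0 and 1 exactly.

```python
import numpy as np


def log_likelihood(beta, F, X, y):
    """Evaluate psi(beta, F) of (9.6.33).

    beta : (K,) coefficients, constant term already removed
    F    : vectorized callable mapping x_i'beta to probabilities in [0, 1]
    X    : (n, K) array whose ith row is x_i'
    y    : (n,) array of 0/1 outcomes
    """
    p = np.asarray(F(X @ beta), dtype=float)
    y = np.asarray(y)
    # Convention 0 * log 0 = 0: each observation contributes only the log term
    # selected by its outcome, so F may equal 0 or 1 where the other term is absent.
    with np.errstate(divide="ignore"):
        terms = np.where(y == 1, np.log(p), np.log1p(-p))
    return float(terms.sum())
```

For example, `log_likelihood(beta, lambda t: 1.0 / (1.0 + np.exp(-t)), X, y)` evaluates the logit log likelihood at `beta`.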

The consistency proof of Kiefer and Wolfowitz (1956) applies to this kind of model. Cosslett showed how to compute the MLEs $\hat\beta$ and $\hat F$ and derived conditions for the consistency of the MLE, translating the general conditions of Kiefer and Wolfowitz into conditions for this particular model. The conditions Cosslett found, which are not reproduced here, are quite reasonable and likely to hold in most practical applications.

Clearly, some kind of normalization is needed on $\beta$ and $F$ before we maximize (9.6.33), because the value of $\psi$ is unchanged if a shift or a positive rescaling of $x_i'\beta$ is absorbed into $F$. Cosslett adopted the following normalization: the constant term is 0 and the sum of squares of the remaining parameters is equal to 1. Note that the assumption of a zero constant term is adopted in lieu of Manski’s assumption $F(0) = 0.5$. We assume that the constant term has already been eliminated from the $x_i'\beta$ that appears in (9.6.33). Thus we can proceed, assuming $\beta'\beta = 1$.

The maximization of (9.6.33) is carried out in two stages. In the first stage we fix $\beta$ and maximize $\psi(\beta, F)$ with respect to $F$. Let the solution be $\hat F(\beta)$. Then in the second stage we maximize $\psi[\beta, \hat F(\beta)]$ with respect to $\beta$. Although the second stage presents a more difficult computational problem, we describe only the first stage because it is nonstandard and conceptually more difficult.
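A minimal sketch of the two-stage structure, assuming two regressors so that $\beta'\beta = 1$ can be parametrized as $\beta = (\cos\theta, \sin\theta)$. The first stage here uses the standard pool-adjacent-violators algorithm, which we assume yields the same $\hat F(\beta)$ as the first-stage steps of this section; the second stage is a plain grid search over $\theta$, our own crude choice and not Cosslett’s algorithm. All function names are hypothetical. The grid search reflects why the second stage is hard: $\psi[\beta, \hat F(\beta)]$ depends on $\beta$ only through the ranking of the $x_i'\beta$, so it is a step function of $\theta$ and gradient methods do not apply.

```python
import numpy as np


def first_stage(index, y):
    """F-hat(beta) at each observation (original order), via pool adjacent violators.

    Assumes the values in `index` (the x_i'beta) have no ties.
    """
    order = np.argsort(index, kind="stable")
    ys = np.asarray(y, dtype=float)[order]
    blocks = []  # each block is [sum of y, number of observations]
    for v in ys:
        blocks.append([v, 1.0])
        # Merge while the previous block's ratio exceeds the last block's ratio.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = np.concatenate([np.full(int(c), s / c) for s, c in blocks])
    fitted = np.empty_like(fitted_sorted)
    fitted[order] = fitted_sorted
    return fitted


def concentrated_loglik(beta, X, y):
    """psi[beta, F-hat(beta)]: (9.6.33) evaluated at the first-stage solution."""
    p = first_stage(X @ beta, y)
    y = np.asarray(y)
    with np.errstate(divide="ignore"):
        # F-hat is 0 (or 1) only where all pooled y are 0 (or 1), so selected terms are finite.
        terms = np.where(y == 1, np.log(p), np.log1p(-p))
    return float(terms.sum())


def second_stage_grid(X, y, num_angles=720):
    """Crude second stage for K = 2: search beta = (cos t, sin t) on a grid of angles."""
    thetas = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    values = [concentrated_loglik(np.array([np.cos(t), np.sin(t)]), X, y) for t in thetas]
    t_best = thetas[int(np.argmax(values))]
    return np.array([np.cos(t_best), np.sin(t_best)])
```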

The first-stage maximization consists of several steps:

Step 1. Given $\beta$, rank-order $\{x_i'\beta\}$. Suppose $x_{(1)}'\beta < x_{(2)}'\beta < \cdots < x_{(n)}'\beta$, assuming there are no ties. Determine the sequence $(y_{(1)}, y_{(2)}, \ldots, y_{(n)})$ accordingly. Note that this is a sequence consisting only of ones and zeros.

Step 2. Partition this sequence into the smallest possible number of successive groups in such a way that each group consists of a nonincreasing sequence.

Step 3. Calculate the ratio of the number of ones to the number of elements in each group. Let the sequence of ratios thus obtained be $(r_1, r_2, \ldots, r_K)$, assuming there are $K$ groups. If this is a nondecreasing sequence, we are done. We define $\hat F(x_{(i)}'\beta) = r_j$ if the $(i)$th observation is in the $j$th group.

Step 4. If, however, $r_j < r_{j-1}$ for some $j$, combine the $j$th and $(j-1)$th groups and repeat step 3 until we obtain a nondecreasing sequence.
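Steps 1 through 4 can be transcribed almost literally. The Python sketch below defines a hypothetical helper `first_stage_steps` (our name, not Cosslett’s); it assumes there are no ties among the $x_i'\beta$ and uses exact fractions so that the ratios $r_j$ match hand calculations. When step 4 finds several violations it merges the first one; we assume, as is standard for such pooling procedures, that the order of merging does not change the final $\hat F$.

```python
from fractions import Fraction


def first_stage_steps(index, y):
    """Steps 1-4 of the first stage: F-hat(beta) at each observation.

    index : x_i'beta for i = 1, ..., n (assumed distinct, i.e. no ties)
    y     : the 0/1 outcomes y_i, in the same order as index
    Returns a list of exact Fractions in the original observation order.
    """
    n = len(y)
    # Step 1: rank-order x_i'beta and arrange y into (y_(1), ..., y_(n)).
    order = sorted(range(n), key=lambda i: index[i])
    y_sorted = [int(y[i]) for i in order]

    # Step 2: fewest successive groups, each a nonincreasing sequence
    # (a new group starts only when a 1 follows a 0).
    groups = [[y_sorted[0]]]
    for v in y_sorted[1:]:
        if v <= groups[-1][-1]:
            groups[-1].append(v)
        else:
            groups.append([v])

    # Steps 3-4: r_j = (ones in group j) / (size of group j); while some
    # r_j < r_{j-1}, combine groups j-1 and j and recompute the ratios.
    while True:
        ratios = [Fraction(sum(g), len(g)) for g in groups]
        j = next((k for k in range(1, len(groups)) if ratios[k] < ratios[k - 1]), None)
        if j is None:
            break
        groups[j - 1:j + 1] = [groups[j - 1] + groups[j]]

    # F-hat(x_(i)'beta) = r_j when the (i)th observation lies in group j.
    f_sorted = [r for g, r in zip(groups, ratios) for _ in g]
    f_hat = [Fraction(0)] * n
    for rank, i in enumerate(order):
        f_hat[i] = f_sorted[rank]
    return f_hat
```

With `index = list(range(9))` and `y = [0, 0, 1, 1, 0, 1, 0, 1, 1]`, the call returns 0, 0, then 3/5 five times, then 1, 1: the values obtained for Example 2 in this section.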

The preceding procedure can best be taught by example:

Example 1.

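| $(i)$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| $y_{(i)}$ | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| group $j$ | 1 | 1 | 2 | 2 | 2 | 3 | 3 |
| $\hat F(x_{(i)}'\beta)$ | 0 | 0 | 2/3 | 2/3 | 2/3 | 1 | 1 |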

In this example, there is no need for step 4.

Example 2.

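| $(i)$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| $y_{(i)}$ | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| group $j$ | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 4 |
| ratio $r_j$ | 0 | 0 | 2/3 | 2/3 | 2/3 | 1/2 | 1/2 | 1 | 1 |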

Here, the second and third groups must be combined to yield

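| $(i)$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| $y_{(i)}$ | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| group $j$ | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 |
| $\hat F(x_{(i)}'\beta)$ | 0 | 0 | 3/5 | 3/5 | 3/5 | 3/5 | 3/5 | 1 | 1 |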

Note that $\hat F$ is not unique over some parts of the domain. For example, between $x_{(2)}'\beta$ and $x_{(3)}'\beta$ in Example 1, $\hat F$ may take any value between 0 and 2/3.
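As a worked check (our own arithmetic), note that $\psi$ depends on $F$ only through its values at the observed $x_{(i)}'\beta$, so this non-uniqueness does not affect the maximized likelihood. Groups whose ratio is 0 or 1 contribute nothing (with $0 \log 0 = 0$), so only the pooled group matters: in Example 1 it has two ones and one zero with $\hat F = 2/3$, and in Example 2 it has three ones and two zeros with $\hat F = 3/5$:

$$
\psi(\beta, \hat F) = 2\log\tfrac{2}{3} + \log\tfrac{1}{3} = \log\tfrac{4}{27} \approx -1.910 \quad \text{(Example 1)},
$$

$$
\psi(\beta, \hat F) = 3\log\tfrac{3}{5} + 2\log\tfrac{2}{5} = \log\tfrac{108}{3125} \approx -3.365 \quad \text{(Example 2)}.
$$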

Asymptotic normality has not been proved for Cosslett’s MLE $\hat\beta$, nor for any model to which the consistency proof of Kiefer and Wolfowitz is applicable. This seems to be as difficult a problem as proving the asymptotic normality of Manski’s maximum score estimator.

Cosslett’s MLE may be regarded as a generalization of Manski’s estimator because the latter searches only among one-jump step functions to maximize (9.6.33). However, this does not necessarily imply that Cosslett’s estimator is superior. Ranking of estimators can be done according to various criteria. If the purpose is prediction, Manski’s estimator is an attractive one because it maximizes the number of correct predictions.
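To make the prediction criterion concrete, the sketch below counts correct predictions under the rule $\hat y_i = 1$ if $x_i'\beta \ge 0$ (the natural rule under Manski’s normalization $F(0) = 0.5$ with $F$ increasing) and maximizes that count over a grid on the unit circle. The two-regressor setup, the grid search, and the function names `correct_predictions` and `max_score_grid` are hypothetical choices for illustration, not Manski’s own computational procedure.

```python
import numpy as np


def correct_predictions(beta, X, y):
    """Number of observations classified correctly by y_hat = 1 iff x_i'beta >= 0."""
    y_hat = (X @ beta >= 0).astype(int)
    return int(np.sum(y_hat == np.asarray(y)))


def max_score_grid(X, y, num_angles=720):
    """Grid search for K = 2: maximize the number of correct predictions over beta'beta = 1."""
    thetas = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    betas = np.column_stack([np.cos(thetas), np.sin(thetas)])
    scores = [correct_predictions(b, X, y) for b in betas]
    return betas[int(np.argmax(scores))]
```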