# Results of Manski and Lerman

Manski and Lerman (1977) considered the choice-based sampling scheme represented by the likelihood function (9.5.3), the case where $Q$ is known, and proposed the weighted maximum likelihood estimator (WMLE), denoted $\hat\beta_w$, which maximizes

$$S_n = \sum_{i=1}^{n} w(j_i) \log P(j_i \mid x_i, \beta), \tag{9.5.5}$$

where $w(j) = Q_0(j)/H(j)$. More precisely, $\hat\beta_w$ is a solution of the normal equation

$$\frac{\partial S_n}{\partial \beta} = 0. \tag{9.5.6}$$

It will become apparent that the weights $w(j)$ ensure the consistency of the estimator. If weights were not used, (9.5.5) would be reduced to the exogenous sampling likelihood function (9.5.1), and the resulting estimator to the usual MLE (the ESMLE), which can be shown to be inconsistent unless $H(j) = Q_0(j)$.

It should be noted that because WMLE does not depend on $f(x)$, it can be used regardless of whether or not $f$ is known.
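As a minimal sketch of how the weighting works in practice, the following Python simulation draws a choice-based sample from a scalar binary logit model and compares the weighted and unweighted estimators. Everything specific here is an assumption made for illustration and is not part of Manski and Lerman's exposition: the binary regressor with $P(x = 1) = 0.9$, the true value $\beta_0 = \log 3.5$ (which implies $Q_0(1) = 0.75$), the design $H(1) = 0.5$, the function names, and the use of Newton's method.

```python
import math
import random
from collections import Counter


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def draw_choice_based_sample(n, beta0, p, h, rng):
    """Draw (j, x) pairs: j from the design H, then x from f(x | j)."""
    lam = logistic(beta0)                  # P(j = 1 | x = 1)
    q1 = p * lam + (1.0 - p) * 0.5         # Q_0(1); note P(j = 1 | x = 0) = 1/2
    # Bayes' rule: P(x = 1 | j) = P(j | x = 1) * p / Q_0(j)
    px1_given_j = {1: p * lam / q1, 0: p * (1.0 - lam) / (1.0 - q1)}
    sample = []
    for _ in range(n):
        j = 1 if rng.random() < h else 0
        x = 1 if rng.random() < px1_given_j[j] else 0
        sample.append((j, x))
    return sample, q1


def fit_logit(sample, weights, iters=50, tol=1e-12):
    """Maximize sum_i w(j_i) log P(j_i | x_i, beta) by Newton's method."""
    cells = Counter(sample)                # (j, x) -> number of observations
    beta = 0.0
    for _ in range(iters):
        score = hess = 0.0
        for (j, x), count in cells.items():
            lam = logistic(beta * x)
            score += count * weights[j] * (j - lam) * x
            hess -= count * weights[j] * lam * (1.0 - lam) * x * x
        step = score / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


rng = random.Random(0)                     # fixed seed (illustrative choice)
beta0, p, h = math.log(3.5), 0.9, 0.5
sample, q1 = draw_choice_based_sample(100_000, beta0, p, h, rng)

w = {1: q1 / h, 0: (1.0 - q1) / (1.0 - h)}           # w(j) = Q_0(j) / H(j)
beta_wmle = fit_logit(sample, w)                     # weighted (WMLE)
beta_esmle = fit_logit(sample, {1: 1.0, 0: 1.0})     # unweighted (ESMLE)
print(beta0, beta_wmle, beta_esmle)
```

In large samples the weighted estimate should lie close to $\beta_0$, whereas the unweighted estimate settles at a different value, in line with the inconsistency of the ESMLE when $H(j) \neq Q_0(j)$.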

We shall prove the consistency of the WMLE $\hat\beta_w$ in a somewhat different way from the authors' proof.¹² The basic theorems we shall use are Theorems 4.1.2 and 4.2.1. We need to make six assumptions.

Assumption 9.5.1. The parameter space $B$ is an open subset of Euclidean space.

Assumption 9.5.2. $H(j) > 0$ for every $j = 0, 1, \ldots, m$.

Assumption 9.5.3. $\partial \log P(j \mid x, \beta)/\partial \beta$ exists and is continuous in an open neighborhood $N_1(\beta_0)$ of $\beta_0$ for every $j$ and $x$. (Note that this assumption requires $P(j \mid x, \beta) > 0$ in the neighborhood.)¹³

Assumption 9.5.4. $P(j \mid x, \beta)$ is a measurable function of $j$ and $x$ for every $\beta \in B$.

Assumption 9.5.5. $\{j, x\}$ are i.i.d. random variables.

Assumption 9.5.6. If $\beta \neq \beta_0$, $P[P(j \mid x, \beta) \neq P(j \mid x, \beta_0)] > 0$.

To prove the consistency of $\hat\beta_w$, we first note that Assumptions 9.5.1, 9.5.2, and 9.5.3 imply conditions A and B of Theorem 4.1.2 for $S_n$ defined in (9.5.5). Next, we check the uniform convergence part of condition C by putting $g(y, \theta) = \log P(j \mid x, \beta) - E \log P(j \mid x, \beta)$ in Theorem 4.2.1. (Throughout Section 9.5, $E$ always denotes the expectation taken under the assumption of choice-based sampling; that is, for any $g(j, x)$, $E\,g(j, x) = \int \sum_{j=0}^{m} g(j, x) P(j \mid x, \beta_0) Q_0(j)^{-1} H(j) f(x)\, dx$.) Because $0 < P(j \mid x, \beta) < 1$ for $\beta \in \Psi$, where $\Psi$ is some compact subset of $N_1(\beta_0)$, by Assumption 9.5.3 we clearly have $E \sup_{\beta \in \Psi} |\log P(j) - E \log P(j)| < \infty$. This fact, together with Assumptions 9.5.4 and 9.5.5, implies that all the conditions of Theorem 4.2.1 are fulfilled. Thus, to verify the remaining part of condition C, we need only show that $\lim_{n \to \infty} n^{-1} E S_n$ attains a strict local maximum at $\beta_0$.

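Using Assumption 9.5.5, we have

$$
\begin{aligned}
n^{-1} E S_n &= \int \sum_{j=0}^{m} w(j) [\log P(j \mid x, \beta)]\, P(j \mid x, \beta_0)\, Q_0(j)^{-1} H(j) f(x)\, dx \\
&= \int \sum_{j=0}^{m} [\log P(j \mid x, \beta)]\, P(j \mid x, \beta_0) f(x)\, dx \\
&\equiv E^* \log P(j \mid x, \beta),
\end{aligned} \tag{9.5.7}
$$

where $E$ on the left-hand side denotes the expectation taken according to the true choice-based sampling scheme, whereas $E^*$ after the last equality denotes the expectation taken according to the hypothetical random sampling scheme. (That is, for any $g(j, x)$, $E^* g(j, x) = \int \sum_{j=0}^{m} g(j, x) P(j \mid x, \beta_0) f(x)\, dx$.) But, by Jensen's inequality (4.2.6) and Assumption 9.5.6, we have

$$E^* \log P(j \mid x, \beta) < E^* \log P(j \mid x, \beta_0) \quad \text{for } \beta \neq \beta_0. \tag{9.5.8}$$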

Thus the consistency of WMLE has been proved.
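The Jensen step behind (9.5.8) can be spelled out as follows. This is a sketch of the standard argument, not a quotation from the text; it uses only the definition of $E^*$ and the fact that the choice probabilities sum to one over $j$.

$$
\begin{aligned}
E^* \log P(j \mid x, \beta) - E^* \log P(j \mid x, \beta_0)
  &= E^* \log \frac{P(j \mid x, \beta)}{P(j \mid x, \beta_0)} \\
  &< \log E^* \frac{P(j \mid x, \beta)}{P(j \mid x, \beta_0)} \\
  &= \log \int \sum_{j=0}^{m} P(j \mid x, \beta) f(x)\, dx = \log 1 = 0 .
\end{aligned}
$$

The inequality is strict because the logarithm is strictly concave and, by Assumption 9.5.6, the ratio inside the expectation is not almost surely constant.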

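That the ESMLE is inconsistent under the present model can be shown as follows. Replacing $w(j)$ by 1 in the first equality of (9.5.7), we obtain

$$n^{-1} E S_n = \int \sum_{j=0}^{m} c_j [\log P(j \mid x, \beta)]\, P(j \mid x, \beta_0) f(x)\, dx, \tag{9.5.9}$$

where $c_j = Q_0(j)^{-1} H(j)$. Evaluating the derivative of (9.5.9) with respect to $\beta$ at $\beta_0$ yields

$$\left. \frac{\partial\, n^{-1} E S_n}{\partial \beta} \right|_{\beta_0} = \int \left[ \sum_{j=0}^{m} c_j \left. \frac{\partial P(j \mid x, \beta)}{\partial \beta} \right|_{\beta_0} \right] f(x)\, dx. \tag{9.5.10}$$

It is clear that we must generally have $c_j = 1$ for every $j$ in order for (9.5.10) to be 0.¹⁴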
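The role of the weights in (9.5.10) can be checked numerically. The sketch below evaluates the limiting first-order condition at $\beta_0$, with and without weights, in an assumed scalar binary logit model with a binary regressor ($P(x = 1) = p$). The model, the parameter values, and the function name `limiting_score` are illustrative assumptions and do not come from the text.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def limiting_score(beta0, p, h, weighted):
    """Derivative of lim n^{-1} E S_n with respect to beta, evaluated at beta0.

    Assumed model: P(1 | x, beta) = logistic(beta * x), with x = 1 w.p. p
    and x = 0 otherwise. Sampling is choice-based with H(1) = h.
    """
    # Population shares Q_0(j) implied by the model.
    q = {1: p * logistic(beta0) + (1.0 - p) * 0.5}
    q[0] = 1.0 - q[1]
    H = {1: h, 0: 1.0 - h}
    c = {j: H[j] / q[j] for j in (0, 1)}             # c_j = H(j) / Q_0(j)
    w = {j: (q[j] / H[j] if weighted else 1.0) for j in (0, 1)}

    total = 0.0
    for x, fx in ((1, p), (0, 1.0 - p)):
        lam = logistic(beta0 * x)
        slope = lam * (1.0 - lam) * x
        dP = {1: slope, 0: -slope}                   # dP(j | x, beta) / dbeta
        total += fx * sum(w[j] * c[j] * dP[j] for j in (0, 1))
    return total


beta0, p = math.log(3.5), 0.9                        # implies Q_0(1) = 0.75
score_wmle = limiting_score(beta0, p, h=0.5, weighted=True)
score_esmle = limiting_score(beta0, p, h=0.5, weighted=False)
score_random = limiting_score(beta0, p, h=0.75, weighted=False)  # H(1) = Q_0(1)
print(score_wmle, score_esmle, score_random)
```

The weighted score and the unweighted score under $H(j) = Q_0(j)$ vanish up to rounding error, while the unweighted score under $H(1) = 0.5$ does not, so $\beta_0$ is not a stationary point of the unweighted limit objective in that design.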

The asymptotic normality of $\hat\beta_w$ can be proved with suitable additional assumptions by using Theorem 4.1.3. We shall present merely an outline of the derivation and shall obtain the asymptotic covariance matrix. The necessary rigor can easily be supplied by the reader.

Differentiating (9.5.5) with respect to $\beta$ yields

$$\frac{\partial S_n}{\partial \beta} = \sum_{i=1}^{n} w(j_i) \frac{1}{P(j_i)} \frac{\partial P(j_i)}{\partial \beta}, \tag{9.5.11}$$

where $P(j_i)$ abbreviates $P(j_i \mid x_i, \beta)$. Because (9.5.11) is a sum of i.i.d. random variables by Assumption 9.5.5, we can expect $n^{-1/2} (\partial S_n/\partial \beta)_{\beta_0}$ to converge to a normal random variable under suitable assumptions. We can show

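$$\left. \frac{1}{\sqrt{n}} \frac{\partial S_n}{\partial \beta} \right|_{\beta_0} \to N(0, \Lambda), \tag{9.5.12}$$

where

$$\Lambda = E[w(j)^2 \gamma\gamma'], \tag{9.5.13}$$

where $\gamma = [\partial \log P(j)/\partial \beta]_{\beta_0}$. Differentiating (9.5.11) with respect to $\beta'$ yields

$$\frac{\partial^2 S_n}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{n} w(j_i) \frac{1}{P(j_i)^2} \frac{\partial P(j_i)}{\partial \beta} \frac{\partial P(j_i)}{\partial \beta'} + \sum_{i=1}^{n} w(j_i) \frac{1}{P(j_i)} \frac{\partial^2 P(j_i)}{\partial \beta\, \partial \beta'}. \tag{9.5.14}$$

Using the fact that (9.5.14) is a sum of i.i.d. random variables, we can show

$$\operatorname{plim} \left. \frac{1}{n} \frac{\partial^2 S_n}{\partial \beta\, \partial \beta'} \right|_{\beta_0} = -E[w(j) \gamma\gamma'] \equiv A, \tag{9.5.15}$$

because

$$E\left[ w(j) \frac{1}{P_0(j)} \left. \frac{\partial^2 P(j)}{\partial \beta\, \partial \beta'} \right|_{\beta_0} \right] = \int \sum_{j=0}^{m} \left. \frac{\partial^2 P(j)}{\partial \beta\, \partial \beta'} \right|_{\beta_0} f(x)\, dx = 0, \tag{9.5.16}$$

where $P_0(j) = P(j \mid x, \beta_0)$. Therefore, from (9.5.12) and (9.5.15), we conclude

$$\sqrt{n}\,(\hat\beta_w - \beta_0) \to N(0, A^{-1} \Lambda A^{-1}). \tag{9.5.17}$$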
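The step from (9.5.12) and (9.5.15) to the limit distribution is the usual first-order expansion of the normal equation (9.5.6) around $\beta_0$. In outline, and ignoring the remainder term that the additional assumptions are meant to control:

$$
\begin{aligned}
0 = \left. \frac{\partial S_n}{\partial \beta} \right|_{\hat\beta_w}
  &\approx \left. \frac{\partial S_n}{\partial \beta} \right|_{\beta_0}
   + \left. \frac{\partial^2 S_n}{\partial \beta\, \partial \beta'} \right|_{\beta_0} (\hat\beta_w - \beta_0), \\
\sqrt{n}\,(\hat\beta_w - \beta_0)
  &\approx -\left[ \left. \frac{1}{n} \frac{\partial^2 S_n}{\partial \beta\, \partial \beta'} \right|_{\beta_0} \right]^{-1}
   \left. \frac{1}{\sqrt{n}} \frac{\partial S_n}{\partial \beta} \right|_{\beta_0} .
\end{aligned}
$$

The bracketed matrix converges in probability to $A$ and the last factor converges in distribution to $N(0, \Lambda)$; because $A$ is symmetric, the limiting covariance matrix is $A^{-1} \Lambda A^{-1}$.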

As we noted earlier, a researcher controls $H(j)$ and therefore faces an interesting problem of finding an optimal design: What is the optimal choice of $H(j)$? We shall consider this problem here in the context of WMLE. Thus the question is, What choice of $H(j)$ will minimize $A^{-1} \Lambda A^{-1}$?

First of all, we should point out that $H(j) = Q_0(j)$ is not necessarily the optimal choice. If it were, it would mean that random sampling is always preferred to choice-based sampling. The asymptotic covariance matrix of $\sqrt{n}\,(\hat\beta_w - \beta_0)$ when $H(j) = Q_0(j)$ is $(E^* \gamma\gamma')^{-1}$, where $E^*$ denotes the expectation taken with respect to the probability distribution $P(j \mid x, \beta_0) f(x)$. Writing $w(j)$ simply as $w$, the question is whether

$$(E w \gamma\gamma')^{-1} E w^2 \gamma\gamma' (E w \gamma\gamma')^{-1} \geq (E^* \gamma\gamma')^{-1}. \tag{9.5.18}$$

The answer is no, even though we do have

$$(E w \gamma\gamma')^{-1} E w^2 \gamma\gamma' (E w \gamma\gamma')^{-1} \geq (E \gamma\gamma')^{-1}, \tag{9.5.19}$$

which follows from the generalized Cauchy-Schwarz inequality $E yy' \geq E yx' (E xx')^{-1} E xy'$.

Let us consider an optimal choice of $H(j)$ by a numerical analysis using a simple logit model with a single scalar parameter. A more elaborate numerical analysis by Cosslett (1981b), which addressed this as well as some other questions, will be discussed in Section 9.5.5.

For this purpose it is useful to rewrite $\Lambda$ and $A$ as

$$\Lambda = E^* w(j) \gamma\gamma' \tag{9.5.20}$$

and

$$A = -E^* \gamma\gamma'. \tag{9.5.21}$$
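Both (9.5.20) and (9.5.21) follow from the relation between the two expectations. Since $E$ attaches the extra factor $Q_0(j)^{-1} H(j) = 1/w(j)$ to each term of $E^*$, we have $E\,g(j, x) = E^*[g(j, x)/w(j)]$ for any $g$. A short check under that observation:

$$
\Lambda = E[w(j)^2 \gamma\gamma'] = E^*[w(j)\, \gamma\gamma'], \qquad
A = -E[w(j)\, \gamma\gamma'] = -E^* \gamma\gamma' .
$$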

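Denoting the asymptotic variance of $\sqrt{n}\,(\hat\beta_w - \beta_0)$ in the scalar case by $V_w(H)$, we define the relative efficiency of $\hat\beta_w$ based on the design $H(j)$ by

$$\mathrm{Eff}(H) = \frac{V_w(Q_0)}{V_w(H)} = \frac{E^*(\gamma^2)}{E^*[w(j) \gamma^2]}, \tag{9.5.22}$$

because $\gamma$ is a scalar here.

We shall evaluate (9.5.22) and determine the optimal $H(j)$ that maximizes $\mathrm{Eff}(H)$ in a binary logit model:

$$P_0(1) = \Lambda(\beta_0 x),$$

where $\beta_0$ and $x$ are scalars and $\Lambda(\cdot)$ here denotes the logistic distribution function.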

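In this model the determination of the optimal value of $h \equiv H(1)$ is simple because we have

$$E^* w(j) \gamma^2 = \frac{a}{h} + \frac{b}{1 - h}, \tag{9.5.23}$$

where

$$a = E_x \Lambda(\beta_0 x) \cdot E_x \left\{ x^2 \exp(\beta_0 x) [1 + \exp(\beta_0 x)]^{-3} \right\} \tag{9.5.24}$$

and

$$b = E_x [1 - \Lambda(\beta_0 x)] \cdot E_x \left\{ x^2 \exp(2\beta_0 x) [1 + \exp(\beta_0 x)]^{-3} \right\}. \tag{9.5.25}$$

Because $a, b > 0$, (9.5.23) approaches $\infty$ as $h$ approaches either 0 or 1 and attains a unique minimum at

$$h^* = \begin{cases} \dfrac{a - \sqrt{ab}}{a - b} & \text{if } a \neq b, \\[1ex] 0.5 & \text{if } a = b. \end{cases} \tag{9.5.26}$$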
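For completeness, the minimizer can be obtained from the first-order condition of (9.5.23). The following is a routine calculus sketch:

$$
\frac{d}{dh} \left[ \frac{a}{h} + \frac{b}{1 - h} \right] = -\frac{a}{h^2} + \frac{b}{(1 - h)^2} = 0
\;\Longrightarrow\; \frac{1 - h}{h} = \sqrt{\frac{b}{a}}
\;\Longrightarrow\; h^* = \frac{\sqrt{a}}{\sqrt{a} + \sqrt{b}} ,
$$

which equals $(a - \sqrt{ab})/(a - b)$ when $a \neq b$ and $0.5$ when $a = b$. The second derivative $2a/h^3 + 2b/(1 - h)^3$ is positive on $(0, 1)$, so the stationary point is the unique minimum.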

We assume $x$ is binary with probability distribution: $x = 1$ with probability $p$ and $x = 0$ with probability $1 - p$.

Then, inserting $\beta_0 = \log[(p + 2Q_0 - 1)/(p - 2Q_0 + 1)]$, where $Q_0 = Q_0(1)$, into the right-hand side of (9.5.22), $\mathrm{Eff}(H)$ becomes a function of $p$, $Q_0$, and $h$ alone. In the last five columns of Table 9.7, the values of $\mathrm{Eff}(H)$ are shown for various values of $p$, $Q_0$, and $h$. For each combination of $p$ and $Q_0$, the value of the optimal $h$, denoted $h^*$, is shown in the third column. For example, if $p = 0.9$ and $Q_0 = 0.75$, the optimal value of $h$ is equal to 0.481. When $h$ is set equal to this optimal value, the efficiency of WMLE is 1.387. The table shows that the efficiency gain of using choice-based sampling can be considerable and that $h = 0.5$ performs well for all the parameter values considered. It can be shown that if $Q_0 = 0.5$, then $h^* = 0.5$ for all the values of $p$.
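The worked example quoted above ($p = 0.9$, population share 0.75) can be reproduced with a few lines of Python. This is a sketch under the stated model (binary logit, binary $x$); the function name `design_efficiency` is ours, and $h^*$ is computed as $\sqrt{a}/(\sqrt{a} + \sqrt{b})$, which is algebraically the minimizer of $a/h + b/(1 - h)$.

```python
import math


def design_efficiency(p, q0, h=None):
    """Return (h_star, Eff(H)) for the binary logit model with binary x.

    p  : P(x = 1)
    q0 : Q_0(1), the population share choosing alternative 1
    h  : H(1), the sampling share of alternative 1 (defaults to h_star)
    """
    beta0 = math.log((p + 2 * q0 - 1) / (p - 2 * q0 + 1))
    e = math.exp(beta0)
    # With binary x, only x = 1 contributes to E_x[x^2 ...], with weight p.
    a = q0 * p * e / (1 + e) ** 3
    b = (1 - q0) * p * e ** 2 / (1 + e) ** 3
    h_star = math.sqrt(a) / (math.sqrt(a) + math.sqrt(b))
    if h is None:
        h = h_star
    # Eff(H) = E*[gamma^2] / E*[w gamma^2]; the numerator is the case h = Q_0(1).
    numerator = a / q0 + b / (1 - q0)
    denominator = a / h + b / (1 - h)
    return h_star, numerator / denominator


h_star, eff = design_efficiency(0.9, 0.75)
print(round(h_star, 3), round(eff, 3))  # 0.481 1.387
```

Calling `design_efficiency(p, q0, h)` with an explicit `h` gives the efficiency of any other design; for instance, `h = q0` returns an efficiency of 1, the random-sampling benchmark.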

Table 9.7 Efficiency of WMLE for various designs in a binary logit model

In the foregoing discussion of WMLE, we have assumed $Q_0(j)$ to be known. However, it is more realistic to assume that $Q_0(j)$ needs to be estimated from a separate sample and that such an estimate is used to define $w(j)$ in WMLE. Manski and Lerman did not discuss this problem except to note in a footnote that the modified WMLE using an estimate $\hat Q_0(j)$ is consistent as long as $\operatorname{plim} \hat Q_0(j) = Q_0(j)$. To verify this statement we need merely observe that

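$$
\begin{aligned}
&\left| n^{-1} \sum_{i=1}^{n} [\hat Q_0(j_i) - Q_0(j_i)]\, H(j_i)^{-1} \log P(j_i \mid x_i, \beta) \right| \\
&\qquad \leq \max_j |\hat Q_0(j) - Q_0(j)| \cdot n^{-1} \sum_{i=1}^{n} \left| H(j_i)^{-1} \log P(j_i \mid x_i, \beta) \right| .
\end{aligned} \tag{9.5.27}
$$

The right-hand side of (9.5.27) converges to 0 in probability uniformly in $\beta$ by Assumptions 9.5.2 and 9.5.3 and by the consistency of $\hat Q_0(j)$.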

To show the asymptotic equivalence of the modified WMLE and the original WMLE, we need an assumption about the rate of convergence of $\hat Q_0(j)$. By examining (9.5.12), we see that for asymptotic equivalence we need

$$\operatorname{plim}\; n^{-1/2} \sum_{i=1}^{n} [\hat Q_0(j_i) - Q_0(j_i)]\, H(j_i)^{-1} \left[ \frac{\partial \log P(j_i)}{\partial \beta} \right]_{\beta_0} = 0. \tag{9.5.28}$$

Therefore we need

$$\hat Q_0(j) - Q_0(j) = o(n^{-1/2}). \tag{9.5.29}$$

If $\hat Q_0(j)$ is the proportion of people choosing alternative $j$ in a separate sample of size $n_1$,

$$\hat Q_0(j) - Q_0(j) = O(n_1^{-1/2}). \tag{9.5.30}$$

Therefore asymptotic equivalence requires that $n/n_1$ should converge to 0. See Hsieh, Manski, and McFadden (1983) for the asymptotic distribution of the WMLE with $Q$ estimated in the case where $n/n_1$ converges to a nonzero constant.
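The requirement on the two sample sizes can be made explicit by a one-line decomposition, where orders are understood in probability and $n_1$ denotes the size of the separate sample from which $\hat Q_0(j)$ is computed:

$$
\sqrt{n}\,[\hat Q_0(j) - Q_0(j)] = \sqrt{\frac{n}{n_1}} \cdot \sqrt{n_1}\,[\hat Q_0(j) - Q_0(j)] = \sqrt{\frac{n}{n_1}} \cdot O(1) .
$$

The left-hand side therefore vanishes in the limit when $n/n_1 \to 0$, which is the rate condition (9.5.29). When $n/n_1$ converges to a nonzero constant, the estimation error in $\hat Q_0(j)$ is no longer negligible relative to $n^{-1/2}$.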

An application of WMLE in a model explaining family participation in the AFDC program can be found in the article by Hosek (1980).