# Univariate Binary Models

## 9.1.1 Model Specification

A univariate binary QR model is defined by

$$P(y_i = 1) = F(x_i'\beta_0), \qquad i = 1, 2, \ldots, n, \tag{9.2.1}$$

where $\{y_i\}$ is a sequence of independent binary random variables taking the value 1 or 0, $x_i$ is a $K$-vector of known constants, $\beta_0$ is a $K$-vector of unknown parameters, and $F$ is a certain known function.

It would be more general to specify the probability as $F(x_i, \beta_0)$, but the specification (9.2.1) is most common. As in the linear regression model, specifying the argument of $F$ as $x_i'\beta_0$ is more general than it seems, because the elements of $x_i$ can be transformations of the original independent variables. To the extent we can approximate a general nonlinear function of the original independent variables by $x_i'\beta_0$, the choice of $F$ is not critical as long as it is a distribution function. An arbitrary distribution function can be attained by choosing an appropriate function $H$ in the specification $F[H(x_i, \beta_0)]$.

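The functional forms of $F$ most frequently used in applications are the following:

- **Linear probability model:** $F(x) = x$.
- **Probit model:** $F(x) = \Phi(x) \equiv \displaystyle\int_{-\infty}^{x} (2\pi)^{-1/2} e^{-t^2/2}\, dt$.
- **Logit model:** $F(x) = \Lambda(x) \equiv \dfrac{e^{x}}{1 + e^{x}}$.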

The linear probability model has an obvious defect in that $F$ for this model is not a proper distribution function, as it is not constrained to lie between 0 and 1. This defect can be corrected by defining $F = 1$ if $F(x_i'\beta_0) > 1$ and $F = 0$ if $F(x_i'\beta_0) < 0$, but the procedure produces unrealistic kinks at the truncation points. Nevertheless, it has frequently been used in econometric applications, especially during and before the 1960s, because of its computational simplicity.
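As a minimal illustration, the sketch below evaluates the three choices of $F$ listed above, together with the truncated linear probability model, at a few arbitrary points. It uses only Python's standard library; the probit CDF is written through `math.erf` via $\Phi(x) = \tfrac12[1 + \operatorname{erf}(x/\sqrt2)]$, and the function names are our own.

```python
import math


def F_linear(x: float) -> float:
    """Linear probability model: F(x) = x (not confined to [0, 1])."""
    return x


def F_linear_truncated(x: float) -> float:
    """LPM truncated to [0, 1]; this creates kinks at x = 0 and x = 1."""
    return min(1.0, max(0.0, x))


def F_probit(x: float) -> float:
    """Standard normal CDF Phi(x), written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def F_logit(x: float) -> float:
    """Logistic CDF Lambda(x) = e^x / (1 + e^x), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Arbitrary evaluation points
for x in (-2.0, -0.5, 0.0, 0.5, 1.5):
    print(f"x={x:5.1f}  LPM={F_linear(x):5.2f}  truncated={F_linear_truncated(x):4.2f}  "
          f"probit={F_probit(x):.4f}  logit={F_logit(x):.4f}")
```

At $x = -2$ and $x = 1.5$ the linear model leaves $[0, 1]$, while the truncated version is pinned at 0 and 1 there.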

The probit model, like many other statistical models using the normal distribution, may be justified by appealing to a central limit theorem. A major justification for the logit model, although there are other justifications (notably, its connection with discriminant analysis, which we shall discuss in Section 9.2.8), is that the logistic distribution function $\Lambda$ is similar to a normal distribution function but has a much simpler form. The logistic distribution has zero mean and variance equal to $\pi^2/3$. The standardized logistic distribution $e^{\lambda x}/(1 + e^{\lambda x})$ with $\lambda = \pi/\sqrt{3}$ has slightly heavier tails than the standard normal distribution.
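Both numerical claims about the logistic distribution can be checked directly. The sketch below, assuming a simple midpoint-rule integration over $[-60, 60]$ (our choice; the mass outside that range is negligible), approximates $\int x^2\, d\Lambda(x)$ and compares it with $\pi^2/3$. It then compares upper-tail probabilities of the standardized logistic with those of the standard normal.

```python
import math

LAM = math.pi / math.sqrt(3.0)  # lambda = pi/sqrt(3) rescales the logistic to unit variance


def logistic_pdf(x: float) -> float:
    """Standard logistic density e^{-x} / (1 + e^{-x})^2; symmetric, so |x| is used."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2


def logistic_variance(limit: float = 60.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of x^2 * pdf(x) over [-limit, limit]."""
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps):
        x = -limit + (k + 0.5) * h
        total += x * x * logistic_pdf(x)
    return total * h


def std_logistic_tail(t: float) -> float:
    """P(X > t) when X has CDF e^{lam x} / (1 + e^{lam x})."""
    return 1.0 / (1.0 + math.exp(LAM * t))


def normal_tail(t: float) -> float:
    """P(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


print(f"numerical variance = {logistic_variance():.6f}, pi^2/3 = {math.pi ** 2 / 3:.6f}")
for t in (1.0, 2.0, 3.0, 4.0):
    print(f"t={t}: logistic tail={std_logistic_tail(t):.5f}, normal tail={normal_tail(t):.5f}")
```

"Heavier tails" refers to the far tails. At $t = 1$ the normal tail probability (about 0.159) is larger than the logistic one (about 0.140), whereas at $t = 2, 3, 4$ the logistic tail is the larger of the two.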

We shall consider two examples, one biometric and one econometric, of the model (9.2.1) to gain some insight into the problem of specifying the probability function.

**Example 9.2.1.** Suppose that a dosage $x_i$ (actually the logarithm of dosage is used as $x_i$ in most studies) of an insecticide is given to the $i$th insect, and we want to study how the probability of the $i$th insect dying is related to the dosage $x_i$. (In practice, individual insects are not identified, and a certain dosage $x_t$ is given to each of $n_t$ insects in group $t$. However, the present analysis is easier to understand if we proceed as if each insect could be identified.) To formulate this model, it is useful to assume that each insect possesses its own tolerance against a particular insecticide and dies when the dosage level exceeds the tolerance. Suppose that the tolerance $y_i^*$ of the $i$th insect is an independent drawing from a distribution identical for all insects. Moreover, if the tolerance is a result of many independent and individually inconsequential additive factors, we can reasonably assume $y_i^* \sim N(\mu, \sigma^2)$ because of the central limit theorem. Defining $y_i = 1$ if the $i$th insect dies and $y_i = 0$ otherwise, we have

$$P(y_i = 1) = P(y_i^* < x_i) = \Phi\!\left(\frac{x_i - \mu}{\sigma}\right), \tag{9.2.2}$$

giving rise to a probit model $P(y_i = 1) = \Phi(\beta_1 + \beta_2 x_i)$, where $\beta_1 = -\mu/\sigma$ and $\beta_2 = 1/\sigma$. If, on the other hand, we assume that $y_i^*$ has a logistic distribution with mean $\mu$ and variance $\sigma^2$, we get a logit model

$$P(y_i = 1) = \Lambda\!\left[\frac{\pi(x_i - \mu)}{\sqrt{3}\,\sigma}\right]. \tag{9.2.3}$$
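The tolerance argument of Example 9.2.1 can be mimicked by simulation. In the sketch below the values $\mu = 2$, $\sigma = 0.5$, the doses, the seeds, and the sample size are hypothetical. Each simulated insect draws a tolerance $y_i^*$ and dies if $y_i^* < x_i$, so the empirical kill rate should be close to (9.2.2) under normal tolerances and to (9.2.3) under logistic tolerances. The logistic tolerances are drawn by inverting the logistic CDF with scale $\sqrt3\,\sigma/\pi$, which gives variance $\sigma^2$.

```python
import math
import random


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Lambda(x: float) -> float:
    """Logistic CDF (adequate for the moderate arguments used here)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_kill_rate(dose, mu, sigma, n, rng, logistic=False):
    """Share of n insects whose tolerance y* falls below the dose."""
    scale = math.sqrt(3.0) * sigma / math.pi  # logistic scale giving variance sigma^2
    deaths = 0
    for _ in range(n):
        if logistic:
            u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u away from 0 and 1
            tolerance = mu + scale * math.log(u / (1.0 - u))  # inverse logistic CDF
        else:
            tolerance = rng.gauss(mu, sigma)
        deaths += int(tolerance < dose)
    return deaths / n


MU, SIGMA = 2.0, 0.5                     # hypothetical tolerance parameters
BETA1, BETA2 = -MU / SIGMA, 1.0 / SIGMA  # probit coefficients implied by (9.2.2)
rng = random.Random(2024)

for dose in (1.5, 2.0, 2.5):
    probit_p = Phi(BETA1 + BETA2 * dose)
    logit_p = Lambda(math.pi * (dose - MU) / (math.sqrt(3.0) * SIGMA))
    sim_n = simulate_kill_rate(dose, MU, SIGMA, 100_000, rng)
    sim_l = simulate_kill_rate(dose, MU, SIGMA, 100_000, rng, logistic=True)
    print(f"dose={dose}: normal sim={sim_n:.3f} vs (9.2.2)={probit_p:.3f}; "
          f"logistic sim={sim_l:.3f} vs (9.2.3)={logit_p:.3f}")
```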

**Example 9.2.2** (Domencich and McFadden, 1975). Let us consider the decision of a person regarding whether he or she drives a car or travels by transit to work. We assume that the utility associated with each mode of transport is a function of the mode characteristics $z$ (mainly the time and the cost incurred by the use of the mode) and the individual's socioeconomic characteristics $w$, plus an additive error term $\epsilon$. We define $U_{i1}$ and $U_{i0}$ as the $i$th person's indirect utilities associated with driving a car and traveling by transit, respectively. Then, assuming a linear function, we have

$$U_{i0} = \alpha_0 + z_{i0}'\beta + w_i'\gamma_0 + \epsilon_{i0} \tag{9.2.4}$$

and

$$U_{i1} = \alpha_1 + z_{i1}'\beta + w_i'\gamma_1 + \epsilon_{i1}. \tag{9.2.5}$$

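The basic assumption is that the $i$th person drives a car if $U_{i1} > U_{i0}$ and travels by transit if $U_{i1} < U_{i0}$. (There is indecision if $U_{i1} = U_{i0}$, but this happens with zero probability if $\epsilon_{i1}$ and $\epsilon_{i0}$ are continuous random variables.) Thus, defining $y_i = 1$ if the $i$th person drives a car, we have

$$\begin{aligned}
P(y_i = 1) &= P(U_{i1} > U_{i0}) \\
&= P[\epsilon_{i0} - \epsilon_{i1} < \alpha_1 - \alpha_0 + (z_{i1} - z_{i0})'\beta + w_i'(\gamma_1 - \gamma_0)] \\
&= F[\alpha_1 - \alpha_0 + (z_{i1} - z_{i0})'\beta + w_i'(\gamma_1 - \gamma_0)],
\end{aligned} \tag{9.2.6}$$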

where $F$ is the distribution function of $\epsilon_{i0} - \epsilon_{i1}$. Thus a probit (logit) model arises from assuming the normal (logistic) distribution for $\epsilon_{i0} - \epsilon_{i1}$.
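The following is a sketch of the random-utility derivation (9.2.6) under assumptions of our own: a scalar mode characteristic $z$ (travel time), a scalar socioeconomic characteristic $w$ (income), hypothetical parameter values, and $\epsilon_{i0}, \epsilon_{i1}$ independent $N(0, 1)$. Then $\epsilon_{i0} - \epsilon_{i1} \sim N(0, 2)$, so $F(v) = \Phi(v/\sqrt2)$. The simulated share of drivers should therefore match $\Phi(v/\sqrt2)$, where $v$ is the argument of $F$ in (9.2.6).

```python
import math
import random


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical parameters and characteristics (not data from the example)
ALPHA0, ALPHA1 = 0.0, 0.3
BETA = -0.8                  # coefficient on travel time (hours)
GAMMA0, GAMMA1 = 0.0, 0.5    # coefficients on income
Z_TRANSIT, Z_CAR = 1.0, 0.5  # travel times z_i0 and z_i1
W = 1.2                      # income w_i

# Argument of F in (9.2.6)
V = ALPHA1 - ALPHA0 + (Z_CAR - Z_TRANSIT) * BETA + W * (GAMMA1 - GAMMA0)


def simulate_car_share(n: int, rng: random.Random) -> float:
    """Share choosing the car, i.e. with U_i1 > U_i0 as in (9.2.4)-(9.2.5)."""
    drives = 0
    for _ in range(n):
        u0 = ALPHA0 + Z_TRANSIT * BETA + W * GAMMA0 + rng.gauss(0.0, 1.0)
        u1 = ALPHA1 + Z_CAR * BETA + W * GAMMA1 + rng.gauss(0.0, 1.0)
        drives += int(u1 > u0)
    return drives / n


# eps_i0 - eps_i1 ~ N(0, 2), so F(v) = Phi(v / sqrt(2))
print(f"v = {V:.2f}")
print(f"simulated share = {simulate_car_share(100_000, random.Random(1)):.4f}")
print(f"F(v) = Phi(v/sqrt 2) = {Phi(V / math.sqrt(2.0)):.4f}")
```

Replacing the normal errors by a specification under which $\epsilon_{i0} - \epsilon_{i1}$ is logistic would turn the same calculation into the logit case.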

## 9.1.2 Consistency and Asymptotic Normality of the Maximum Likelihood Estimator

To derive the asymptotic properties of the MLE, we make the following assumptions in the model (9.2.1).

**Assumption 9.2.1.** $F$ has derivative $f$ and second-order derivative $f'$, and $0 < F(x) < 1$ and $f(x) > 0$ for every $x$.

**Assumption 9.2.2.** The parameter space $B$ is an open bounded subset of the Euclidean $K$-space.

**Assumption 9.2.3.** $\{x_i\}$ are uniformly bounded in $i$, and $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n x_i x_i'$ is a finite nonsingular matrix. Furthermore, the empirical distribution of $\{x_i\}$ converges to a distribution function.

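Both probit and logit models satisfy Assumption 9.2.1. The boundedness of $\{x_i\}$ is somewhat restrictive and could be removed. However, we assume it to make the proofs of consistency and asymptotic normality simpler. In this way we hope that the reader will understand the essentials without being hindered by too many technical details.

The logarithm of the likelihood function of the model (9.2.1) is given by

$$\log L = \sum_{i=1}^n y_i \log F(x_i'\beta) + \sum_{i=1}^n (1 - y_i) \log[1 - F(x_i'\beta)]. \tag{9.2.7}$$

Therefore the MLE $\hat\beta$ is a solution (if it exists) of

$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^n \frac{y_i - F_i}{F_i(1 - F_i)}\, f_i x_i = 0, \tag{9.2.8}$$

where $F_i = F(x_i'\beta)$ and $f_i = f(x_i'\beta)$.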
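The log likelihood (9.2.7) and the score (9.2.8) translate directly into code for any pair $(F, f)$. The sketch below uses plain Python lists for vectors and a small made-up data set, and checks the analytic score against central finite differences for the logit. For the logit, $f = \Lambda(1 - \Lambda)$, so (9.2.8) reduces to $\sum_i (y_i - \Lambda_i) x_i$.

```python
import math
from typing import Callable, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def log_likelihood(beta, X, y, F: Callable[[float], float]) -> float:
    """log L of (9.2.7)."""
    total = 0.0
    for xi, yi in zip(X, y):
        Fi = F(dot(xi, beta))
        total += yi * math.log(Fi) + (1 - yi) * math.log(1.0 - Fi)
    return total


def score(beta, X, y, F, f) -> List[float]:
    """Gradient (9.2.8): sum_i (y_i - F_i) / [F_i (1 - F_i)] * f_i * x_i."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        t = dot(xi, beta)
        Fi, fi = F(t), f(t)
        weight = (yi - Fi) / (Fi * (1.0 - Fi)) * fi
        for k, xik in enumerate(xi):
            g[k] += weight * xik
    return g


def numerical_gradient(fun, beta, h=1e-6) -> List[float]:
    """Central finite differences of a scalar function of beta."""
    grad = []
    for k in range(len(beta)):
        up, down = list(beta), list(beta)
        up[k] += h
        down[k] -= h
        grad.append((fun(up) - fun(down)) / (2.0 * h))
    return grad


def logit_F(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def logit_f(t: float) -> float:
    p = logit_F(t)
    return p * (1.0 - p)


# Small made-up data set with x_i = (1, u_i)'
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]
y = [0, 1, 0, 1]
beta = [0.2, 0.7]

analytic = score(beta, X, y, logit_F, logit_f)
numeric = numerical_gradient(lambda b: log_likelihood(b, X, y, logit_F), beta)
print(analytic, numeric)
```

Passing the standard normal CDF and density in place of `logit_F` and `logit_f` gives the probit versions without other changes.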

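To prove the consistency of $\hat\beta$, we must verify the assumptions of Theorem 4.1.2. Assumptions A and B are clearly satisfied. To verify C, we use Theorem 4.2.2. If we define $g_i(y, \beta) = [y - F(x_i'\beta_0)] \log F(x_i'\beta)$, then $g_i(y, \beta)$ in a compact neighborhood of $\beta_0$ satisfies all the conditions for $g_t(y, \theta)$ in the theorem because of Assumptions 9.2.1 and 9.2.3. Furthermore, $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n F_i^0 \log F_i$, where $F_i^0 = F(x_i'\beta_0)$, exists because of Theorem 4.2.3. Since $y_i \log F_i = g_i(y_i, \beta) + F_i^0 \log F_i$ and $E\, g_i(y_i, \beta) = 0$, it follows that $n^{-1}\sum_{i=1}^n y_i \log F_i$ converges to $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n F_i^0 \log F_i$ in probability uniformly in $\beta \in N(\beta_0)$, an open neighborhood of $\beta_0$. A similar result can be obtained for the second term of the right-hand side of (9.2.7). Therefore

$$\begin{aligned}
Q(\beta) &\equiv \operatorname{plim} n^{-1} \log L \\
&= \lim_{n\to\infty} n^{-1}\sum_{i=1}^n F_i^0 \log F_i + \lim_{n\to\infty} n^{-1}\sum_{i=1}^n (1 - F_i^0) \log(1 - F_i),
\end{aligned} \tag{9.2.9}$$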

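where the convergence is uniform in $\beta \in N(\beta_0)$. Because our assumptions enable us to differentiate inside the limit operation in (9.2.9), we obtain

$$\frac{\partial Q}{\partial \beta} = \lim_{n\to\infty} n^{-1}\sum_{i=1}^n \frac{F_i^0}{F_i}\, f_i x_i - \lim_{n\to\infty} n^{-1}\sum_{i=1}^n \frac{1 - F_i^0}{1 - F_i}\, f_i x_i, \tag{9.2.10}$$

which vanishes at $\beta = \beta_0$. Furthermore,

$$\left.\frac{\partial^2 Q}{\partial \beta\, \partial \beta'}\right|_{\beta_0} = -\lim_{n\to\infty} n^{-1}\sum_{i=1}^n \frac{f_i^2}{F_i(1 - F_i)}\, x_i x_i', \tag{9.2.11}$$

which is negative definite by our assumptions, so $Q$ attains a strict local maximum at $\beta = \beta_0$. Thus assumption C of Theorem 4.1.2 holds, and hence the consistency of $\hat\beta$ has been proved.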
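The identification step can be seen numerically. For a scalar $\beta$, a hypothetical bounded design $x_i$ on $[-2, 2]$, and $\beta_0 = 1.5$, the sketch below evaluates the finite-$n$ analogue of $Q(\beta)$ in (9.2.9) for the logit on a grid. It also compares a finite-difference second derivative at $\beta_0$ with (9.2.11), which for the logit reads $-n^{-1}\sum_i \Lambda_i(1 - \Lambda_i)x_i^2$. Each term $F_i^0 \log F_i + (1 - F_i^0)\log(1 - F_i)$ is maximized at $F_i = F_i^0$, so the grid maximizer should be $\beta_0$ itself.

```python
import math


def Lambda(t: float) -> float:
    """Logistic CDF (arguments here are moderate, so no overflow guard is needed)."""
    return 1.0 / (1.0 + math.exp(-t))


def Q_n(beta: float, xs, beta0: float) -> float:
    """Finite-n analogue of (9.2.9) for a scalar logit: expected log L per observation."""
    total = 0.0
    for x in xs:
        F0, F = Lambda(x * beta0), Lambda(x * beta)
        total += F0 * math.log(F) + (1.0 - F0) * math.log(1.0 - F)
    return total / len(xs)


XS = [-2.0 + 4.0 * k / 99 for k in range(100)]  # hypothetical bounded design on [-2, 2]
BETA0 = 1.5
GRID = [0.5 + 0.01 * j for j in range(201)]     # beta values from 0.5 to 2.5

values = [Q_n(b, XS, BETA0) for b in GRID]
b_star = GRID[max(range(len(GRID)), key=values.__getitem__)]

# Second derivative at beta0: finite difference vs. formula (9.2.11),
# which for the logit is -n^{-1} sum Lambda_i (1 - Lambda_i) x_i^2.
h = 1e-4
fd_curv = (Q_n(BETA0 + h, XS, BETA0) - 2 * Q_n(BETA0, XS, BETA0) + Q_n(BETA0 - h, XS, BETA0)) / h ** 2
formula_curv = -sum(Lambda(x * BETA0) * (1 - Lambda(x * BETA0)) * x * x for x in XS) / len(XS)
print(f"grid maximizer = {b_star:.2f}, curvature: finite diff {fd_curv:.5f}, formula {formula_curv:.5f}")
```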

A solution of (9.2.8) may not always exist (see Albert and Anderson, 1984). For example, suppose that $\{x_i\}$ are scalars such that $x_i < 0$ for $i \le c$ for some integer $c$ between 1 and $n$ and $x_i > 0$ for $i > c$, and that $y_i = 0$ for $i \le c$ and $y_i = 1$ for $i > c$. Then every term of the sum in (9.2.8) is positive, so $\log L$ increases monotonically toward 0 as $\beta \to \infty$ and does not attain a maximum for any finite value of $\beta$. If $\{x_i\}$ are $K$-vectors, the same situation arises if $y = 0$ for the values of $x_i$ lying on one side of a hyperplane in $R^K$ and $y = 1$ for the values of $x_i$ lying on the other side. However, the possibility of no solution of (9.2.8) does not affect the consistency proof, because the probability of no solution approaches 0 as $n$ goes to $\infty$, as shown in Theorem 4.1.2.
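The nonexistence example can be reproduced with six hypothetical observations for the logit: $x_i < 0$ with $y_i = 0$ and $x_i > 0$ with $y_i = 1$. In the sketch below, the log likelihood keeps rising toward 0 as $\beta$ doubles, so no finite maximizer exists. Flipping the labels of the two observations nearest 0 restores overlap, and a moderate $\beta$ then beats a large one. The logs are computed through a numerically stable softplus, $\log(1 + e^u)$, to avoid evaluating $\log 0$ for large $\beta$.

```python
import math


def softplus(u: float) -> float:
    """Numerically stable log(1 + e^u)."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def logit_loglik(beta: float, xs, ys) -> float:
    """Scalar-beta logit log L (9.2.7), using log Lambda(t) = -softplus(-t)
    and log(1 - Lambda(t)) = -softplus(t)."""
    total = 0.0
    for x, y in zip(xs, ys):
        t = x * beta
        total += -softplus(-t) if y == 1 else -softplus(t)
    return total


# Hypothetical data
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
YS_SEPARATED = [0, 0, 0, 1, 1, 1]  # y = 0 exactly when x < 0
YS_OVERLAP = [0, 0, 1, 0, 1, 1]    # labels of the two points nearest 0 flipped

BETAS = [1.0, 2.0, 4.0, 8.0, 16.0]
separated = [logit_loglik(b, XS, YS_SEPARATED) for b in BETAS]
overlap = [logit_loglik(b, XS, YS_OVERLAP) for b in BETAS]
for b, s, o in zip(BETAS, separated, overlap):
    print(f"beta={b:5.1f}  separated log L={s:10.6f}  overlapping log L={o:10.4f}")
```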

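Next we shall prove asymptotic normality using Theorem 4.2.4, which means that assumptions A, B, and C of Theorem 4.1.3 and Eq. (4.2.22) need to be verified. Differentiating (9.2.8) with respect to $\beta'$ and using $y_i^2 = y_i$ yields

$$\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^n \left[\frac{y_i - F_i}{F_i(1 - F_i)}\right]^2 f_i^2\, x_i x_i' + \sum_{i=1}^n \frac{y_i - F_i}{F_i(1 - F_i)}\, f_i'\, x_i x_i', \tag{9.2.12}$$

where $f_i' = f'(x_i'\beta)$. Thus assumption A of Theorem 4.1.3 is clearly satisfied. We can verify that Assumptions 9.2.1 and 9.2.3 imply assumption B of Theorem 4.1.3 by using Theorems 4.1.5 and 4.2.2. Next, writing $F_i^0 = F(x_i'\beta_0)$ and $f_i^0 = f(x_i'\beta_0)$, we have from (9.2.8)

$$\frac{1}{\sqrt{n}} \left.\frac{\partial \log L}{\partial \beta}\right|_{\beta_0} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{y_i - F_i^0}{F_i^0(1 - F_i^0)}\, f_i^0 x_i. \tag{9.2.13}$$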

The asymptotic normality of (9.2.13) readily follows from Theorem 3.3.5 (Liapounov CLT), because the terms in the summation are independent with mean zero and have bounded second and third absolute moments by Assumptions 9.2.1, 9.2.2, and 9.2.3. Thus assumption C of Theorem 4.1.3 is satisfied, and we obtain

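$$\frac{1}{\sqrt{n}} \left.\frac{\partial \log L}{\partial \beta}\right|_{\beta_0} \xrightarrow{d} N(0, A), \tag{9.2.14}$$

where

$$A = \lim_{n\to\infty} n^{-1}\sum_{i=1}^n \frac{(f_i^0)^2}{F_i^0(1 - F_i^0)}\, x_i x_i'. \tag{9.2.15}$$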

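Finally, since $E(y_i - F_i^0) = 0$ and $E(y_i - F_i^0)^2 = F_i^0(1 - F_i^0)$, we obtain from (9.2.12)

$$\lim_{n\to\infty} \frac{1}{n}\, E \left.\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'}\right|_{\beta_0} = -A, \tag{9.2.16}$$

verifying (4.2.22). Therefore

$$\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{d} N(0, A^{-1}). \tag{9.2.17}$$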
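In practice (9.2.8) is solved iteratively. One common choice, sketched below for the logit with $K = 2$, is Fisher scoring. It replaces the Hessian (9.2.12) by its expectation, whose sample analogue is $-nA_n$ with $A_n = n^{-1}\sum_i f_i^2 x_i x_i'/[F_i(1 - F_i)]$ as in (9.2.15) and (9.2.16); for the logit this coincides with Newton-Raphson. Result (9.2.17) then suggests approximating the covariance matrix of $\hat\beta$ by $(nA_n)^{-1}$ evaluated at $\hat\beta$. The design, sample size, seed, and $\beta_0 = (-0.5, 1)'$ are hypothetical; the regressor is drawn from $[-2, 2]$ so that Assumption 9.2.3 holds.

```python
import math
import random


def Lambda(t: float) -> float:
    """Logistic CDF, arranged to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def score_and_information(beta, X, y):
    """Logit score (9.2.8) and n*A_n = sum_i f_i^2 / [F_i (1 - F_i)] x_i x_i'.

    For the logit f = Lambda(1 - Lambda), so the weight reduces to Lambda_i (1 - Lambda_i).
    """
    g = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(X, y):
        F = Lambda(xi[0] * beta[0] + xi[1] * beta[1])
        w = F * (1.0 - F)
        for a in range(2):
            g[a] += (yi - F) * xi[a]
            for b in range(2):
                info[a][b] += w * xi[a] * xi[b]
    return g, info


def inverse2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def fisher_scoring(X, y, max_iter=50, tol=1e-10):
    """Iterate beta <- beta + (n A_n)^{-1} * score until the step is negligible."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        g, info = score_and_information(beta, X, y)
        inv = inverse2(info)
        step = [inv[0][0] * g[0] + inv[0][1] * g[1], inv[1][0] * g[0] + inv[1][1] * g[1]]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical simulated data with a bounded regressor (Assumption 9.2.3)
rng = random.Random(42)
BETA0 = [-0.5, 1.0]
N = 2000
X = [[1.0, rng.uniform(-2.0, 2.0)] for _ in range(N)]
y = [int(rng.random() < Lambda(x[0] * BETA0[0] + x[1] * BETA0[1])) for x in X]

beta_hat = fisher_scoring(X, y)
g_hat, info_hat = score_and_information(beta_hat, X, y)
cov_hat = inverse2(info_hat)  # (n A_n)^{-1} at beta_hat, suggested by (9.2.17)
se = [math.sqrt(cov_hat[0][0]), math.sqrt(cov_hat[1][1])]
print(f"beta_hat = {beta_hat}, standard errors = {se}")
```

For the probit, the weight $f_i^2/[F_i(1 - F_i)]$ no longer simplifies, and Fisher scoring and Newton-Raphson generally differ because the second term of (9.2.12) does not vanish.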

For a proof of consistency and asymptotic normality under more general assumptions in the logit model, see the article by Gourieroux and Monfort (1981).