# Berkson’s Minimum Chi-Square Method

There are many variations of the minimum chi-square (MIN $\chi^2$) method, one of which is Berkson’s method. For example, the feasible generalized least squares (FGLS) estimator defined in Section 6.2 is a MIN $\chi^2$ estimator. Another example is the Barankin-Gurland estimator mentioned in Section 4.2.4. A common feature of these estimators is that the minimand evaluated at the estimator is asymptotically distributed as chi-square, from which the name is derived.

The MIN $\chi^2$ method in the context of the QR model was first proposed by Berkson (1944) for the logit model but can be used for any QR model. It is useful only when there are many observations on the dependent variable $y$ having the same value of the vector of independent variables $x$ (sometimes referred to as “many observations per cell”) so that $F(x'\beta_0)$ for the specific value of $x$ can be accurately estimated by the relative frequency of $y$ being equal to 1.

To explain the method in detail, we need to define new symbols. Suppose that the vector $x_i$ takes $T$ distinct vector values $x_{(1)}, x_{(2)}, \dots, x_{(T)}$, and classify the integers $(1, 2, \dots, n)$ into $T$ disjoint sets $I_1, I_2, \dots, I_T$ by the rule: $i \in I_t$ if $x_i = x_{(t)}$. Define $P_{(t)} = P(y_i = 1)$ if $i \in I_t$. In addition, define $n_t$ as the number of integers contained in $I_t$, $r_t = \sum_{i \in I_t} y_i$, and $\hat P_{(t)} = r_t/n_t$. Note that $\{\hat P_{(t)}\}$ constitute the sufficient statistics of the model. In the following discussion we shall write $x_{(t)}$, $P_{(t)}$, and $\hat P_{(t)}$ as $x_t$, $P_t$, and $\hat P_t$ if there is no ambiguity.
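The bookkeeping behind these definitions is simple. The following Python sketch (the function name `cell_statistics` and the toy data are illustrative, not from the text) groups observations by their distinct value of $x_i$ and returns $x_{(t)}$, $n_t$, $r_t$, and $\hat P_{(t)}$ for each cell:

```python
def cell_statistics(xs, ys):
    """Group observations by their distinct x value and return, for each cell t,
    the tuple (x_(t), n_t, r_t, P_hat_t) with P_hat_t = r_t / n_t."""
    cells = {}  # x_(t) -> [n_t, r_t]; dicts keep insertion order (Python 3.7+)
    for x, y in zip(xs, ys):
        key = tuple(x)  # a vector must be hashable to serve as a cell label
        counts = cells.setdefault(key, [0, 0])
        counts[0] += 1  # n_t: number of i in I_t
        counts[1] += y  # r_t: number of i in I_t with y_i = 1
    return [(x_t, n_t, r_t, r_t / n_t) for x_t, (n_t, r_t) in cells.items()]


# Toy data (hypothetical): x_i = (1, z_i) with an intercept; z_i takes T = 2 values.
xs = [(1, 0), (1, 0), (1, 0), (1, 1), (1, 1)]
ys = [1, 0, 0, 1, 1]
for x_t, n_t, r_t, p_hat in cell_statistics(xs, ys):
    print(x_t, n_t, r_t, p_hat)
```

Only the cell totals $(n_t, r_t)$ are retained, which reflects the fact that $\{\hat P_{(t)}\}$ are sufficient.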

From (9.2.1) we have

$$P_t = F(x_t'\beta_0), \qquad t = 1, 2, \dots, T. \tag{9.2.28}$$

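If $F$ is one-to-one (which is implied by Assumption 9.2.1), we can invert the relationship in (9.2.28) to obtain

$$F^{-1}(P_t) = x_t'\beta_0, \tag{9.2.29}$$

where $F^{-1}$ denotes the inverse function of $F$. Expanding $F^{-1}(\hat P_t)$ in a Taylor series around $P_t$ (which is possible under Assumption 9.2.1), we obtain

$$F^{-1}(\hat P_t) = x_t'\beta_0 + \frac{1}{f[F^{-1}(P_t)]}(\hat P_t - P_t) + w_t \equiv x_t'\beta_0 + v_t + w_t, \tag{9.2.30}$$

where $f$ is the derivative of $F$ and $w_t$ is the remainder term of the expansion. Because $\hat P_t$ is the mean of $n_t$ independent Bernoulli variables with success probability $P_t$, we have $Ev_t = 0$ and

$$Ev_t^2 \equiv \sigma_t^2 = \frac{P_t(1-P_t)}{n_t f^2[F^{-1}(P_t)]}, \tag{9.2.31}$$

and because $w_t$ is $O(n_t^{-1})$ and hence can be ignored for large $n_t$ (as we shall show rigorously later), (9.2.30) approximately defines a heteroscedastic linear regression model. The MIN $\chi^2$ estimator, denoted $\tilde\beta$, is defined as the WLS estimator applied to (9.2.30) ignoring $w_t$. We can estimate $\sigma_t^2$ by $\hat\sigma_t^2$, obtained by substituting $\hat P_t$ for $P_t$ in (9.2.31). Thus

$$\tilde\beta = \left(\sum_{t=1}^T \hat\sigma_t^{-2} x_t x_t'\right)^{-1} \sum_{t=1}^T \hat\sigma_t^{-2} x_t F^{-1}(\hat P_t). \tag{9.2.32}$$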
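For the logit model, $F = \Lambda$ and $f = \Lambda(1-\Lambda)$, so $f[\Lambda^{-1}(P_t)] = P_t(1-P_t)$ and (9.2.31) reduces to $\sigma_t^2 = 1/[n_tP_t(1-P_t)]$. The sketch below computes (9.2.32) under that specialization, assuming a single regressor $z_t$ plus an intercept, $x_t = (1, z_t)'$, so that the $2\times 2$ normal equations can be solved in closed form. The function name `min_chi2_logit` and the cell counts are hypothetical.

```python
import math


def min_chi2_logit(z, n, r):
    """Berkson's MIN chi-square estimate (b0, b1) for a logit model with
    x_t = (1, z_t)': WLS of log[P_hat_t / (1 - P_hat_t)] on x_t with weights
    sigma_hat_t^{-2} = n_t * P_hat_t * (1 - P_hat_t)."""
    s00 = s01 = s11 = t0 = t1 = 0.0
    for z_t, n_t, r_t in zip(z, n, r):
        p = r_t / n_t
        if not 0 < p < 1:
            raise ValueError("logit transformation undefined when P_hat_t is 0 or 1")
        w = n_t * p * (1 - p)        # sigma_hat_t^{-2}
        y = math.log(p / (1 - p))    # Lambda^{-1}(P_hat_t)
        # Accumulate sum_t w x_t x_t' and sum_t w x_t y_t.
        s00 += w
        s01 += w * z_t
        s11 += w * z_t * z_t
        t0 += w * y
        t1 += w * z_t * y
    # Solve [[s00, s01], [s01, s11]] (b0, b1)' = (t0, t1)' by Cramer's rule.
    det = s00 * s11 - s01 * s01
    b0 = (s11 * t0 - s01 * t1) / det
    b1 = (s00 * t1 - s01 * t0) / det
    return b0, b1


# Hypothetical data: four cells of 50 observations each.
print(min_chi2_logit([0, 1, 2, 3], [50, 50, 50, 50], [12, 20, 31, 38]))
```

No iteration is needed, which is the computational advantage of the method. If the cell frequencies happened to equal logistic probabilities exactly, the fit would reproduce $\beta_0$ exactly, since the transformed values would then lie on the line $x_t'\beta_0$.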

We shall prove the consistency and asymptotic normality of $\tilde\beta$ (as $n$ goes to $\infty$ with $T$ fixed) under Assumptions 9.2.1, 9.2.3, and the following additional assumption:

Assumption 9.2.4. $\lim_{n\to\infty}(n_t/n) = c_t \neq 0$ for every $t = 1, 2, \dots, T$, where $T$ is a fixed integer.

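We impose Assumption 9.2.4 to simplify the analysis. However, if $c_t = 0$ for some $t$, we can act as if the observations corresponding to that $t$ did not exist and impose Assumptions 9.2.3 and 9.2.4 on the remaining observations.

Inserting (9.2.30) into (9.2.32) and rearranging terms yield

$$\sqrt{n}(\tilde\beta - \beta_0) = \left(\frac{1}{n}\sum_{t=1}^T \hat\sigma_t^{-2} x_t x_t'\right)^{-1} \frac{1}{\sqrt{n}}\sum_{t=1}^T \hat\sigma_t^{-2} x_t (v_t + w_t). \tag{9.2.33}$$

Because $T$ is fixed, we have by Theorem 3.2.6

$$\operatorname*{plim}_{n\to\infty} \frac{1}{n}\sum_{t=1}^T \hat\sigma_t^{-2} x_t x_t' = \sum_{t=1}^T \bar\sigma_t^{-2} x_t x_t', \tag{9.2.34}$$

where $\bar\sigma_t^{-2} = c_t f^2[F^{-1}(P_t)]/[P_t(1-P_t)]$. Also, using Theorem 3.2.7, we obtain

$$\frac{1}{\sqrt{n}}\sum_{t=1}^T \hat\sigma_t^{-2} x_t (v_t + w_t) = \sum_{t=1}^T \frac{\hat\sigma_t^{-1}}{\sqrt{n}}\,\frac{\sigma_t}{\hat\sigma_t}\, x_t\,(\sigma_t^{-1}v_t + \sigma_t^{-1}w_t) \overset{\text{LD}}{=} \sum_{t=1}^T \bar\sigma_t^{-1} x_t\,\sigma_t^{-1} v_t, \tag{9.2.35}$$

where $\overset{\text{LD}}{=}$ means that both sides have the same limit distribution, because $\operatorname{plim} n^{-1/2}\hat\sigma_t^{-1} = \bar\sigma_t^{-1}$, $\operatorname{plim}\sigma_t^{-1}\hat\sigma_t = 1$, and $\operatorname{plim}\sigma_t^{-1}w_t = 0$. But, because $\{v_t\}$ are independent, the vector $(\sigma_1^{-1}v_1, \sigma_2^{-1}v_2, \dots, \sigma_T^{-1}v_T)$ converges to $N(0, I_T)$. Therefore

$$\sum_{t=1}^T \bar\sigma_t^{-1} x_t\,\sigma_t^{-1}v_t \to N\left(0, \sum_{t=1}^T \bar\sigma_t^{-2} x_t x_t'\right). \tag{9.2.36}$$

Finally, we obtain from (9.2.33) through (9.2.36) and Theorem 3.2.7

$$\sqrt{n}(\tilde\beta - \beta_0) \to N\left[0, \left(\sum_{t=1}^T \bar\sigma_t^{-2} x_t x_t'\right)^{-1}\right]. \tag{9.2.37}$$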

Because $A$ defined in (9.2.15) is equal to $\sum_{t=1}^T \bar\sigma_t^{-2} x_t x_t'$ under Assumption 9.2.4, (9.2.17) and (9.2.37) show that the MIN $\chi^2$ estimator has the same asymptotic distribution as the MLE.² The MIN $\chi^2$ estimator is simpler to compute than the MLE because the former is explicitly defined as in (9.2.32), whereas the latter requires iteration. However, the MIN $\chi^2$ method requires a large number of observations in each cell. If the model contains several independent variables, this may be impossible to obtain. In the next subsection we shall compare the two estimators in greater detail.
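Because $n^{-1}\sum_t\hat\sigma_t^{-2}x_tx_t'$ converges in probability to $\sum_t\bar\sigma_t^{-2}x_tx_t'$, the limit distribution above suggests approximating the variance matrix of $\tilde\beta$ in a given sample by $(\sum_t\hat\sigma_t^{-2}x_tx_t')^{-1}$. The following sketch does this for the logit case with $x_t = (1, z_t)'$, where $\hat\sigma_t^{-2} = n_t\hat P_t(1-\hat P_t)$; the function name and the counts are hypothetical.

```python
import math


def logit_min_chi2_std_errors(z, n, r):
    """Approximate standard errors of (b0, b1) from the inverse of
    sum_t sigma_hat_t^{-2} x_t x_t', with x_t = (1, z_t)' and logit weights
    sigma_hat_t^{-2} = n_t * P_hat_t * (1 - P_hat_t)."""
    s00 = s01 = s11 = 0.0
    for z_t, n_t, r_t in zip(z, n, r):
        p = r_t / n_t
        w = n_t * p * (1 - p)
        s00 += w
        s01 += w * z_t
        s11 += w * z_t * z_t
    det = s00 * s11 - s01 * s01
    # Inverse of [[s00, s01], [s01, s11]] is [[s11, -s01], [-s01, s00]] / det;
    # the diagonal entries are the approximate variances.
    var_b0 = s11 / det
    var_b1 = s00 / det
    return math.sqrt(var_b0), math.sqrt(var_b1)


print(logit_min_chi2_std_errors([0, 1, 2, 3], [50, 50, 50, 50], [12, 20, 31, 38]))
```

The same matrix would serve as a variance estimate for the logit MLE, given the equality of the two asymptotic distributions.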

In the probit model, $F^{-1}(\hat P_t) = \Phi^{-1}(\hat P_t)$. Although $\Phi^{-1}$ does not have an explicit form, it can be easily evaluated numerically. The function $\Phi^{-1}(\cdot)$ is called the probit transformation. In the logit model we can explicitly write

$$\Lambda^{-1}(\hat P_t) = \log[\hat P_t/(1 - \hat P_t)], \tag{9.2.38}$$

which is called the logit transformation or the logarithm of the odds ratio.
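In Python, $\Phi^{-1}$ is available as `statistics.NormalDist().inv_cdf`, which evaluates it numerically and raises `StatisticsError` when its argument is 0 or 1. The sketch below (function names are our own) gives both transformations and, for the probit case, the weight $\hat\sigma_t^{-2} = n_t\phi^2[\Phi^{-1}(\hat P_t)]/[\hat P_t(1-\hat P_t)]$ implied by (9.2.31), where $\phi$ is the standard normal density.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mu = 0, sigma = 1


def probit_transform(p):
    """Phi^{-1}(p), evaluated numerically; raises StatisticsError if p is 0 or 1."""
    return STD_NORMAL.inv_cdf(p)


def logit_transform(p):
    """Lambda^{-1}(p) = log[p / (1 - p)], the logarithm of the odds ratio."""
    return math.log(p / (1 - p))


def probit_weight(p_hat, n_t):
    """sigma_hat_t^{-2} for the probit model: n_t * phi(Phi^{-1}(p))^2 / [p (1 - p)]."""
    density = STD_NORMAL.pdf(probit_transform(p_hat))
    return n_t * density**2 / (p_hat * (1 - p_hat))


for p in (0.1, 0.5, 0.9):
    print(p, probit_transform(p), logit_transform(p))
```

With `probit_transform` and `probit_weight` in place of the logit transformation and weight, the WLS formula (9.2.32) gives the probit version of the MIN $\chi^2$ estimator.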
Cox (1970, p. 33) mentioned the following modification of (9.2.38):

$$\Lambda_c^{-1}(\hat P_t) = \log\{[\hat P_t + (2n_t)^{-1}]/[1 - \hat P_t + (2n_t)^{-1}]\}. \tag{9.2.39}$$

This modification has two advantages over (9.2.38): (1) The transformation (9.2.39) can always be defined, whereas (9.2.38) cannot be defined if $\hat P_t = 0$ or 1. (Nevertheless, it is not advisable to use Cox’s modification when $n_t$ is small.) (2) It can be shown that $E\Lambda_c^{-1}(\hat P_t) - \Lambda^{-1}(P_t)$ is of the order of $n_t^{-2}$, whereas $E\Lambda^{-1}(\hat P_t) - \Lambda^{-1}(P_t)$ is of the order of $n_t^{-1}$.
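Advantage (2) can be checked numerically. Because $r_t$ is binomial with parameters $n_t$ and $P_t$, the expectation of (9.2.39) is a finite sum, and the following sketch (function names and the choice $P_t = 0.3$ are illustrative) computes the exact bias $E\Lambda_c^{-1}(\hat P_t) - \Lambda^{-1}(P_t)$ for several $n_t$. Note that (9.2.39) equals $\log[(r_t + 1/2)/(n_t - r_t + 1/2)]$. The corresponding expectation for (9.2.38) cannot be computed this way, since $\Lambda^{-1}(\hat P_t)$ is undefined when $r_t = 0$ or $r_t = n_t$.

```python
import math


def cox_logit(r, n):
    """Cox's modified logit: log{[P + 1/(2n)] / [1 - P + 1/(2n)]} with P = r / n.
    Finite even when r = 0 or r = n."""
    p = r / n
    h = 1 / (2 * n)
    return math.log((p + h) / (1 - p + h))


def cox_bias(p, n):
    """Exact E[cox_logit(R, n)] - log[p / (1 - p)] for R ~ Binomial(n, p)."""
    expectation = sum(
        math.comb(n, r) * p**r * (1 - p) ** (n - r) * cox_logit(r, n)
        for r in range(n + 1)
    )
    return expectation - math.log(p / (1 - p))


for n in (25, 50, 100, 200):
    b = cox_bias(0.3, n)
    # If the bias is O(n^-2), n^2 * bias should settle near a constant.
    print(n, b, n * n * b)
```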

In the preceding passages we have proved the consistency and the asymptotic normality of Berkson’s MIN $\chi^2$ estimator assuming $x_i = x_t$ for $i \in I_t$. However, a situation may occur in practice where a researcher must proceed as if $x_i = \bar x_t$ for $i \in I_t$ even if $x_i \neq \bar x_t$, because individual observations $x_i$ are not available and only their group mean $\bar x_t \equiv n_t^{-1}\sum_{i \in I_t} x_i$ is available. (Such an example will be given in Example 9.3.1.) In this case the MIN $\chi^2$ estimator is generally inconsistent. McFadden and Reid (1975) addressed this issue and, under the assumption of normality of $x_i$, evaluated the asymptotic bias of the MIN $\chi^2$ estimator and proposed a modification of the MIN $\chi^2$ estimator that is consistent.
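One source of the inconsistency can be seen under a simple setup of our own choosing (not necessarily McFadden and Reid’s exact assumptions, and not their modification): a probit model $P(y_i = 1) = \Phi(\beta_0 + \beta_1 x_i)$ with a scalar regressor, where within cell $t$ the $x_i$ are $N(\bar x_t, s^2)$ with a common $s^2$. Then the cell probability is $E\,\Phi(\beta_0 + \beta_1 x_i) = \Phi[(\beta_0 + \beta_1\bar x_t)/\sqrt{1 + \beta_1^2 s^2}]$, so even with exact cell frequencies, regressing $\Phi^{-1}(P_t)$ on $(1, \bar x_t)$ recovers $\beta/\sqrt{1 + \beta_1^2 s^2}$ rather than $\beta$. The sketch below checks this by integrating over $x_i$ numerically; all parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cell_probability(b0, b1, mean, sd, steps=4000):
    """E[Phi(b0 + b1 * x)] for x ~ N(mean, sd^2), by the midpoint rule on mean +/- 8 sd."""
    x_dist = NormalDist(mean, sd)
    lo, width = mean - 8 * sd, 16 * sd / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        total += STD_NORMAL.cdf(b0 + b1 * x) * x_dist.pdf(x) * width
    return total


# Hypothetical true probit coefficients and common within-cell spread s of x_i.
b0, b1, s = -0.5, 1.0, 1.0
group_means = [-1.0, -0.5, 0.0, 0.5, 1.0]  # observed xbar_t

# Limits of P_hat_t as n_t grows, followed by the probit transformation.
probits = [STD_NORMAL.inv_cdf(cell_probability(b0, b1, m, s)) for m in group_means]

# The points (xbar_t, Phi^{-1}(P_t)) lie exactly on a line, so any positive
# weighting, including unweighted least squares, gives the same fit.
m_avg = sum(group_means) / len(group_means)
q_avg = sum(probits) / len(probits)
a1 = sum((m - m_avg) * (q - q_avg) for m, q in zip(group_means, probits)) / sum(
    (m - m_avg) ** 2 for m in group_means
)
a0 = q_avg - a1 * m_avg

scale = (1 + b1**2 * s**2) ** 0.5
print(a0, a1)                  # limits of the estimator with group means
print(b0 / scale, b1 / scale)  # attenuated values from the formula above
```

In this setup the estimates are shrunk toward zero by the factor $1/\sqrt{1 + \beta_1^2 s^2}$, which does not vanish as the $n_t$ grow.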