# Nonlinear Limited Information Maximum Likelihood Estimator

In the preceding section we assumed the model (8.1.1) without specifying the model for $Y_t$ or assuming the normality of $u_t$, and derived the asymptotic distribution of the class of NL2S estimators and the optimal member of the class, BNL2S. In this section we shall specify the model for $Y_t$ and shall assume that all the error terms are normally distributed; under these assumptions we shall derive the nonlinear limited information maximum likelihood (NLLI) estimator, which is asymptotically more efficient than BNL2S. The NLLI estimator takes advantage of the added assumptions, and consequently its asymptotic properties depend crucially on the validity of the assumptions. Thus we are aiming at a higher efficiency at the possible sacrifice of robustness.

We specify the model for $Y_t$ as

$$Y_t' = X_t'\Pi + V_t', \tag{8.1.22}$$

where $V_t$ is a vector of random variables, $X_t$ is a vector of known constants, and $\Pi$ is a matrix of unknown parameters. We assume that $(u_t, V_t')$ are independent drawings from a multivariate normal distribution with zero mean and variance-covariance matrix

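$$\Sigma = \begin{bmatrix} \sigma^2 & \sigma_{12}' \\ \sigma_{12} & \Sigma_{22} \end{bmatrix}. \tag{8.1.23}$$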

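We define $X$ and $V$ as matrices the $t$th rows of which are $X_t'$ and $V_t'$, respectively. Because $u$ and $V$ are jointly normal, we can write

$$u = V\Sigma_{22}^{-1}\sigma_{12} + \epsilon, \tag{8.1.24}$$

where $\epsilon$ is independent of $V$ and distributed as $N(0, \sigma^{*2}I)$, where $\sigma^{*2} = \sigma^2 - \sigma_{12}'\Sigma_{22}^{-1}\sigma_{12}$.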
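The decomposition in (8.1.24) can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical $3 \times 3$ covariance matrix chosen only for illustration; the function name `conditional_decomposition` is not from the text. It computes the coefficient vector $\Sigma_{22}^{-1}\sigma_{12}$ and the conditional variance $\sigma^{*2}$ from a partitioned $\Sigma$.

```python
import numpy as np

def conditional_decomposition(Sigma):
    """Partition Sigma as in (8.1.23) and return (gamma, sigma_star2).

    gamma       = Sigma_22^{-1} sigma_12, the coefficient on V in (8.1.24)
    sigma_star2 = sigma^2 - sigma_12' Sigma_22^{-1} sigma_12, the variance of epsilon
    """
    Sigma = np.asarray(Sigma, dtype=float)
    sigma2 = Sigma[0, 0]          # variance of u_t
    sigma12 = Sigma[1:, 0]        # covariance between V_t and u_t
    Sigma22 = Sigma[1:, 1:]       # covariance matrix of V_t
    gamma = np.linalg.solve(Sigma22, sigma12)
    sigma_star2 = sigma2 - sigma12 @ gamma
    return gamma, sigma_star2

# Hypothetical positive definite covariance matrix (illustration only).
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
gamma, sigma_star2 = conditional_decomposition(Sigma)
print(gamma, sigma_star2)
```

Because $\sigma^{*2}$ is the Schur complement of $\Sigma_{22}$ in $\Sigma$, it cannot exceed $\sigma^2$, which is the sense in which conditioning on $V$ removes part of the variation in $u$.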

The model defined in (8.1.24) may be regarded either as a simplified nonlinear simultaneous equations model in which both the nonlinearity and the simultaneity appear only in the first equation or as the model that represents the "limited information" of the investigator. In the latter interpretation, $X_t$ are not necessarily the original exogenous variables of the system, some of which appear in the arguments of $f$, but, rather, are the variables a linear combination of which the investigator believes will explain $Y_t$ effectively.

Because the Jacobian of the transformation from $(u, V)$ to $(y, Y)$ is unity in our model, the log likelihood function assuming normality can be written, apart from a constant, as

$$L^{**} = -\frac{T}{2}\log|\Sigma| - \frac{1}{2}\operatorname{tr}\Sigma^{-1}Q, \tag{8.1.25}$$

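where

$$Q = \begin{bmatrix} u'u & u'V \\ V'u & V'V \end{bmatrix},$$

$u$ and $V$ representing $y - f$ and $Y - X\Pi$, respectively. Solving $\partial L^{**}/\partial\Sigma = 0$ for $\Sigma$ yields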

$$\Sigma = T^{-1}Q. \tag{8.1.26}$$

Substituting (8.1.26) into (8.1.25), we obtain a concentrated log likelihood function

$$L^{*} = -\frac{T}{2}\left(\log u'u + \log|V'M_uV|\right). \tag{8.1.27}$$

Solving $\partial L^{*}/\partial\Pi = 0$ for $\Pi$, we obtain

$$\Pi = (X'M_uX)^{-1}X'M_uY, \tag{8.1.28}$$

where $M_u = I - (u'u)^{-1}uu'$. Substituting (8.1.28) into (8.1.27) yields a further concentrated log likelihood function

$$L = -\frac{T}{2}\left(\log u'u + \log\left|Y'M_uY - Y'M_uX(X'M_uX)^{-1}X'M_uY\right|\right), \tag{8.1.29}$$

which depends only on $\alpha$. Interpreting our model as that which represents the limited information of the researcher, we call the value of $\alpha$ that maximizes (8.1.29) the NLLI estimator. The asymptotic covariance matrix of $\sqrt{T}$ times the estimator is given by

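$$V_L = \operatorname{plim}\, T\left[\frac{1}{\sigma^{*2}}G'M_VG - \left(\frac{1}{\sigma^{*2}} - \frac{1}{\sigma^{2}}\right)G'M_XG\right]^{-1}, \tag{8.1.30}$$

where $M_V = I - V(V'V)^{-1}V'$ and $M_X = I - X(X'X)^{-1}X'$.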
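The two concentration steps leading to (8.1.27) and (8.1.29) can be checked numerically. The sketch below is illustrative only: the data are arbitrary random draws rather than output of any model, and the helper names (`resid`, `pi_hat`, `loglik_27`, `loglik_29`) are hypothetical. It verifies that $\log u'u + \log|V'M_uV| = \log|Q|$ and that (8.1.29) equals (8.1.27) evaluated at the $\Pi$ given by (8.1.28).

```python
import numpy as np

def resid(A, u):
    """Return M_u A = A - u (u'u)^{-1} u'A for a vector u and a 2-D array A."""
    return A - np.outer(u, u @ A) / (u @ u)

def pi_hat(u, Y, X):
    """(8.1.28): Pi = (X' M_u X)^{-1} X' M_u Y."""
    return np.linalg.solve(X.T @ resid(X, u), X.T @ resid(Y, u))

def loglik_27(u, V):
    """(8.1.27): -(T/2) (log u'u + log|V' M_u V|)."""
    _, logdet = np.linalg.slogdet(V.T @ resid(V, u))
    return -0.5 * len(u) * (np.log(u @ u) + logdet)

def loglik_29(u, Y, X):
    """(8.1.29): the likelihood with Pi concentrated out."""
    MuY, MuX = resid(Y, u), resid(X, u)
    inner = Y.T @ MuY - (Y.T @ MuX) @ np.linalg.solve(X.T @ MuX, X.T @ MuY)
    _, logdet = np.linalg.slogdet(inner)
    return -0.5 * len(u) * (np.log(u @ u) + logdet)

# Arbitrary data, used only to check the algebraic identities.
rng = np.random.default_rng(0)
T = 60
X = rng.normal(size=(T, 3))
Y = rng.normal(size=(T, 2))
u = rng.normal(size=T)

V = Y - X @ pi_hat(u, Y, X)
uV = np.column_stack([u, V])
Q = uV.T @ uV

same_as_29 = np.isclose(loglik_27(u, V), loglik_29(u, Y, X))
same_as_Q = np.isclose(loglik_27(u, V), -0.5 * T * np.linalg.slogdet(Q)[1])
print(same_as_29, same_as_Q)
```

The helper applies $M_u$ through a rank-one update rather than forming the $T \times T$ matrix, which keeps the check cheap even for large $T$.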

The maximization of (8.1.29) may be done by the iterative procedures discussed in Section 4.4. Another iterative method may be defined as follows: Rewrite (8.1.27) equivalently as

$$L^{*} = -\frac{T}{2}\left(\log u'M_Vu + \log|V'V|\right) \tag{8.1.31}$$

and iterate back and forth between (8.1.28) and (8.1.31). That is, obtain $\hat\Pi = (X'X)^{-1}X'Y$ and $\hat V = Y - X\hat\Pi$, maximize (8.1.31) with respect to $\alpha$ after replacing $V$ with $\hat V$, call this estimator $\hat\alpha$ and define $\hat u = y - f(\hat\alpha)$, insert it into (8.1.28) to obtain another estimator of $\Pi$, and repeat the procedure until convergence.
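The back-and-forth iteration just described can be sketched in code. This is a minimal illustration under stated assumptions, not a general implementation: it takes the scalar specification $f = \alpha z_t^2$ with $z_t = \pi x_t + v_t$, for which the maximization of (8.1.31) over $\alpha$ given $\hat V$ has a closed form because $f$ is linear in $\alpha$. A general $f$ would need a numerical optimizer at that step. The simulated data, parameter values, and function names are all hypothetical.

```python
import numpy as np

def resid(A, B):
    """Return M_B A = A - B (B'B)^{-1} B'A without forming the T x T matrix."""
    B = B.reshape(len(B), -1)
    return A - B @ np.linalg.solve(B.T @ B, B.T @ A)

def loglik(u, V):
    """L* of (8.1.27): -(T/2) (log u'u + log|V' M_u V|)."""
    _, logdet = np.linalg.slogdet(V.T @ resid(V, u))
    return -0.5 * len(u) * (np.log(u @ u) + logdet)

def iterate(y, z, x, max_iter=500, tol=1e-10):
    """Alternate between (8.1.31) and (8.1.28); return the path of (alpha, L*)."""
    w, X, Y = z ** 2, x.reshape(-1, 1), z.reshape(-1, 1)
    Pi = np.linalg.solve(X.T @ X, X.T @ Y)          # starting value: OLS of Y on X
    path = []
    for _ in range(max_iter):
        V = Y - X @ Pi
        # Given V, maximizing (8.1.31) means minimizing u' M_V u; with
        # u = y - alpha * w this has the closed-form solution below.
        alpha = (w @ resid(y, V)) / (w @ resid(w, V))
        u = y - alpha * w
        Pi = np.linalg.solve(X.T @ resid(X, u), X.T @ resid(Y, u))  # (8.1.28)
        path.append((alpha, loglik(u, Y - X @ Pi)))
        if len(path) > 1 and abs(path[-1][0] - path[-2][0]) < tol:
            break
    return path

# Simulated data with jointly normal errors (illustrative values only).
rng = np.random.default_rng(0)
T = 2000
x = rng.uniform(1.0, 3.0, size=T)
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=T)
z = 1.0 * x + errs[:, 1]
y = 1.0 * z ** 2 + errs[:, 0]

path = iterate(y, z, x)
alpha_first, alpha_last = path[0][0], path[-1][0]
print(alpha_first, alpha_last, len(path))
```

Each half-step maximizes the same function $L^{*}$ over one block of parameters with the other held fixed, so the recorded likelihood values should not decrease from one pass to the next.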

The estimator $\hat\alpha$ defined in the preceding paragraph is interesting in its own right. It is the value of $\alpha$ that minimizes

$$(y - f)'\left[I - M_XY(Y'M_XY)^{-1}Y'M_X\right](y - f). \tag{8.1.32}$$

Amemiya (1975a) called this estimator the modified nonlinear two-stage least squares (MNL2S) estimator. The asymptotic covariance matrix of $\sqrt{T}(\hat\alpha - \alpha)$ is given by

$$V_M = \operatorname{plim}\, T(G'M_VG)^{-1}\left[\sigma^{*2}G'M_VG + (\sigma^2 - \sigma^{*2})G'P_XG\right](G'M_VG)^{-1}. \tag{8.1.33}$$

Amemiya (1975a) proved $V_L < V_M < V_B$. It is interesting to note that if $f$ is linear in $\alpha$ and $Y$, MNL2S is reduced to the usual 2SLS (see Section 7.3.6).

In Sections 8.1.1 and 8.1.3, we discussed four estimators: (1) NL2S (as a class); (2) BNL2S; (3) MNL2S; (4) NLLI. If we denote NL2S(W = X) by SNL2S (the first S stands for standard), we have in the linear case

$$\text{SNL2S} \equiv \text{BNL2S} \equiv \text{MNL2S} \cong \text{NLLI}, \tag{8.1.34}$$

where $\equiv$ means exact identity and $\cong$ means asymptotic equivalence. In the nonlinear model defined by (8.1.1) and (8.1.22) with the normality assumption, we can establish the following ranking in terms of the asymptotic covariance matrix:

$$\text{SNL2S} \ll \text{BNL2S} \ll \text{MNL2S} \ll \text{NLLI}, \tag{8.1.35}$$

where $\ll$ means "is worse than." However, it is important to remember that the first two estimators are consistent under more general assumptions than those under which the last two estimators are consistent, as we shall show in the following simple example.

Consider a very simple case of (8.1.1) and (8.1.22) given by

$$y_t = \alpha z_t^2 + u_t \tag{8.1.36}$$

and

$$z_t = \pi x_t + v_t, \tag{8.1.37}$$

where we assume the vector $(u_t, v_t)$ is i.i.d. with zero mean and a finite nonsingular covariance matrix. Inserting (8.1.37) into (8.1.36) yields

$$y_t = \alpha\pi^2x_t^2 + \alpha\sigma_v^2 + (u_t + 2\alpha\pi x_tv_t + \alpha v_t^2 - \alpha\sigma_v^2), \tag{8.1.38}$$

where the composite error term (contained within the parentheses) has zero mean. In this model, SNL2S is 2SLS with $z_t^2$ regressed on $x_t$ in the first stage, and BNL2S is 2SLS with $z_t^2$ regressed on the constant term and $x_t^2$ in the first stage. Clearly, both estimators are consistent under general conditions without further assumptions on $u_t$ and $v_t$. On the other hand, it is not difficult to show that the consistency of MNL2S and NLLI requires the additional assumption

$$Ev_t^2\,Ev_t^2u_t = Ev_t^3\,Ev_tu_t, \tag{8.1.39}$$

which is satisfied if $u_t$ and $v_t$ are jointly normal.
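Condition (8.1.39) can be examined by simulation. The sketch below is illustrative only and rests on assumptions not in the text: the function name `moment_gap` is hypothetical, and the non-normal case uses an arbitrarily chosen skewed distribution (a centered exponential $v_t$ with $u_t$ depending on $v_t^2$) picked simply because it violates the condition. It compares the sample analogue of $Ev_t^2\,Ev_t^2u_t - Ev_t^3\,Ev_tu_t$ under joint normality and under that alternative.

```python
import numpy as np

def moment_gap(u, v):
    """Sample analogue of E v^2 * E v^2 u - E v^3 * E v u, the two sides of (8.1.39)."""
    lhs = np.mean(v ** 2) * np.mean(v ** 2 * u)
    rhs = np.mean(v ** 3) * np.mean(v * u)
    return lhs - rhs

rng = np.random.default_rng(0)
n = 200_000

# Case 1: (u_t, v_t) jointly normal, so all third moments vanish.
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
gap_normal = moment_gap(errs[:, 0], errs[:, 1])

# Case 2 (hypothetical alternative): v_t is a centered exponential and
# u_t = v_t^2 - 1 + noise, so both errors have zero mean but are not normal.
v = rng.exponential(1.0, size=n) - 1.0
u = v ** 2 - 1.0 + rng.normal(size=n)
gap_skewed = moment_gap(u, v)

print(gap_normal, gap_skewed)
```

In the second case the population moments are $Ev_t^2 = 1$, $Ev_t^3 = 2$, $Ev_tu_t = 2$, and $Ev_t^2u_t = 8$, so the two sides of (8.1.39) differ by 4, whereas under joint normality the gap is zero.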