# Nonlinear Simultaneous Equations Models

In this chapter we shall develop the theory of statistical inference for nonlinear simultaneous equations models. The main results are taken from the author’s recent contributions (especially, Amemiya, 1974c, 1975a, 1976a, 1977a). Some additional results can be found in Amemiya (1983a). Section 8.1 deals with the estimation of the parameters of a single equation and Section 8.2 with that of simultaneous equations. Section 8.3 deals with tests of hypotheses, prediction, and computation.

8.1 Estimation in a Single Equation

8.1.1 Nonlinear Two-Stage Least Squares Estimator

In this section we shall consider the nonlinear regression equation

$$y_t = f(Y_t, X_{1t}, \alpha_0) + u_t, \qquad t = 1, 2, \ldots, T, \tag{8.1.1}$$

where $y_t$ is a scalar endogenous variable, $Y_t$ is a vector of endogenous variables, $X_{1t}$ is a vector of exogenous variables, $\alpha_0$ is a $K$-vector of unknown parameters, and $\{u_t\}$ are scalar i.i.d. random variables with $Eu_t = 0$ and $Vu_t = \sigma^2$. This model does not specify the distribution of $Y_t$. Equation (8.1.1) may be one of many structural equations that simultaneously define the distribution of $y_t$ and $Y_t$, but here we are not concerned with the other equations. Sometimes we shall write $f(Y_t, X_{1t}, \alpha_0)$ simply as $f_t(\alpha_0)$ or as $f_t$. We define $T$-vectors $y$, $f$, and $u$, the $t$th elements of which are $y_t$, $f_t$, and $u_t$, respectively, and matrices $Y$ and $X_1$, the $t$th rows of which are $Y_t'$ and $X_{1t}'$, respectively.¹

The nonlinear least squares estimator of $\alpha_0$ in this model is generally inconsistent for the same reason that the least squares estimator is inconsistent in a linear simultaneous equations model. We can see this by considering (4.3.8) and noting that $\operatorname{plim} A_3 \neq 0$ in general: $f_t$ may be correlated with $u_t$ in the model (8.1.1) through the possible dependence of $Y_t$ on $y_t$. In this section we shall consider how we can generalize the two-stage least squares (2SLS) method to the nonlinear model (8.1.1) so that we can obtain a consistent estimator.

Following the article by Amemiya (1974c), we define the class of nonlinear two-stage least squares (NL2S) estimators of $\alpha_0$ in the model (8.1.1) as the value of $\alpha$ that minimizes

$$S_T(\alpha \mid W) = (y - f)'W(W'W)^{-1}W'(y - f), \tag{8.1.2}$$

where W is some matrix of constants with rank at least equal to K.
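The objective (8.1.2) can be sketched numerically as follows. This is a minimal illustration, not the author's code: the function name `nl2s_objective`, the simulated data, and the linear choice of $f$ are all assumptions made for the example. The linear case is used only because its minimizer has the familiar closed 2SLS form, which gives something to check the objective against.

```python
import numpy as np

def nl2s_objective(alpha, y, f, W):
    """S_T(alpha | W) = (y - f)' W (W'W)^{-1} W' (y - f), as in (8.1.2)."""
    r = y - f(alpha)                 # residual vector y - f(alpha)
    Wr = W.T @ r                     # W'(y - f)
    return float(Wr @ np.linalg.solve(W.T @ W, Wr))

# Hypothetical simulated data: Y is correlated with u, the columns of W are not.
rng = np.random.default_rng(0)
T = 200
W = rng.normal(size=(T, 2))
u = rng.normal(size=T)
Y = W @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=T)
y = 2.0 * Y + u

# Linear special case of f(Y_t, X_1t, alpha) with a scalar alpha (assumption).
f = lambda a: a * Y

# In this linear case the minimizer of (8.1.2) is (Y'P_W Y)^{-1} Y'P_W y.
PY = W @ np.linalg.solve(W.T @ W, W.T @ Y)   # P_W Y
alpha_2sls = (PY @ y) / (PY @ Y)
```

For a genuinely nonlinear $f$, the same `nl2s_objective` would be passed to a numerical minimizer; the choice of minimizer is left open here.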

In the literature prior to the article by Amemiya (1974c), a generalization of 2SLS was considered only in special cases of the fully nonlinear model (8.1.1), namely, (1) the case of nonlinearity only in parameters and (2) the case of nonlinearity only in variables. See, for example, the article by Zellner, Huang, and Chau (1965) for the first case and the article by Kelejian (1971) for the second case.² The definition in the preceding paragraph contains as special cases the Zellner-Huang-Chau definition and the Kelejian definition, as well as Theil’s 2SLS. By defining the estimator as a solution of a minimization problem, it is possible to prove its consistency and asymptotic normality by the techniques discussed in Chapter 4.

First, we shall prove the consistency of NL2S using the general result for an extremum estimator given in Theorem 4.1.2. The proof is analogous to but slightly different from a proof given by Amemiya (1974c). The proof differs from the proof of the consistency of NLLS (Theorem 4.3.1) in that the derivative of $f$ is used more extensively here.

Theorem 8.1.1. Consider the nonlinear regression model (8.1.1) with the additional assumptions:

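(A) $\lim T^{-1}W'W$ exists and is nonsingular.

(B) $\partial f_t/\partial\alpha$ exists and is continuous in $N(\alpha_0)$, an open neighborhood of $\alpha_0$.

(C) $T^{-1}W'(\partial f/\partial\alpha')$ converges in probability uniformly in $\alpha \in N(\alpha_0)$.

(D) $\operatorname{plim} T^{-1}W'(\partial f/\partial\alpha')_{\alpha_0}$ is full rank.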

Then a solution of the minimization of (8.1.2) is consistent in the sense of Theorem 4.1.2.

Proof. Inserting (8.1.1) into (8.1.2), we can rewrite $T^{-1}$ times (8.1.2) as

$$\begin{aligned} T^{-1}S_T &= T^{-1}u'P_W u + T^{-1}(f_0 - f)'P_W(f_0 - f) + T^{-1}\,2(f_0 - f)'P_W u \\ &\equiv A_1 + A_2 + A_3, \end{aligned} \tag{8.1.3}$$

where $P_W = W(W'W)^{-1}W'$, $f = f(\alpha)$, and $f_0 = f(\alpha_0)$. Note that (8.1.3) is similar to (4.3.8). First, $A_1$ converges to 0 in probability because of assumption A and the assumptions on $\{u_t\}$. Second, consider $A_2$. Assumption B enables us to write

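$$f = f_0 + G_*(\alpha - \alpha_0), \tag{8.1.4}$$

where $G_*$ is the matrix the $t$th row of which is $\partial f_t/\partial\alpha'$ evaluated at $\alpha_t^*$ between $\alpha$ and $\alpha_0$. Therefore we have

$$A_2 = T^{-1}(\alpha_0 - \alpha)'G_*'P_W G_*(\alpha_0 - \alpha). \tag{8.1.5}$$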

Therefore, because of assumptions C and D, $A_2$ converges in probability uniformly in $\alpha \in N(\alpha_0)$ to a function that is uniquely minimized at $\alpha_0$. Finally, consider $A_3$. We have by the Cauchy-Schwarz inequality

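$$\begin{aligned} T^{-1}\left|(f_0 - f)'P_W u\right| &= T^{-1}\left|(\alpha_0 - \alpha)'G_*'P_W u\right| \\ &\leq \left[T^{-1}(\alpha_0 - \alpha)'G_*'P_W G_*(\alpha_0 - \alpha)\right]^{1/2} \times \left[T^{-1}u'P_W u\right]^{1/2}. \end{aligned} \tag{8.1.6}$$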

Therefore the results obtained above regarding $A_1$ and $A_2$ imply that $A_3$ converges to 0 in probability uniformly in $\alpha \in N(\alpha_0)$. Thus we have verified all the conditions of Theorem 4.1.2.

Next, we shall prove the asymptotic normality of NL2S by verifying the conditions of Theorem 4.1.3.

Theorem 8.1.2. In addition to the assumptions of the nonlinear regression model (8.1.1) and those of Theorem 8.1.1, make the following assumptions:

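(A) $\partial^2 f_t/\partial\alpha\,\partial\alpha'$ exists and is continuous in $\alpha \in N(\alpha_0)$.

(B) $T^{-1}W'(\partial^2 f/\partial\alpha_i\,\partial\alpha')$ converges in probability to a finite matrix uniformly in $\alpha \in N(\alpha_0)$, where $\alpha_i$ is the $i$th element of the vector $\alpha$.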

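Then, if we denote a consistent solution of the minimization of (8.1.2) by $\hat\alpha$, we have

$$\sqrt{T}(\hat\alpha - \alpha_0) \to N\!\left(0,\ \sigma^2\left[\operatorname{plim} T^{-1}G_0'P_W G_0\right]^{-1}\right),$$

where $G_0$ is $\partial f/\partial\alpha'$ evaluated at $\alpha_0$.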

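Proof. First, consider condition C of Theorem 4.1.3. We have

$$\frac{1}{\sqrt{T}}\frac{\partial S_T}{\partial\alpha}\bigg|_{\alpha_0} = -\frac{2}{T}\,G_0'W\,\sqrt{T}\,(W'W)^{-1}W'u. \tag{8.1.7}$$

But, using Theorems 3.5.4 and 3.5.5, $\sqrt{T}(W'W)^{-1}W'u \to N[0,\ \sigma^2\lim T(W'W)^{-1}]$ because of assumption A of Theorem 8.1.1 and the assumptions on $\{u_t\}$. Therefore, because of assumption D of Theorem 8.1.1 and because of Theorem 3.2.7, we have

$$\frac{1}{\sqrt{T}}\frac{\partial S_T}{\partial\alpha}\bigg|_{\alpha_0} \to N\!\left(0,\ 4\sigma^2\operatorname{plim} T^{-1}G_0'P_W G_0\right). \tag{8.1.8}$$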

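Second, consider assumption B of Theorem 4.1.3. We have, for any $\tilde\alpha$ such that $\operatorname{plim}\tilde\alpha = \alpha_0$,

$$\frac{1}{T}\frac{\partial^2 S_T}{\partial\alpha\,\partial\alpha'}\bigg|_{\tilde\alpha} = \frac{2}{T}\tilde G'P_W\tilde G - 2\left\{\frac{1}{T}\left[u - G^+(\tilde\alpha - \alpha_0)\right]'P_W\,\frac{\partial^2 f}{\partial\alpha_i\,\partial\alpha'}\bigg|_{\tilde\alpha}\right\}, \tag{8.1.9}$$

where $\{\ \}$ in the second term of the right-hand side is the matrix the $i$th row of which is given inside $\{\ \}$, $\tilde G$ is $\partial f/\partial\alpha'$ evaluated at $\tilde\alpha$, and $G^+$ is the matrix the $t$th row of which is $\partial f_t/\partial\alpha'$ evaluated at $\alpha_t^+$ between $\tilde\alpha$ and $\alpha_0$. But the term inside $\{\ \}$ converges to 0 in probability because of assumptions A and C of Theorem 8.1.1 and assumption B of this theorem. Therefore assumptions A and C of Theorem 8.1.1 imply

$$\operatorname{plim}\frac{1}{T}\frac{\partial^2 S_T}{\partial\alpha\,\partial\alpha'}\bigg|_{\tilde\alpha} = 2\operatorname{plim}\frac{1}{T}G_0'P_W G_0. \tag{8.1.10}$$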

Finally, because assumption A of this theorem implies assumption A of Theorem 4.1.3, we have verified all the conditions of Theorem 4.1.3. Hence, the conclusion of the theorem follows from (8.1.8) and (8.1.10).
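The covariance matrix in Theorem 8.1.2 suggests a plug-in estimate, sketched below. The function name `nl2s_asymptotic_cov` and the random inputs are hypothetical, and estimating $\sigma^2$ by the mean squared residual is an assumption of this sketch rather than something stated in the text. As a pure algebra check, when the columns of $W$ span those of $G$ we have $P_W G = G$, so the result collapses to $\hat\sigma^2 T(G'G)^{-1}$.

```python
import numpy as np

def nl2s_asymptotic_cov(resid, G, W):
    """Plug-in estimate of sigma^2 [T^{-1} G' P_W G]^{-1} for sqrt(T)(alpha_hat - alpha_0)."""
    T = resid.shape[0]
    sigma2 = resid @ resid / T                   # assumed estimator of sigma^2
    WG = W.T @ G                                 # W'G
    GPG = WG.T @ np.linalg.solve(W.T @ W, WG)    # G' P_W G
    return sigma2 * np.linalg.inv(GPG / T)

# Hypothetical inputs: G_hat stands for df/dalpha' evaluated at an NL2S estimate.
rng = np.random.default_rng(1)
G_hat = rng.normal(size=(50, 2))
W = np.hstack([G_hat, rng.normal(size=(50, 1))])   # W spans the columns of G_hat
resid = rng.normal(size=50)
V = nl2s_asymptotic_cov(resid, G_hat, W)
```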

Amemiya (1975a) considered, among other things, the optimal choice of $W$. It is easy to show that $\operatorname{plim} T(G_0'P_W G_0)^{-1}$ is minimized in the matrix sense (that is, $A > B$ means $A - B$ is a positive definite matrix) when we choose $W = \bar G \equiv EG_0$. We call the resulting estimator the best nonlinear two-stage least squares (BNL2S) estimator. The asymptotic covariance matrix of $\sqrt{T}$ times the estimator is given by

$$V_B = \sigma^2\operatorname{plim} T(\bar G'\bar G)^{-1}. \tag{8.1.11}$$

However, BNL2S is not a practical estimator because (1) it is often difficult to find an explicit expression for $\bar G$, and (2) $\bar G$ generally depends on the unknown parameter vector $\alpha_0$. The second problem is less serious because $\alpha_0$ may be replaced by any consistent member of the NL2S class using some $W$.

Given the first problem, the following procedure recommended by Amemiya (1976a) seems the best practical way to approximate BNL2S:

Step 1. Compute $\hat\alpha$, a member of the NL2S class.

Step 2. Evaluate $G = \partial f/\partial\alpha'$ at $\hat\alpha$ and call the result $\hat G$.

Step 3. Treat $\hat G$ as the dependent variables of regressions and search for the optimal set of independent variables, denoted $W_0$, that best predict $\hat G$.

Step 4. Set $W = W_0$.

If we wanted to be more elaborate, we could search for a different set of independent variables for each column of $\hat G$ (say, $W_i$ for the $i$th column $\hat g_i$) and set $W = [P_{W_1}\hat g_1, P_{W_2}\hat g_2, \ldots, P_{W_K}\hat g_K]$.
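The column-by-column variant of Steps 3 and 4 can be sketched as below. The search for a good set of predictors is not specified in the text, so this sketch simply takes the chosen sets as given: the function name `projected_instruments`, the candidate matrix `Z`, and the index lists `subsets` are all hypothetical. Each output column is the least squares fitted value of one column of $\hat G$ on its own set of candidate variables.

```python
import numpy as np

def projected_instruments(G_hat, Z, subsets):
    """Build W = [P_{W_1} g_1, ..., P_{W_K} g_K] with W_i = Z[:, subsets[i]]."""
    cols = []
    for i, idx in enumerate(subsets):
        Wi = Z[:, idx]                                   # predictors chosen for column i
        coef, *_ = np.linalg.lstsq(Wi, G_hat[:, i], rcond=None)
        cols.append(Wi @ coef)                           # fitted values P_{W_i} g_i
    return np.column_stack(cols)

# Hypothetical inputs: Z holds candidate exogenous variables (and functions of them).
rng = np.random.default_rng(2)
T = 30
Z = rng.normal(size=(T, 4))
G_hat = rng.normal(size=(T, 2))
subsets = [[0, 1], [1, 2, 3]]          # one user-chosen predictor set per column
W_new = projected_instruments(G_hat, Z, subsets)
```

Passing the same index list for every column gives fitted values of all columns on one common set $W_0$, which yields the same NL2S estimator as setting $W = W_0$ whenever the fitted matrix has rank $K$.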

Kelejian (1974) proposed another way to approximate $\bar G$. He proposed this method for the model that is nonlinear only in variables, but it could also work for certain fully nonlinear cases. Let the $t$th row of $G_0$ be $G_{0t}'$, that is, $G_{0t}' = (\partial f_t/\partial\alpha')_{\alpha_0}$. Then, because $G_{0t}$ is a function of $Y_t$ and $\alpha_0$, it is also a function of $u_t$ and $\alpha_0$; therefore write $G_t(u_t, \alpha_0)$. Kelejian’s suggestion was to generate $u_t$ independently $n$ times by simulation and approximate $EG_{0t}$ by $n^{-1}\sum_{i=1}^{n} G_t(u_{ti}, \hat\alpha)$, where $\hat\alpha$ is some consistent estimator of $\alpha_0$. Kelejian also pointed out that $G_t(0, \hat\alpha)$ is also a possible approximation for $EG_t$; although it is computationally simpler, it is likely to be a worse approximation than that given earlier.
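A rough sketch of the simulation idea follows. Everything specific in it is an assumption made for illustration: the function name `kelejian_expected_G`, the normal distribution used to draw $u_t$ (the text does not say which distribution to simulate from), and the toy $G_t$ in which $Y_t = \hat\alpha_1 x_t + u_t$ enters as $Y_t^2$. In this toy case $E\,Y_t^2 = (\hat\alpha_1 x_t)^2 + \sigma^2$, so the simulated average can be compared with the exact value and with the simpler $G_t(0, \hat\alpha)$.

```python
import numpy as np

def kelejian_expected_G(G_of_u, alpha_hat, sigma_hat, T, n, seed=0):
    """Approximate E G_t by averaging G_t(u_ti, alpha_hat) over n simulated draws."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n):
        u = rng.normal(0.0, sigma_hat, size=T)   # assumed normal disturbances
        total = total + G_of_u(u, alpha_hat)
    return total / n

# Toy G_t (hypothetical): first column Y_t^2 with Y_t = a[0] * x_t + u_t, second column x_t.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
alpha_hat = np.array([1.0, 0.3])
sigma_hat = 1.0
G_of_u = lambda u, a: np.column_stack([(a[0] * x + u) ** 2, x])

G_sim = kelejian_expected_G(G_of_u, alpha_hat, sigma_hat, T=len(x), n=4000)
G_zero = G_of_u(np.zeros(len(x)), alpha_hat)          # the simpler G_t(0, alpha_hat)
G_exact_col0 = (alpha_hat[0] * x) ** 2 + sigma_hat ** 2
```

In this toy case `G_zero` misses the $\sigma^2$ term entirely, while `G_sim` recovers it up to simulation noise, which matches the remark that $G_t(0, \hat\alpha)$ is likely to be the worse approximation.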