# Nonlinear Least Squares Estimator

4.3.1 Definition

We shall first present the nonlinear regression model, which is a nonlinear generalization of Model 1 of Chapter 1. The assumptions we shall make are also similar to those of Model 1. As in Chapter 1, we shall first state only the fundamental assumptions and later add a few more as needed to obtain particular results.

We assume

$$y_t = f_t(\beta_0) + u_t, \qquad t = 1, 2, \ldots, T, \tag{4.3.1}$$

where $y_t$ is a scalar observable random variable, $\beta_0$ is a $K$-vector of unknown parameters, and $\{u_t\}$ are i.i.d. unobservable random variables such that $Eu_t = 0$ and $Vu_t = \sigma_0^2$ (another unknown parameter) for all $t$.

The assumptions on the function $f_t$ will be specified later. Often in practice we can write $f_t(\beta_0) = f(x_t, \beta_0)$, where $x_t$ is a vector of exogenous variables (known constants), which, unlike in the linear regression model, may not necessarily be of the same dimension as $\beta_0$.

As in Chapter 1, we sometimes write (4.3.1) in vector form as

$$y = f(\beta_0) + u, \tag{4.3.2}$$

where $y$, $f$, and $u$ are all $T$-vectors, for which the $t$th element is defined in (4.3.1).

Nonlinearity arises in many diverse ways in econometric applications. For example, it arises when the observed variables in a linear regression model are transformed to take account of serial correlation of the error terms (cf. Section 6.3). Another example is the distributed-lag model (see Section 5.6), in which the coefficients on the lagged exogenous variables are specified to decrease with lags in a certain nonlinear fashion. In both of these examples, nonlinearity exists only in parameters and not in variables.

More general nonlinear models, in which nonlinearity is present both in parameters and variables, are used in the estimation of production functions and demand functions. The Cobb-Douglas production function with an additive error term is given by

$$Q_t = \beta_1 K_t^{\beta_2} L_t^{\beta_3} + u_t, \tag{4.3.3}$$

where $Q$, $K$, and $L$ denote output, capital input, and labor input, respectively.$^{6}$ The CES production function (see Arrow et al., 1961) may be written as

$$Q_t = \beta_1\left[\beta_2 K_t^{-\beta_3} + (1 - \beta_2) L_t^{-\beta_3}\right]^{-\beta_4/\beta_3} + u_t. \tag{4.3.4}$$

See Mizon (1977) for several other nonlinear production functions. In the estimation of demand functions, a number of highly nonlinear functions have been proposed (some of these are also used for supply functions), for example, translog (Christensen, Jorgenson, and Lau, 1975), generalized Leontief (Diewert, 1974), S-branch (Brown and Heien, 1972), and quadratic (Howe, Pollak, and Wales, 1979).
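To make the two production functions concrete, the following sketch evaluates their mean functions (the right-hand sides of (4.3.3) and (4.3.4) without the error term). The function names, the input values, and the parameter values are hypothetical choices for illustration only. As a sanity check it uses the standard fact that, with $\beta_4 = 1$, the CES form approaches the Cobb-Douglas form $\beta_1 K^{\beta_2} L^{1-\beta_2}$ as $\beta_3 \to 0$.

```python
import numpy as np

def cobb_douglas(K, L, b1, b2, b3):
    """Mean function of (4.3.3): b1 * K**b2 * L**b3."""
    return b1 * np.power(K, b2) * np.power(L, b3)

def ces(K, L, b1, b2, b3, b4):
    """Mean function of (4.3.4): b1 * [b2*K**(-b3) + (1-b2)*L**(-b3)]**(-b4/b3)."""
    inner = b2 * np.power(K, -b3) + (1.0 - b2) * np.power(L, -b3)
    return b1 * np.power(inner, -b4 / b3)

# Hypothetical input levels (not from the text).
K = np.array([1.0, 2.0, 4.0])
L = np.array([3.0, 1.5, 2.0])

# Constant-returns Cobb-Douglas with exponents 0.3 and 0.7.
cd_values = cobb_douglas(K, L, 2.0, 0.3, 0.7)

# CES with b4 = 1 and b3 close to zero should be close to the values above.
ces_values = ces(K, L, 2.0, 0.3, 1e-6, 1.0)

print(cd_values)
print(ces_values)
```

Both functions are nonlinear in the parameters and in the variables, which is the sense in which these models are more general than the serial-correlation and distributed-lag examples.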

As in the case of the maximum likelihood estimator, we can define the nonlinear least squares estimator (abbreviated as NLLS) of $\beta_0$ in two ways, depending on whether we consider the global minimum or a local minimum. In the global case we define it as the value of $\beta$ that minimizes

$$S_T(\beta) = \sum_{t=1}^{T} [y_t - f_t(\beta)]^2 \tag{4.3.5}$$

over some parameter space $B$. In the local case we define it as a root of the normal equation

$$\frac{\partial S_T}{\partial \beta} = 0. \tag{4.3.6}$$

We shall consider only the latter case because (4.3.6) is needed to prove asymptotic normality, as we have seen in Section 4.1.2. Given the NLLS estimator $\hat{\beta}$ of $\beta_0$, we define the NLLS estimator of $\sigma_0^2$, denoted as $\hat{\sigma}^2$, by

$$\hat{\sigma}^2 = T^{-1} S_T(\hat{\beta}). \tag{4.3.7}$$

Note that $\hat{\beta}$ and $\hat{\sigma}^2$ defined above are also the maximum likelihood estimators if the $\{u_t\}$ are normally distributed.
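The definition can be illustrated numerically. The sketch below is one possible way to find a root of the normal equation for the Cobb-Douglas model (4.3.3), using a Gauss-Newton iteration with step halving. The text does not prescribe any algorithm, so the algorithm, the function names, the simulated data, the true parameter values, and the starting value are all assumptions made for illustration. The error variance estimate is the sum of squared residuals divided by $T$, as in (4.3.7).

```python
import numpy as np

def f(beta, K, L):
    """Cobb-Douglas mean function beta1 * K**beta2 * L**beta3."""
    return beta[0] * K ** beta[1] * L ** beta[2]

def jac(beta, K, L):
    """T x 3 matrix of derivatives of f_t with respect to beta."""
    base = K ** beta[1] * L ** beta[2]
    return np.column_stack([base,
                            beta[0] * base * np.log(K),
                            beta[0] * base * np.log(L)])

def S(beta, y, K, L):
    """Sum of squared residuals S_T(beta)."""
    r = y - f(beta, K, L)
    return r @ r

def gauss_newton(y, K, L, beta_start, max_iter=100, tol=1e-10):
    """Iterate toward a root of dS_T/dbeta = 0 (a local solution only)."""
    beta = np.asarray(beta_start, dtype=float)
    for _ in range(max_iter):
        r = y - f(beta, K, L)
        J = jac(beta, K, L)
        # Gauss-Newton direction: least squares regression of r on J.
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        lam = 1.0
        # Halve the step until the sum of squares does not increase.
        while S(beta + lam * step, y, K, L) > S(beta, y, K, L) and lam > 1e-8:
            lam /= 2.0
        beta = beta + lam * step
        if np.linalg.norm(lam * step) < tol:
            break
    return beta

# Simulated data under hypothetical true values (not from the text).
rng = np.random.default_rng(0)
T = 500
K = rng.uniform(1.0, 5.0, size=T)
L = rng.uniform(1.0, 5.0, size=T)
beta0 = np.array([2.0, 0.3, 0.6])
y = f(beta0, K, L) + rng.normal(0.0, 0.1, size=T)

beta_hat = gauss_newton(y, K, L, beta_start=[1.5, 0.5, 0.5])
sigma2_hat = S(beta_hat, y, K, L) / T            # estimator (4.3.7)
# Normal equation, up to the factor -2: J'(y - f) should be near zero.
score = jac(beta_hat, K, L).T @ (y - f(beta_hat, K, L))

print(beta_hat, sigma2_hat, np.linalg.norm(score))
```

Because the iteration only searches locally, the result depends on the starting value; this matches the local definition of the estimator as a root of (4.3.6) rather than the global minimizer.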

4.3.2 Consistency$^{7}$

We shall make additional assumptions in the nonlinear regression model so that the assumptions of Theorem 4.1.2 are satisfied.

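Theorem 4.3.1. In the nonlinear regression model (4.3.1), make the additional assumptions: There exists an open neighborhood $N$ of $\beta_0$ such that

(A) $\partial f_t/\partial \beta$ exists and is continuous on $N$.

(B) $f_t(\beta)$ is continuous in $\beta \in N$ uniformly in $t$; that is, given $\varepsilon > 0$ there exists $\delta > 0$ such that $|f_t(\beta_1) - f_t(\beta_2)| < \varepsilon$ whenever $(\beta_1 - \beta_2)'(\beta_1 - \beta_2) < \delta$ for all $\beta_1, \beta_2 \in N$ and for all $t$.$^{8}$

(C) $T^{-1} \sum_{t=1}^{T} f_t(\beta_1) f_t(\beta_2)$ converges uniformly in $\beta_1, \beta_2 \in N$.

(D) $\lim T^{-1} \sum_{t=1}^{T} [f_t(\beta_0) - f_t(\beta)]^2 \neq 0$ if $\beta \neq \beta_0$.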

Then a root of (4.3.6) is consistent in the sense of Theorem 4.1.2.

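Proof. Inserting (4.3.1) into (4.3.5), we can rewrite $T^{-1}$ times (4.3.5) as

$$\frac{1}{T} S_T = \frac{1}{T}\sum_{t=1}^{T} u_t^2 + \frac{1}{T}\sum_{t=1}^{T} [f_t(\beta_0) - f_t(\beta)]^2 + \frac{2}{T}\sum_{t=1}^{T} [f_t(\beta_0) - f_t(\beta)]\,u_t \equiv A_1 + A_2 + A_3. \tag{4.3.8}$$

The term $A_1$ converges to $\sigma_0^2$ in probability by Theorem 3.3.2 (Kolmogorov LLN 2). The term $A_2$ converges to a function that has a local minimum at $\beta_0$ uniformly in $\beta$ because of assumptions C and D. We shall show that $A_3$ converges to 0 in probability uniformly in $\beta \in N$ by an argument similar to the proof of Theorem 4.2.1. First, $\operatorname{plim}_{T\to\infty} T^{-1} \sum_{t=1}^{T} f_t(\beta_0) u_t = 0$ because of assumption C and Theorem 3.2.1. Next, consider $\sup_{\beta \in N} \left| T^{-1} \sum_{t=1}^{T} f_t(\beta) u_t \right|$. Partition $N$ into $n$ nonoverlapping regions $N_1, N_2, \ldots, N_n$. Because of assumption B, for any $\varepsilon > 0$ we can find a sufficiently large $n$ such that for each $i = 1, 2, \ldots, n$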

$$|f_t(\beta_1) - f_t(\beta_2)| < \frac{\varepsilon}{2(\sigma_0^2 + 1)^{1/2}} \quad \text{for } \beta_1, \beta_2 \in N_i \text{ and for all } t. \tag{4.3.9}$$

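Therefore, using the Cauchy-Schwarz inequality, we have

$$\sup_{\beta \in N_i} \left| \frac{1}{T}\sum_{t=1}^{T} f_t(\beta) u_t \right| \le \left| \frac{1}{T}\sum_{t=1}^{T} f_t(\beta_i) u_t \right| + \left( \frac{1}{T}\sum_{t=1}^{T} u_t^2 \right)^{1/2} \frac{\varepsilon}{2(\sigma_0^2 + 1)^{1/2}}, \tag{4.3.10}$$

where $\beta_i$ is an arbitrary fixed point in $N_i$. Therefore

$$P\left[ \sup_{\beta \in N} \left| \frac{1}{T}\sum_{t=1}^{T} f_t(\beta) u_t \right| > \varepsilon \right] \le \sum_{i=1}^{n} P\left[ \left| \frac{1}{T}\sum_{t=1}^{T} f_t(\beta_i) u_t \right| > \frac{\varepsilon}{2} \right] + n P\left[ \frac{1}{T}\sum_{t=1}^{T} u_t^2 > \sigma_0^2 + 1 \right]. \tag{4.3.11}$$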

Finally, we obtain the desired result by taking the limit of both sides of the inequality (4.3.11) as $T$ goes to $\infty$. Thus we have shown that assumption C of Theorem 4.1.2 is satisfied. Assumptions A and B of Theorem 4.1.2 are clearly satisfied.

Assumption C of Theorem 4.3.1 is not easily verifiable in practice; therefore it is desirable to find a sufficient set of conditions that imply assumption C and are more easily verifiable. One such set is provided by Theorem 4.2.3. To apply the theorem to the present problem, we should assume $f_t(\beta) = f(x_t, \beta)$ and take $x_t$ and $f_t(\beta_1) f_t(\beta_2)$ as the $y_t$ and $g(y_t, \theta)$ of Theorem 4.2.3, respectively. Alternatively, one could assume that $\{x_t\}$ are i.i.d. random variables and use Theorem 4.2.1.

In the next example the conditions of Theorem 4.3.1 will be verified for a simple nonlinear regression model.

Example 4.3.1. Consider the nonlinear regression model (4.3.1) with $f_t(\beta_0) = \log(\beta_0 + x_t)$, where $\beta_0$ and $x_t$ are scalars. Assume (i) the parameter space $B$ is a bounded open interval $(c, d)$, (ii) $x_t + \beta > \delta > 0$ for every $t$ and for every $\beta \in B$, and (iii) $\{x_t\}$ are i.i.d. random variables such that $E\{[\log(d + x_t)]^2\} < \infty$. Prove that a root of (4.3.6) is consistent.

First, note that $\log(\beta_0 + x_t)$ is well defined because of assumptions (i) and (ii). Let us verify the conditions of Theorem 4.3.1. Condition A is clearly satisfied because of assumptions (i) and (ii). Condition B is satisfied because

$$|\log(\beta_1 + x_t) - \log(\beta_2 + x_t)| \le |\beta_t^* + x_t|^{-1}\,|\beta_1 - \beta_2|$$

by the mean value theorem, where $\beta_t^*$ (depending on $x_t$) is between $\beta_1$ and $\beta_2$, and because $|\beta_t^* + x_t|^{-1}$ is uniformly bounded on account of assumptions (i) and (ii). Condition C follows from assumption (iii) because of Theorem 4.2.1. To verify condition D, use the mean value theorem to obtain

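$$\frac{1}{T}\sum_{t=1}^{T} [\log(\beta + x_t) - \log(\beta_0 + x_t)]^2 = \frac{1}{T}\left[ \sum_{t=1}^{T} (\beta_t^* + x_t)^{-2} \right] (\beta - \beta_0)^2,$$

where $\beta_t^*$ (depending on $x_t$) is between $\beta$ and $\beta_0$. But

$$\frac{1}{T}\sum_{t=1}^{T} (\beta_t^* + x_t)^{-2} \ge \frac{1}{T}\sum_{t=1}^{T} (d + x_t)^{-2}$$

and

$$\operatorname{plim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} (d + x_t)^{-2} = E(d + x_t)^{-2} > 0$$

because of assumptions (i), (ii), and (iii) and Theorem 3.3.2 (Kolmogorov LLN 2). Therefore condition D holds.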
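A small simulation can illustrate what the consistency result in Example 4.3.1 means in practice. The sketch below is only an illustration under assumed values: the interval $(c, d) = (0.5, 3)$, the true parameter $1.5$, the error standard deviation $0.5$, and uniform regressors on $[0, 1]$ are hypothetical choices that satisfy assumptions (i) through (iii). For simplicity it minimizes the sum of squares over a grid inside $(c, d)$ rather than solving the normal equation; when the grid minimizer is interior it approximates a root of (4.3.6).

```python
import numpy as np

# Hypothetical illustration values (not from the text).
c, d = 0.5, 3.0             # parameter space B = (c, d)
beta0, sigma0 = 1.5, 0.5    # true parameter and error standard deviation
rng = np.random.default_rng(0)

def nlls_grid(y, x, grid):
    """Grid point minimizing S_T(beta) = sum_t [y_t - log(beta + x_t)]^2."""
    s = np.array([np.sum((y - np.log(b + x)) ** 2) for b in grid])
    return grid[np.argmin(s)]

# Interior points only, because B is an open interval.
grid = np.linspace(c, d, 503)[1:-1]

estimates = {}
for T in (100, 1000, 10000):
    x = rng.uniform(0.0, 1.0, size=T)   # i.i.d., so x_t + beta > 0.5 on B
    y = np.log(beta0 + x) + rng.normal(0.0, sigma0, size=T)
    estimates[T] = nlls_grid(y, x, grid)
    print(T, estimates[T])
```

A single run for each sample size is only suggestive; repeating the experiment over many seeds would show the spread of the estimates shrinking as $T$ grows.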

When $f_t(\beta_0)$ has a very simple form, consistency can be proved more simply and with fewer assumptions by using Theorem 4.1.1 or Theorem 4.1.2 directly rather than Theorem 4.3.1, as we shall show in the following example.

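Example 4.3.2. Consider the nonlinear regression model (4.3.1) with $f_t(\beta_0) = (\beta_0 + x_t)^2$ and assume (i) $a \le \beta_0 \le b$ where $a$ and $b$ are real numbers such that $a < b$, (ii) $\lim_{T\to\infty} T^{-1} \sum_{t=1}^{T} x_t = q$, and (iii) $\lim_{T\to\infty} T^{-1} \sum_{t=1}^{T} x_t^2 = p > q^2$. Prove that the value of $\beta$ that minimizes $\sum_{t=1}^{T} [y_t - (\beta + x_t)^2]^2$ in the domain $[a, b]$ is a consistent estimator of $\beta_0$.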

We have

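$$\operatorname{plim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} [y_t - (\beta + x_t)^2]^2 = \sigma_0^2 + (\beta_0^2 - \beta^2)^2 + 4(\beta_0 - \beta)^2 p + 4(\beta_0^2 - \beta^2)(\beta_0 - \beta) q \equiv Q(\beta),$$

where the convergence is clearly uniform in $\beta \in [a, b]$. But, because

$$Q(\beta) \ge \sigma_0^2 + (\beta_0 - \beta)^2 (\beta_0 + \beta + 2q)^2,$$

$Q(\beta)$ is uniquely minimized at $\beta = \beta_0$. Therefore the estimator in question is consistent by Theorem 4.1.1.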
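The algebra in Example 4.3.2 can be checked numerically. The sketch below evaluates the limit function $Q(\beta)$ and the lower bound $\sigma_0^2 + (\beta_0 - \beta)^2(\beta_0 + \beta + 2q)^2$ on a grid over $[a, b]$. All numerical values ($a$, $b$, $\beta_0$, $\sigma_0^2$, $q$, $p$) are hypothetical and chosen only so that $p > q^2$. The difference between $Q$ and the bound works out to $4(\beta_0 - \beta)^2 (p - q^2)$, which is why the assumption $p > q^2$ gives a unique minimum at $\beta_0$.

```python
import numpy as np

# Hypothetical illustration values (not from the text); note p > q**2.
a, b = 0.0, 2.0
beta0, sigma0_sq = 1.0, 0.25
q, p = 0.5, 1.25

def Q(beta):
    """Probability limit of T^{-1} * sum_t [y_t - (beta + x_t)^2]^2."""
    return (sigma0_sq
            + (beta0 ** 2 - beta ** 2) ** 2
            + 4.0 * (beta0 - beta) ** 2 * p
            + 4.0 * (beta0 ** 2 - beta ** 2) * (beta0 - beta) * q)

def lower_bound(beta):
    """Bound obtained by replacing p with q**2 in Q(beta)."""
    return sigma0_sq + (beta0 - beta) ** 2 * (beta0 + beta + 2.0 * q) ** 2

grid = np.linspace(a, b, 401)
Q_vals = Q(grid)
gap = Q_vals - lower_bound(grid)      # equals 4*(beta0 - beta)^2*(p - q^2)
beta_min = grid[np.argmin(Q_vals)]

print(beta_min, Q_vals.min())
```

On this grid the minimizer coincides with the assumed $\beta_0$ and the minimum value equals the assumed $\sigma_0^2$, in line with the argument of the example.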