# Prediction

Bianchi and Calzolari (1980) proposed a method by which we can calculate the mean squared prediction error matrix of a vector predictor based on any estimator of the nonlinear simultaneous equations model. Suppose the structural equations can be written as $f(y_p, x_p, \alpha) = u_p$ at the prediction period $p$ and that we can solve for $y_p$ as $y_p = g(x_p, \alpha, u_p)$. Define the predictor $\hat{y}_p$ based on the estimator $\hat{\alpha}$ by $\hat{y}_p = g(x_p, \hat{\alpha}, 0)$. (Note that $y_p$ is an $N$-vector.) We call this the deterministic predictor. Then we have

$$
\begin{aligned}
E(y_p - \hat{y}_p)(y_p - \hat{y}_p)'
&= E[g(x_p, \alpha, u_p) - g(x_p, \alpha, 0)][g(x_p, \alpha, u_p) - g(x_p, \alpha, 0)]' \\
&\quad + E[g(x_p, \hat{\alpha}, 0) - g(x_p, \alpha, 0)][g(x_p, \hat{\alpha}, 0) - g(x_p, \alpha, 0)]' \\
&\equiv A_1 + A_2.
\end{aligned}
$$

Bianchi and Calzolari suggested that $A_1$ be evaluated by simulation. As for $A_2$, we can easily obtain its asymptotic value from knowledge of the asymptotic distribution of $\hat{\alpha}$.
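As an illustration of how $A_1$ might be evaluated by simulation and $A_2$ by a delta-method approximation, the following sketch uses a hypothetical scalar reduced form $g$; the functional form, parameter values, and the assumed asymptotic variance of $\hat{\alpha}$ are all illustrative, not from the text.

```python
import numpy as np

# A hypothetical scalar reduced form g(x_p, alpha, u_p); the model and all
# parameter values below are illustrative assumptions, not from the text.
def g(x_p, alpha, u_p):
    return np.exp(alpha * x_p + u_p)

rng = np.random.default_rng(0)
alpha, x_p, sigma = 0.5, 1.0, 0.3

# A1: simulate E[g(x_p, alpha, u_p) - g(x_p, alpha, 0)]^2 by drawing u_p
u_draws = rng.normal(0.0, sigma, size=200_000)
dev = g(x_p, alpha, u_draws) - g(x_p, alpha, 0.0)
A1 = np.mean(dev ** 2)  # scalar analogue of the matrix A1

# A2: delta-method approximation based on an assumed asymptotic variance
# of alpha-hat; dg/dalpha is evaluated at u_p = 0.
V_alpha = 0.01                        # assumed asy. variance of alpha-hat
dg_dalpha = x_p * g(x_p, alpha, 0.0)
A2 = dg_dalpha * V_alpha * dg_dalpha

mse = A1 + A2  # estimated mean squared prediction error
```

In the vector case `dev ** 2` becomes an outer product and `dg_dalpha` a Jacobian matrix, but the two-part structure of the calculation is the same.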

Mariano and Brown (1983) and Brown and Mariano (1982) compared the deterministic predictor $\hat{y}_p$ defined in the preceding paragraph with two other predictors, the Monte Carlo predictor $\tilde{y}_p$ and the residual-based predictor $\bar{y}_p$, defined as

$$\tilde{y}_p = \frac{1}{S} \sum_{s=1}^{S} g(x_p, \hat{\alpha}, v_s), \tag{8.3.4}$$

where $\{v_s\}$ are i.i.d. with the same distribution as $u_p$, and

$$\bar{y}_p = \frac{1}{T} \sum_{t=1}^{T} g(x_p, \hat{\alpha}, \hat{u}_t), \tag{8.3.5}$$

where $\hat{u}_t = f(y_t, x_t, \hat{\alpha})$.

Because $\hat{y}_p - y_p = (\hat{y}_p - Ey_p) + (Ey_p - y_p)$ for any predictor $\hat{y}_p$, we should compare predictors on the basis of how well $Ey_p = Eg(x_p, \alpha, u_p)$ is estimated by each predictor. Moreover, because $\hat{\alpha}$ is common to the three predictors, we can essentially consider the situation where $\hat{\alpha}$ in the predictor is replaced by the parameter $\alpha$. Thus the authors' problem is essentially equivalent to that of comparing the following three estimators of $Eg(u_p)$:

Deterministic: $g(0)$

Monte Carlo: $\dfrac{1}{S} \displaystyle\sum_{s=1}^{S} g(v_s)$

Residual-based: $\dfrac{1}{T} \displaystyle\sum_{t=1}^{T} g(\hat{u}_t)$

Clearly, the deterministic predictor is the worst, as Mariano and Brown (1983) concluded. According to their other article (Brown and Mariano, 1982), the choice between the Monte Carlo and residual-based predictors depends on the consideration that the former can be more efficient if $S$ is large and the assumed distribution of $u_p$ is true, whereas the latter is simpler to compute and more robust in the sense that the distribution of $u_p$ need not be specified.
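The ranking can be made concrete with a small sketch comparing the three estimators of $Eg(u)$. The choice $g(u) = e^u$ with normal $u$ is purely illustrative, and the "residuals" are simulated stand-ins for $\hat{u}_t$.

```python
import numpy as np

# Toy comparison of the three estimators of E g(u), with g(u) = exp(u) and
# u ~ N(0, sigma^2); the function and all values are illustrative assumptions.
rng = np.random.default_rng(1)
sigma = 0.5
true_mean = np.exp(sigma ** 2 / 2)       # E exp(u) for this lognormal case

deterministic = np.exp(0.0)              # g(0): ignores the disturbance

S = 10_000
v = rng.normal(0.0, sigma, size=S)       # draws from the assumed distribution
monte_carlo = np.exp(v).mean()           # (1/S) sum g(v_s)

T = 200
u_hat = rng.normal(0.0, sigma, size=T)   # stand-in for the residuals u_t-hat
residual_based = np.exp(u_hat).mean()    # (1/T) sum g(u_t-hat)

# For convex g, Jensen's inequality makes the deterministic estimator
# biased: here g(0) = 1 < E g(u), no matter how large the sample.
```

The sketch also hints at the trade-off noted above: the Monte Carlo average over $S$ draws has a smaller simulation variance than the residual-based average over $T$ terms when $S \gg T$, but it is correct only if the assumed distribution of $u_p$ is true.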

## 8.3.2 Computation

The discussion of the computation of NLFI preceded the theoretical discussion of the statistical properties of NLFI by more than ten years. The first article on computation was by Eisenpress and Greenstadt (1966), who proposed a modified Newton-Raphson iteration; their modification combined both (4.4.4) and (4.4.5). Chow (1973) differed from Eisenpress and Greenstadt in that he obtained simpler formulae by assuming that different parameters appear in different equations, as in (8.2.1). We have already mentioned the algorithm considered by Amemiya (1977a), mainly for a pedagogical purpose. Dagenais (1978) modified this algorithm to speed up convergence and compared it, in certain examples of nonlinear models, with a Newton-Raphson method proposed by Chow and Fair (1973) and with the DFP algorithm mentioned in Section 4.4.1; the results were inconclusive. Belsley (1979) compared the computational speed of the DFP algorithm in computing NLFI and NL3S in five models of various degrees of complexity and found that NL3S was three to ten times faster. Nevertheless, Belsley showed that the computation of NLFI is quite feasible and can be improved by using a more suitable algorithm and by using the approximation of the Jacobian proposed by Fair; see Eq. (8.3.6).
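For readers unfamiliar with the iterations being compared, the following is a minimal generic Newton-Raphson sketch for maximizing a scalar criterion; the quadratic test function is purely illustrative, and this does not reproduce any particular author's modification.

```python
# A minimal generic Newton-Raphson iteration of the kind compared above,
# sketched for a scalar parameter; the test criterion is illustrative only.
def newton_raphson(grad, hess, theta0, tol=1e-10, max_iter=50):
    """Iterate theta <- theta - grad(theta)/hess(theta) until the step is small."""
    theta = theta0
    for _ in range(max_iter):
        step = grad(theta) / hess(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Maximize Q(theta) = -(theta - 2)^2: gradient -2(theta - 2), Hessian -2.
theta_hat = newton_raphson(lambda t: -2.0 * (t - 2.0), lambda t: -2.0, theta0=10.0)
```

The modifications discussed in the text (step adjustment, approximated Hessians as in DFP, simplified formulae) all alter how `step` is formed, not the basic structure of the loop.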

Fair and Parke (1980) estimated Fair's (1976) macro model (97 equations, 29 of which are stochastic, with 182 parameters including 12 first-order autoregressive coefficients), which is nonlinear in variables as well as in parameters (the latter nonlinearity is caused by the transformation that takes account of the first-order autoregression of the errors), by OLS, SNL2S, the Jorgenson-Laffont NL3S, and NLFI. The latter two estimators were calculated by a derivative-free algorithm proposed by Parke (1982).

Parke noted that the largest model for which NLFI and NL3S had been calculated before Parke's study was the one Belsley calculated, a model that contained 19 equations and 61 unknown coefficients. Parke also noted that Newton's method is best for linear models and that the DFP method is preferred for small nonlinear models; however, Parke's method is the only feasible one for large nonlinear models.

Fair and Parke used the approximation of the Jacobian

$$\sum_{t=1}^{T} \log |J_t| \approx \frac{T}{n} \left( \log |J_{t_1}| + \log |J_{t_2}| + \cdots + \log |J_{t_n}| \right), \tag{8.3.6}$$

where $J_t = \partial f_t / \partial y_t'$, $n$ is a small integer, and $t_1, t_2, \ldots, t_n$ are equally spaced between 1 and $T$.

The hypothesis (8.2.14) can be tested by Hausman's test (Section 4.5.1) using either NLFI versus NL3S or NLFI versus NL2S. By this test, Fair found little difference among the estimators. Fair also found that in terms of predictive accuracy there is not much difference among the different estimators, but, in terms of policy response, OLS is set apart from the rest.

Hatanaka (1978) considered a simultaneous equations model nonlinear only in variables. Such a model can be written as $F(Y, X)\Gamma + XB = U$. Define $\hat{Y}$ by $F(\hat{Y}, X)\hat{\Gamma} + X\hat{B} = 0$, where $\hat{\Gamma}$ and $\hat{B}$ are the OLS estimates. Then Hatanaka proposed using $F(\hat{Y}, X)$ as the instruments to calculate 3SLS. He proposed the method-of-scoring iteration to calculate NLFI, with the iteration started at the aforementioned 3SLS estimates.
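Hatanaka's construction requires solving the deterministic system $F(\hat{Y}, X)\hat{\Gamma} + X\hat{B} = 0$ for $\hat{Y}$. The toy two-variable recursion below sketches such a solve by fixed-point iteration; the equations and coefficient values are hypothetical, not from Hatanaka's article.

```python
import numpy as np

# Sketch of solving the deterministic system for Y-hat by fixed-point
# iteration; the toy model and coefficients are illustrative assumptions.
rng = np.random.default_rng(3)
T = 50
x = rng.uniform(1.0, 2.0, size=T)

# Toy system, already solved for the left-hand variables:
#   y1 = a*x + b*log(y2),   y2 = c*x + d*y1
a, b, c, d = 1.0, 0.2, 0.5, 0.3   # stand-ins for the OLS estimates

y1 = np.ones(T)
y2 = np.ones(T)
for _ in range(200):              # iterate to the deterministic solution
    y1_new = a * x + b * np.log(y2)
    y2_new = c * x + d * y1_new
    converged = max(np.abs(y1_new - y1).max(), np.abs(y2_new - y2).max()) < 1e-12
    y1, y2 = y1_new, y2_new
    if converged:
        break

# The nonlinear functions of Y-hat (here y1 and log(y2)), together with X,
# then supply the instruments for 3SLS.
instruments = np.column_stack([y1, np.log(y2), x])
```

Because the solved values depend on $X$ and the OLS estimates but not on $U$, the resulting instruments are valid in the usual sense while still tracking the nonlinearity in the variables.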

## Exercises

1. (Section 8.1.2)

Define what you consider to be the best estimators of $\alpha$ and $\beta$ in the model $(y_t + \alpha)^2 = \beta x_t + u_t$, $t = 1, 2, \ldots, T$, where $\{x_t\}$ are known constants and $\{u_t\}$ are i.i.d. with $Eu_t = 0$ and $Vu_t = \sigma^2$. Justify your choice of estimators.

2. (Section 8.1.2)

In model (8.1.12) show that the minimization of $\sum_{t=1}^{T} [z_t(\lambda) - x_t'\beta]^2$ with respect to $\lambda$ and $\beta$ yields inconsistent estimates.

3. (Section 8.1.3)

Prove the consistency of the estimator of $\alpha$ obtained by minimizing $(y - f)'[I - V(V'V)^{-1}V']^{-1}(y - f)$ and derive its asymptotic variance-covariance matrix. Show that the matrix is smaller (in the matrix sense) than $V_M$ given in (8.1.33).

4. (Section 8.1.3)

In the model defined by (8.1.36) and (8.1.37), consider the following two-stage estimation method: In the first stage, regress $z_t$ on $x_t$ and define $\hat{z}_t = \hat{\pi} x_t$, where $\hat{\pi}$ is the least squares estimator; in the second stage, regress $y_t$ on $(\hat{z}_t)^2$ to obtain the least squares estimator of $\alpha$. Show that the resulting estimator of $\alpha$ is inconsistent. (This method may be regarded as an application of Theil's interpretation, Section 7.3.6, to a nonlinear model.)

5. (Section 8.1.3)

In the model defined by (8.1.36) and (8.1.37), show that the consistency of MNL2S and NLLI requires (8.1.39).

6. (Section 8.1.3)

In the model defined by (8.1.36) and (8.1.37), assume $\pi = 1$, $x_t = 1$ for all $t$, $Vu_t = Vv_t = 1$, and $\mathrm{Cov}(u_t, v_t) = c$. Evaluate the asymptotic variances, denoted $V_1$, $V_2$, $V_3$, and $V_4$, of the SNL2S, BNL2S, MNL2S, and NLLI estimators of $\alpha$, and show $V_1 \geqq V_2 \geqq V_3 \geqq V_4$ for every $|c| < 1$.

7. (Section 8.2.3)

Consider the following two-equation model (Goldfeld and Quandt, 1968):

$$\log y_{1t} = \gamma_1 \log y_{2t} + \beta_1 + \beta_2 x_t + u_{1t},$$

$$y_{2t} = \gamma_2 y_{1t} + \beta_3 x_t + u_{2t},$$

where $\gamma_1 > 0$ and $\gamma_2 < 0$. Show that there are two solutions of $y_{1t}$ and $y_{2t}$ for a given value of $(u_{1t}, u_{2t})$. Show also that $(u_{1t}, u_{2t})$ cannot be normally distributed.

8. (Section 8.2.3)

Consider the following model (Phillips, 1982):

$$\log y_{1t} + \alpha_1 x_t = u_{1t},$$

$$y_{2t} + \alpha_2 y_{1t} = u_{2t}.$$

Derive the conditions on $u_{1t}$ and $u_{2t}$ that make NLFI consistent. Use (8.2.14).