# The Optimal Two-step or Iterated GMM Estimator

It is remarked in Section 3 that if $q = p$ then GMM is equivalent to the MM estimator based on $E[f(v_t, \theta_0)] = 0$, and so the estimator does not depend on the weighting matrix. However, if $q > p$ then it is clear from Theorem 2 that the asymptotic variance of $\hat{\theta}_T$ depends on $W_T$ via $W$.¹⁹ This opens up the possibility that inferences may be sensitive to $W$. It is desirable to base inference on the most precise estimator, and so the optimal choice of $W$ is the one which yields the minimum variance in a matrix sense. This choice is given in the following theorem, which was first proved by Hansen (1982).

Theorem 4. Optimal weighting matrix. If Assumptions 1-10 and certain other regularity conditions hold then the minimum asymptotic variance of $\hat{\theta}_T$ is $(G_0' S^{-1} G_0)^{-1}$ and this can be obtained by setting $W = S^{-1}$.
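The efficiency bound in Theorem 4 can be checked numerically. The sketch below assumes the usual sandwich form of the asymptotic variance for a general weighting matrix, $(G_0'WG_0)^{-1}G_0'WSWG_0(G_0'WG_0)^{-1}$, which is not restated in this section; the matrices `G` and `S` are arbitrary illustrative values, and the function name `gmm_asymptotic_variance` is hypothetical.

```python
import numpy as np

def gmm_asymptotic_variance(G, S, W):
    """Sandwich variance (G'WG)^{-1} G'W S W G (G'WG)^{-1} (assumed form)."""
    bread = np.linalg.inv(G.T @ W @ G)
    return bread @ G.T @ W @ S @ W @ G @ bread

# Illustrative (made-up) population quantities with q = 4 moments, p = 2 parameters.
rng = np.random.default_rng(0)
q, p = 4, 2
G = rng.normal(size=(q, p))
A = rng.normal(size=(q, q))
S = A @ A.T + q * np.eye(q)          # symmetric positive definite

V_opt = gmm_asymptotic_variance(G, S, np.linalg.inv(S))   # W = S^{-1}
V_id = gmm_asymptotic_variance(G, S, np.eye(q))           # W = I_q (sub-optimal)
bound = np.linalg.inv(G.T @ np.linalg.inv(S) @ G)         # (G' S^{-1} G)^{-1}

# "Minimum in a matrix sense": V_id - V_opt should be positive semi-definite.
D = V_id - V_opt
gap_eigenvalues = np.linalg.eigvalsh((D + D.T) / 2)
print(np.allclose(V_opt, bound), gap_eigenvalues)
```

With $W = S^{-1}$ the sandwich collapses to the bound, and the eigenvalues of the difference for any other $W$ should be non-negative up to rounding error.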

Theorem 4 implies the optimal choice of $W_T$ is $\hat{S}_T^{-1}$, where $\hat{S}_T$ is a consistent estimator of $S$. This appears to create a circularity because inspection of (11.21)-(11.22) reveals that $\hat{S}_T$ depends on $\hat{\theta}_T$ in general. However, this problem is easily resolved by using a two-step estimation. On the first step a sub-optimal choice of $W_T$ is used to obtain a preliminary estimator, $\hat{\theta}_T(1)$. This estimator is used to obtain a consistent estimator of $S$, which is denoted $\hat{S}_T(1)$. On the second step $\theta_0$ is re-estimated with $W_T = \hat{S}_T(1)^{-1}$. The resulting estimator, $\hat{\theta}_T(2)$, has the minimum asymptotic covariance matrix given in Theorem 4. However, this two-step estimator is based on a version of the optimal weighting matrix constructed using a sub-optimal estimator of $\theta_0$. This suggests there may be finite sample gains from using $\hat{\theta}_T(2)$ to construct a new estimator of $S$, $\hat{S}_T(2)$ say, and then re-estimating $\theta_0$ with $W_T = \hat{S}_T(2)^{-1}$. The resulting estimator, $\hat{\theta}_T(3)$, has the same asymptotic distribution as $\hat{\theta}_T(2)$ but is anticipated to be more efficient in finite samples. This potential finite sample gain in efficiency provides a justification for updating the estimate of $S$ again and re-estimating $\theta_0$. This process can be continued iteratively until the estimates converge; if this is done then it yields what has become known as the iterated GMM estimator.
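The two-step and iterated procedures can be sketched for one concrete case. The example below assumes a linear instrumental variables model with moment function $f(v_t, \theta) = z_t(y_t - x_t'\theta)$, for which the GMM minimization has a closed form, and it assumes serially uncorrelated moments so that $S$ can be estimated by $T^{-1}\sum_t \hat{u}_t^2 z_t z_t'$. The data-generating process, the first-step weighting matrix $(Z'Z/T)^{-1}$, and all function names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """GMM estimator for f = z_t (y_t - x_t' theta) with weighting matrix W (closed form)."""
    T = len(y)
    Zx = Z.T @ X / T                      # q x p sample cross-moment
    Zy = Z.T @ y / T                      # q-vector
    return np.linalg.solve(Zx.T @ W @ Zx, Zx.T @ W @ Zy)

def estimate_S(y, X, Z, theta):
    """S_hat = T^{-1} sum_t u_t^2 z_t z_t' (assumes no serial correlation in f)."""
    u = y - X @ theta
    Zu = Z * u[:, None]
    return Zu.T @ Zu / len(y)

def iterated_gmm(y, X, Z, tol=1e-8, max_iter=100):
    """Return [theta(1), theta(2), ...]: first-step, two-step, and further iterates."""
    T = len(y)
    W = np.linalg.inv(Z.T @ Z / T)        # sub-optimal first-step choice (an assumption)
    theta = linear_gmm(y, X, Z, W)        # theta_T(1)
    path = [theta]
    for _ in range(max_iter):
        S_hat = estimate_S(y, X, Z, theta)                     # S_T(k)
        theta_new = linear_gmm(y, X, Z, np.linalg.inv(S_hat))  # theta_T(k+1)
        path.append(theta_new)
        if np.max(np.abs(theta_new - theta)) < tol:
            break
        theta = theta_new
    return path

# Simulated data (illustrative): one endogenous regressor, q = 3 instruments, theta_0 = 2.
rng = np.random.default_rng(0)
T = 2000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
x = Z @ np.array([1.0, 0.5, -0.5]) + v
u = (0.5 * v + rng.normal(size=T)) * np.sqrt(0.5 + Z[:, 0] ** 2)  # heteroskedastic error
y = 2.0 * x + u
X = x[:, None]

path = iterated_gmm(y, X, Z)
print("first-step:", path[0], "two-step:", path[1], "iterated:", path[-1])
```

Here `path[1]` corresponds to the two-step estimator and `path[-1]` to the iterated estimator; in nonlinear models each call to `linear_gmm` would be replaced by a numerical minimization of the GMM criterion.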

The choice of $W = S^{-1}$ has a second important implication for the asymptotic behavior of the estimator, which is presented in the following theorem.²⁰

Theorem 5. Asymptotic independence of the estimator and estimated sample moment. If (i) Assumptions 1-10 and certain other regularity conditions hold; (ii) $W = S^{-1}$; then $T^{1/2}(\hat{\theta}_T - \theta_0)$ and $S^{-1/2}T^{1/2}g_T(\hat{\theta}_T)$ are asymptotically independent.

Since both $T^{1/2}(\hat{\theta}_T - \theta_0)$ and $S^{-1/2}T^{1/2}g_T(\hat{\theta}_T)$ are asymptotically normally distributed, Theorem 5 is established by showing that these two statistics are asymptotically uncorrelated. The latter can be deduced from (11.18) and (11.25). Using Assumption 10 and putting $W = S^{-1}$, it follows from (11.18) and (11.25) that

$$T^{1/2}(\hat{\theta}_T - \theta_0) = H_{1,T} + o_p(1), \tag{11.26}$$

$$W^{1/2}T^{1/2}g_T(\hat{\theta}_T) = H_{2,T} + o_p(1), \tag{11.27}$$

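where $H_{1,T} = -[F(\theta_0)'F(\theta_0)]^{-1}F(\theta_0)'S^{-1/2}T^{1/2}g_T(\theta_0)$ and $H_{2,T} = [I_q - P(\theta_0)]S^{-1/2}T^{1/2}g_T(\theta_0)$. If we let $C = \lim_{T\to\infty} \operatorname{cov}[H_{1,T}, H_{2,T}]$ then it follows from Theorems 2 and 3 that

$$C = \lim_{T\to\infty} E[H_{1,T}H_{2,T}']. \tag{11.28}$$

Using (11.25) and (11.26) in (11.28), we obtain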

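$$
\begin{aligned}
C &= \lim_{T\to\infty} E\Big[-[F(\theta_0)'F(\theta_0)]^{-1}F(\theta_0)'S^{-1/2}\,T^{1/2}g_T(\theta_0)\,T^{1/2}g_T(\theta_0)'\,S^{-1/2\prime}[I_q - P(\theta_0)]\Big] \\
&= -[F(\theta_0)'F(\theta_0)]^{-1}F(\theta_0)'S^{-1/2}\Big\{\lim_{T\to\infty}\operatorname{var}\big[T^{1/2}g_T(\theta_0)\big]\Big\}S^{-1/2\prime}[I_q - P(\theta_0)] \\
&= -[F(\theta_0)'F(\theta_0)]^{-1}F(\theta_0)'S^{-1/2}\,S\,S^{-1/2\prime}[I_q - P(\theta_0)] \\
&= 0
\end{aligned}
$$

because $S = S^{1/2\prime}S^{1/2}$ implies $S^{-1/2}SS^{-1/2\prime} = I_q$, and $F(\theta_0)'[I_q - P(\theta_0)] = 0$.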

In contrast, if $W \neq S^{-1}$ then the same sequence of arguments yields the conclusion that $C \neq 0$. Therefore, Theorem 5 provides an interesting perspective on why this choice of $W$ leads to an efficient estimator: $W = S^{-1}$ is the only choice of weighting matrix for which the estimator is asymptotically independent of the part of the moment condition unused in estimation. In other words, by making this choice of $W$, we have extracted all possible information about the parameters contained in the sample moment.
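The contrast between $W = S^{-1}$ and other weighting matrices can be illustrated numerically. The sketch below assumes the general-$W$ analogue of the expression for $C$, namely $C = -[F'F]^{-1}F'W^{1/2}SW^{1/2\prime}[I_q - P]$ with $F = W^{1/2}G_0$, $P = F(F'F)^{-1}F'$ and $W = W^{1/2\prime}W^{1/2}$; these definitions are assumed to match those given earlier in the chapter and are not restated in this section. The matrices `G` and `S` are arbitrary illustrative values and the function name is hypothetical.

```python
import numpy as np

def limiting_covariance(G, S, W):
    """C = -(F'F)^{-1} F' W^{1/2} S W^{1/2}' (I_q - P), with F = W^{1/2} G (assumed form)."""
    W_half = np.linalg.cholesky(W).T          # upper factor, so W = W_half' W_half
    F = W_half @ G
    FtF_inv_Ft = np.linalg.solve(F.T @ F, F.T)
    P = F @ FtF_inv_Ft                        # projection onto the column space of F
    q = S.shape[0]
    return -FtF_inv_Ft @ W_half @ S @ W_half.T @ (np.eye(q) - P)

# Illustrative (made-up) population quantities with q = 4 moments, p = 2 parameters.
rng = np.random.default_rng(0)
q, p = 4, 2
G = rng.normal(size=(q, p))
A = rng.normal(size=(q, q))
S = A @ A.T + q * np.eye(q)                   # symmetric positive definite

C_opt = limiting_covariance(G, S, np.linalg.inv(S))   # optimal weighting
C_id = limiting_covariance(G, S, np.eye(q))           # identity weighting

print(np.max(np.abs(C_opt)), np.max(np.abs(C_id)))
```

Under the optimal weighting the middle term reduces to $I_q$ and the product $F'[I_q - P]$ annihilates the expression, whereas for a generic alternative such as $W = I_q$ the middle term $S$ prevents this cancellation.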

The estimators described in this section are often referred to as the "optimal two-step GMM" or "optimal iterated GMM" estimator. It is important to realize that this optimality refers only to the choice of weighting matrix. These are the most precise GMM estimators which can be constructed from the given population moment condition $E[f(v_t, \theta_0)] = 0$. It does not imply that there is anything optimal about the population moment condition itself. The optimal choice of moment condition is discussed in Section 8.