# Single Equation Estimation: Two-Stage Least Squares

In matrix form, we can write the first structural equation as

$$y_1 = Y_1\alpha_1 + X_1\beta_1 + u_1 = Z_1\delta_1 + u_1 \qquad (11.34)$$

where $y_1$ and $u_1$ are $(T \times 1)$, $Y_1$ denotes the right hand side endogenous variables, which is $(T \times g_1)$, and $X_1$ is the set of right hand side included exogenous variables, which is $(T \times k_1)$; $\alpha_1$ is of dimension $g_1$ and $\beta_1$ is of dimension $k_1$. $Z_1 = [Y_1, X_1]$ and $\delta_1' = (\alpha_1', \beta_1')$. We require the existence of exogenous variables excluded from (11.34), call them $X_2$, enough to identify this equation. These excluded exogenous variables appear in the other equations in the simultaneous model. Let the set of all exogenous variables be $X = [X_1, X_2]$ where $X$ is of dimension $(T \times k)$. For the order condition to be satisfied for equation (11.34) we must have $(k - k_1) \geq g_1$. If all the exogenous variables in the system are included in the first step regression, i.e., $Y_1$ is regressed on $X$ to get $\hat{Y}_1$, the resulting second stage least squares estimator obtained from regressing $y_1$ on $\hat{Y}_1$ and $X_1$ is called two-stage least squares (2SLS). This method was proposed independently by Basmann (1957) and Theil (1953). In matrix form $\hat{Y}_1 = P_X Y_1$ is the predictor of the right hand side endogenous variables, where $P_X$ is the projection matrix $X(X'X)^{-1}X'$. Replacing $Y_1$ by $\hat{Y}_1$ in (11.34), we get

$$y_1 = \hat{Y}_1\alpha_1 + X_1\beta_1 + w_1 = \hat{Z}_1\delta_1 + w_1 \qquad (11.35)$$

where $\hat{Z}_1 = [\hat{Y}_1, X_1]$ and $w_1 = u_1 + (Y_1 - \hat{Y}_1)\alpha_1$. Running OLS on (11.35) one gets

$$\hat\delta_{1,2SLS} = (\hat{Z}_1'\hat{Z}_1)^{-1}\hat{Z}_1'y_1 = (Z_1'P_XZ_1)^{-1}Z_1'P_Xy_1 \qquad (11.36)$$

where the second equality follows from the fact that $\hat{Z}_1 = P_XZ_1$ and the fact that $P_X$ is idempotent. The former holds because $P_XX = X$, hence $P_XX_1 = X_1$, and $P_XY_1 = \hat{Y}_1$. If there is only one right hand side endogenous variable, running the first-stage regression of $y_2$ on $X_1$ and $X_2$ and testing that the coefficients of $X_2$ are all zero, against the hypothesis that at least one of these coefficients is different from zero, is a test for rank identification. In case of several right hand side endogenous variables, things get complicated, see Cragg and Donald (1996), but one can still run the first-stage regressions for each right hand side endogenous variable to make sure that the coefficient of at least one element of $X_2$ is significantly different from zero. This is not sufficient for the rank condition, but it is a good diagnostic for whether the rank condition fails. If we fail to meet this requirement, we should question our 2SLS estimator.
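The equivalence in (11.36) between the two explicit regressions and the projection formula can be checked numerically. The following is a minimal sketch on simulated data; the data-generating process, the helper names `tsls_two_step` and `tsls_projection`, and all coefficient values are illustrative assumptions, not part of the text.

```python
import numpy as np

def projection(X):
    """Return the projection matrix P_X = X (X'X)^{-1} X'."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def tsls_two_step(y1, Y1, X1, X):
    """2SLS computed by running the two OLS regressions explicitly."""
    # First stage: regress each column of Y1 on all exogenous variables X.
    Y1_hat = X @ np.linalg.lstsq(X, Y1, rcond=None)[0]
    # Second stage: regress y1 on [Y1_hat, X1].
    Z1_hat = np.column_stack([Y1_hat, X1])
    return np.linalg.lstsq(Z1_hat, y1, rcond=None)[0]

def tsls_projection(y1, Y1, X1, X):
    """2SLS from the closed form (Z1' P_X Z1)^{-1} Z1' P_X y1 in (11.36)."""
    Z1 = np.column_stack([Y1, X1])
    PX = projection(X)
    return np.linalg.solve(Z1.T @ PX @ Z1, Z1.T @ PX @ y1)

# Simulated data (hypothetical design, for illustration only).
rng = np.random.default_rng(0)
T = 500
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])  # included exogenous, k1 = 2
X2 = rng.normal(size=(T, 2))                            # excluded exogenous
X = np.column_stack([X1, X2])
v = rng.normal(size=T)
u1 = 0.8 * v + 0.6 * rng.normal(size=T)                 # correlated with v, so Y1 is endogenous
Y1 = (X @ np.array([1.0, 0.5, 1.0, -1.0]) + v).reshape(-1, 1)  # g1 = 1
y1 = 2.0 * Y1[:, 0] + X1 @ np.array([1.0, -0.5]) + u1

d_two_step = tsls_two_step(y1, Y1, X1, X)
d_projection = tsls_projection(y1, Y1, X1, X)
print(d_two_step)
print(d_projection)
```

Both routes should return the same coefficient vector up to floating-point error. Forming the full $T \times T$ matrix $P_X$ is done here only to mirror the formula; for large $T$ one would normally work with fitted values instead.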

Two-stage least squares can also be thought of as a simple instrumental variables estimator with the set of instruments $W = \hat{Z}_1 = [\hat{Y}_1, X_1]$. Recall that $Y_1$ is correlated with $u_1$, rendering OLS inconsistent. The idea of simple instrumental variables is to find a set of instruments, say $W$ for $Z_1$, with the following properties: (1) plim $W'u_1/T = 0$: the instruments have to be exogenous, i.e., uncorrelated with the error term, otherwise this defeats the purpose of the instruments and results in inconsistent estimates. (2) plim $W'W/T = Q_w \neq 0$, where $Q_w$ is finite and positive definite: the $W$'s should not be perfectly multicollinear. (3) $W$ should be highly correlated with $Z_1$, i.e., the instruments should be highly relevant, not weak instruments as we will explain shortly. In fact, plim $W'Z_1/T$ should be finite and of full rank $(k_1 + g_1)$. Premultiplying (11.34) by $W'$, we get

$$W'y_1 = W'Z_1\delta_1 + W'u_1 \qquad (11.37)$$

In this case, $W = \hat{Z}_1$ is of the same dimension as $Z_1$, and since plim $W'Z_1/T$ is square and of full rank $(k_1 + g_1)$, the simple instrumental variable (IV) estimator of $\delta_1$ becomes

$$\hat\delta_{1,IV} = (W'Z_1)^{-1}W'y_1 = \delta_1 + (W'Z_1)^{-1}W'u_1 \qquad (11.38)$$

with plim $\hat\delta_{1,IV} = \delta_1$, which follows from (11.37) and the fact that plim $W'u_1/T = 0$.

Digression: In the general linear model, $y = X\beta + u$, $X$ is the set of instruments for $X$. Premultiplying by $X'$ we get $X'y = X'X\beta + X'u$, and using the fact that plim $X'u/T = 0$, one gets

$$\hat\beta_{IV} = (X'X)^{-1}X'y = \hat\beta_{OLS}.$$

This estimator is consistent as long as $X$ and $u$ are uncorrelated. In the simultaneous equation model for the first structural equation given in (11.34), the right hand side regressors $Z_1$ include endogenous variables $Y_1$ that are correlated with $u_1$. Therefore OLS on (11.34) will lead to inconsistent estimates, since the matrix of instruments is $W = Z_1$, and $Z_1$ is correlated with $u_1$. In fact,

$$\hat\delta_{1,OLS} = (Z_1'Z_1)^{-1}Z_1'y_1 = \delta_1 + (Z_1'Z_1)^{-1}Z_1'u_1$$

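with plim $\hat\delta_{1,OLS} \neq \delta_1$ since plim $Z_1'u_1/T \neq 0$. Denote by $e_{1,OLS} = y_1 - Z_1\hat\delta_{1,OLS}$ the OLS residuals on the first structural equation, then

$$\text{plim}\, s_1^2 = \text{plim}\, \frac{e_{1,OLS}'e_{1,OLS}}{T - (g_1 + k_1)} = \text{plim}\, \frac{u_1'\bar{P}_{Z_1}u_1}{T - (g_1 + k_1)} = \sigma_{11} - \text{plim}\, \frac{u_1'P_{Z_1}u_1}{T - (g_1 + k_1)} \leq \sigma_{11} \qquad (11.39)$$

since the subtracted term is non-negative, where $P_{Z_1} = Z_1(Z_1'Z_1)^{-1}Z_1'$ and $\bar{P}_{Z_1} = I_T - P_{Z_1}$. Only if plim $Z_1'u_1/T$ is zero will plim $s_1^2 = \sigma_{11}$; otherwise it is smaller. OLS fits very well: it minimizes $(y_1 - Z_1\delta_1)'(y_1 - Z_1\delta_1)$. Since $Z_1$ and $u_1$ are correlated, OLS incorrectly attributes part of the variation in $y_1$ that is due to $u_1$ to the regressors $Z_1$.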
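A small simulation illustrates both points: the OLS coefficient on the endogenous regressor does not settle at its true value, and the OLS residual variance understates $\sigma_{11}$. This is a sketch under an assumed design (one endogenous regressor `y2`, one included exogenous variable `x`, one excluded exogenous variable `z`); every number in it is hypothetical. In this particular design the population values work out to a slope of $1 + 0.8/2 = 1.4$ and plim $s_1^2 = 1 - 0.8^2/2 = 0.68$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 20000
sigma11 = 1.0                              # var(u1) by construction below
x = rng.normal(size=T)                     # included exogenous variable
z = rng.normal(size=T)                     # excluded exogenous variable
v = rng.normal(size=T)
u1 = 0.8 * v + 0.6 * rng.normal(size=T)    # var(u1) = 0.64 + 0.36 = 1
y2 = z + x + v                             # endogenous regressor, correlated with u1 through v
y1 = 1.0 * y2 + 0.5 * x + u1               # true alpha_1 = 1.0, beta_1 = 0.5

# OLS on the structural equation.
Z1 = np.column_stack([y2, x])
delta_ols = np.linalg.lstsq(Z1, y1, rcond=None)[0]
e_ols = y1 - Z1 @ delta_ols
s2_ols = e_ols @ e_ols / (T - Z1.shape[1])

print("OLS estimates:", delta_ols)         # coefficient on y2 sits well above 1.0
print("OLS s^2:", s2_ols, "vs sigma_11 =", sigma11)
```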

Both the simple IV and OLS estimators can be interpreted as method of moments estimators. These were discussed in Chapter 2. For OLS, the population moment conditions are given by $E(X'u) = 0$ and the corresponding sample moment conditions yield $X'(y - X\hat\beta)/T = 0$. Solving for $\hat\beta$ results in $\hat\beta_{OLS}$. Similarly, the population moment conditions for the simple IV estimator in (11.37) are $E(W'u_1) = 0$ and the corresponding sample moment conditions yield $W'(y_1 - Z_1\hat\delta_1)/T = 0$. Solving for $\hat\delta_1$ results in $\hat\delta_{1,IV}$ given in (11.38). If $W = [\hat{Y}_1, X_1]$, then (11.38) results in

$$\hat\delta_{1,IV} = \begin{bmatrix} \hat{Y}_1'Y_1 & \hat{Y}_1'X_1 \\ X_1'Y_1 & X_1'X_1 \end{bmatrix}^{-1} \begin{bmatrix} \hat{Y}_1'y_1 \\ X_1'y_1 \end{bmatrix} \qquad (11.40)$$

which is the same as (11.36) provided $\hat{Y}_1'Y_1 = \hat{Y}_1'\hat{Y}_1$ and $X_1'Y_1 = X_1'\hat{Y}_1$. The latter conditions hold because $\hat{Y}_1 = P_XY_1$ and $P_XX_1 = X_1$.

In general, let $X^*$ be our set of first stage regressors. An IV estimator with $\hat{Y}_1^* = P_{X^*}Y_1$, i.e., with every right hand side $y$ regressed on the same set of regressors $X^*$, will satisfy

$$\hat{Y}_1^{*\prime}\hat{Y}_1^* = \hat{Y}_1^{*\prime}P_{X^*}Y_1 = \hat{Y}_1^{*\prime}Y_1.$$

In addition, for $X_1'\hat{Y}_1^*$ to equal $X_1'Y_1$, $X_1$ has to be a subset of the regressors in $X^*$. Therefore $X^*$ should include $X_1$ and at least as many $X$'s from $X_2$ as is required for identification (i.e., at least $g_1$ of the $X$'s from $X_2$). In this case, the IV estimator using $W^* = [\hat{Y}_1^*, X_1]$ will result in the same estimator as that obtained by a two stage regression where in the first step $\hat{Y}_1^*$ is obtained by regressing $Y_1$ on $X^*$, and in the second step $y_1$ is regressed on $W^*$. Note that these are the same conditions required for consistency of an IV estimator. Note also that if this equation is just-identified, then there are exactly $g_1$ of the $X$'s excluded from that equation. In other words, $X_2$ is of dimension $(T \times g_1)$, and $X^* = X$ is of dimension $T \times (g_1 + k_1)$. Problem 3 shows that 2SLS in this case reduces to an IV estimator with $W = X$, i.e.,

$$\hat\delta_{1,2SLS} = \hat\delta_{1,IV} = (X'Z_1)^{-1}X'y_1 \qquad (11.41)$$

Note that if the first equation is over-identified, then $X'Z_1$ is not square and (11.41) cannot be computed.
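The just-identified case is easy to verify numerically: with exactly $g_1$ excluded exogenous variables, the 2SLS formula and the simple IV formula with $W = X$ give the same numbers. The sketch below uses simulated data; the design and all coefficient values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])  # k1 = 2 included exogenous
X2 = rng.normal(size=(T, 1))                            # exactly g1 = 1 excluded exogenous
X = np.column_stack([X1, X2])
v = rng.normal(size=T)
u1 = 0.7 * v + rng.normal(size=T)
Y1 = X @ np.array([0.5, 1.0, 1.5]) + v                  # single endogenous regressor
y1 = 1.0 * Y1 + X1 @ np.array([2.0, -1.0]) + u1
Z1 = np.column_stack([Y1, X1])

# 2SLS via the projection formula.
PX = X @ np.linalg.solve(X.T @ X, X.T)
delta_2sls = np.linalg.solve(Z1.T @ PX @ Z1, Z1.T @ PX @ y1)

# Simple IV with W = X: X'Z1 is square (3 x 3) under just-identification.
delta_iv = np.linalg.solve(X.T @ Z1, X.T @ y1)
print(delta_2sls, delta_iv)

# Adding one more instrument makes X'Z1 non-square, so the simple IV formula no longer applies.
X_over = np.column_stack([X, rng.normal(size=T)])
shape_over = (X_over.T @ Z1).shape
print(shape_over)
```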

Rather than having $W$, the matrix of instruments, be of exactly the same dimension as $Z_1$, which is required for the expression in (11.38), one can define a generalized instrumental variable in terms of a general matrix $W$ of dimension $T \times \ell$ where $\ell \geq g_1 + k_1$. The latter condition is the order condition for identification. In this case, $\hat\delta_{1,IV}$ is obtained as GLS on (11.37). Using the fact that

$$\text{plim}\, W'u_1u_1'W/T = \sigma_{11}\, \text{plim}\, W'W/T,$$

one gets

$$\hat\delta_{1,IV} = (Z_1'P_WZ_1)^{-1}Z_1'P_Wy_1 = \delta_1 + (Z_1'P_WZ_1)^{-1}Z_1'P_Wu_1$$

with plim $\hat\delta_{1,IV} = \delta_1$ and limiting covariance matrix $\sigma_{11}\, \text{plim}\, (Z_1'P_WZ_1/T)^{-1}$. Therefore, 2SLS can be obtained as a generalized instrumental variable estimator with $W = X$. This also means that 2SLS of $\delta_1$ can be obtained as GLS on (11.34) after premultiplication by $X'$, see Problem 4. Note that GLS on (11.37) minimizes $(y_1 - Z_1\delta_1)'P_W(y_1 - Z_1\delta_1)$, which yields the first-order conditions

$$Z_1'P_W(y_1 - Z_1\hat\delta_{1,IV}) = 0$$

the solution of which is $\hat\delta_{1,IV} = (Z_1'P_WZ_1)^{-1}Z_1'P_Wy_1$. It can also be shown that 2SLS and the generalized instrumental variables estimators are special cases of a Generalized Method of Moments (GMM) estimator considered by Hansen (1982). See Davidson and MacKinnon (1993) and Hall (1993) for an introduction to GMM.

For the matrix $Z_1'P_WZ_1$ to be of full rank and invertible, a necessary condition is that $W$ must be of full rank $\ell \geq (g_1 + k_1)$. This is, in fact, the order condition of identification. If $\ell = g_1 + k_1$, then this equation is just-identified. Also, $W'Z_1$ is square and nonsingular. Problem 10 asks the reader to verify that the generalized instrumental variable estimator reduces to the simple instrumental variable estimator given in (11.38). Also, under just-identification the minimized value of the criterion function is zero.
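One way to see both claims for the just-identified case is the following short derivation. It is a sketch that assumes $W'Z_1$ is square and nonsingular and uses $P_W = W(W'W)^{-1}W'$:

$$
\begin{aligned}
\hat\delta_{1,IV} &= (Z_1'P_WZ_1)^{-1}Z_1'P_Wy_1 \\
&= [Z_1'W(W'W)^{-1}W'Z_1]^{-1}Z_1'W(W'W)^{-1}W'y_1 \\
&= (W'Z_1)^{-1}(W'W)(Z_1'W)^{-1}Z_1'W(W'W)^{-1}W'y_1 \\
&= (W'Z_1)^{-1}W'y_1 .
\end{aligned}
$$

For the criterion function, note that $W'(y_1 - Z_1\hat\delta_{1,IV}) = W'y_1 - W'Z_1(W'Z_1)^{-1}W'y_1 = 0$, so that

$$
(y_1 - Z_1\hat\delta_{1,IV})'P_W(y_1 - Z_1\hat\delta_{1,IV}) = [W'(y_1 - Z_1\hat\delta_{1,IV})]'(W'W)^{-1}[W'(y_1 - Z_1\hat\delta_{1,IV})] = 0 .
$$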

One of the biggest problems with IV estimation is the choice of the instrumental variables $W$. We have listed some necessary conditions for this set of instruments to yield consistent estimators of the structural coefficients. However, different choices by different researchers may yield different estimates in finite samples. Using more instruments will yield more efficient IV estimation. Let $W_1$ and $W_2$ be two sets of IVs with $W_1$ being spanned by the space of $W_2$. In this case, $P_{W_2}W_1 = W_1$ and therefore $P_{W_2}P_{W_1} = P_{W_1}$. The corresponding IV estimators

$$\hat\delta_{1,W_i} = (Z_1'P_{W_i}Z_1)^{-1}Z_1'P_{W_i}y_1 \quad \text{for } i = 1, 2$$

are both consistent for $\delta_1$ as long as plim $W_i'u_1/T = 0$, and have asymptotic covariance matrices $\sigma_{11}\, \text{plim}\, (Z_1'P_{W_i}Z_1/T)^{-1}$.

Note that $\hat\delta_{1,W_2}$ is at least as efficient as $\hat\delta_{1,W_1}$ if the difference in their asymptotic covariance matrices is positive semi-definite, i.e., if

$$\sigma_{11}\left[\text{plim}\, \frac{Z_1'P_{W_1}Z_1}{T}\right]^{-1} - \sigma_{11}\left[\text{plim}\, \frac{Z_1'P_{W_2}Z_1}{T}\right]^{-1}$$

is p.s.d. This holds if $Z_1'P_{W_2}Z_1 - Z_1'P_{W_1}Z_1$ is p.s.d. This last condition holds since $P_{W_2} - P_{W_1}$ is idempotent. Problem 11 asks the reader to verify this result. $\hat\delta_{1,W_2}$ is more efficient than $\hat\delta_{1,W_1}$ since $W_2$ explains $Z_1$ at least as well as $W_1$. This seems to suggest that one should use as many instruments as possible. If $T$ is large this is a good strategy. But, if $T$ is finite, there will be a trade-off between this gain in asymptotic efficiency and the introduction of more finite sample bias in our IV estimator.
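The algebra behind this efficiency ranking can be checked on any nested pair of instrument sets. The sketch below builds a hypothetical $W_2$, takes $W_1$ as a subset of its columns, and verifies the projection identities and the p.s.d. ordering through eigenvalues. The matrices are simulated and carry no economic meaning.

```python
import numpy as np

def projection(W):
    """Return P_W = W (W'W)^{-1} W'."""
    return W @ np.linalg.solve(W.T @ W, W.T)

rng = np.random.default_rng(3)
T = 200
W2 = np.column_stack([np.ones(T), rng.normal(size=(T, 4))])  # larger instrument set
W1 = W2[:, :3]                                               # subset, so W1 lies in the span of W2
Z1 = rng.normal(size=(T, 2)) + W2[:, 1:3]                    # regressors related to the instruments

P1, P2 = projection(W1), projection(W2)
D = P2 - P1                                                  # should be idempotent

# Z1'P_{W2}Z1 - Z1'P_{W1}Z1 should be positive semi-definite.
diff = Z1.T @ P2 @ Z1 - Z1.T @ P1 @ Z1
eigs_diff = np.linalg.eigvalsh((diff + diff.T) / 2)

# Equivalently, the covariance-type matrices are ordered the other way round.
cov_diff = np.linalg.inv(Z1.T @ P1 @ Z1) - np.linalg.inv(Z1.T @ P2 @ Z1)
eigs_cov = np.linalg.eigvalsh((cov_diff + cov_diff.T) / 2)

print(eigs_diff, eigs_cov)
```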

In fact, the more instruments we use, the more $\hat{Y}_1$ will resemble $Y_1$ and the more bias is introduced in this second stage regression. The extreme case where $Y_1$ is perfectly predicted by $\hat{Y}_1$ returns us to OLS, which we know is biased. On the other hand, if our set of instruments has little ability in predicting $Y_1$, then the resulting instrumental variable estimator will be inefficient and its asymptotic distribution will not resemble its finite sample distribution, see Nelson and Startz (1990). If the number of instruments is fixed and the coefficients of the instruments in the first stage regression go to zero at the rate $1/\sqrt{T}$, indicating weak correlation, Staiger and Stock (1997) find that even as $T$ increases, IV estimation is not consistent and has a nonstandard asymptotic distribution. Bound et al. (1995) recommend reporting the $R^2$ or the F-statistic of the first stage regression as a useful indicator of the quality of IV estimates.

Instrumental variables are important for obtaining consistent estimates when endogeneity is suspected. However, invalid instruments can produce meaningless results. How do we know whether our instruments are valid? Stock and Watson (2003) draw an analogy between a relevant instrument and a large sample. The more relevant the instrument, i.e., the more of the variation in the right hand side endogenous variable that is explained by this instrument, the more accurate the resulting estimator. This is similar to the observation that the larger the sample size, the more accurate the estimator. They argue that the instruments should not just be relevant, but highly relevant if the normal distribution is to provide a good approximation to the sampling distribution of 2SLS. Weak instruments explain little of the variation in the right hand side endogenous variable they are instrumenting. This renders the normal distribution a poor approximation to the sampling distribution of 2SLS, even if the sample size is large. Stock and Watson (2003) suggest a simple rule of thumb to check for weak instruments. If there is one right hand side endogenous variable, the first-stage regression can test for the significance of the excluded exogenous variables (or instruments) using an F-statistic. This first-stage F-statistic should be larger than 10. Stock and Watson (2003) suggest that a first-stage F-statistic less than 10 indicates weak instruments, which casts doubt on the validity of 2SLS, since with weak instruments 2SLS will be biased even in large samples and the corresponding t-statistics and confidence intervals will be unreliable. Finding weak instruments, one can search for additional stronger instruments, or use alternative estimators to 2SLS which are less sensitive to weak instruments, like LIML.

Deaton (1997, p. 112) argues that it is difficult to find instruments that are exogenous while at the same time highly correlated with the endogenous variables they are instrumenting. He argues that it is easy to generate 2SLS estimates that are different from OLS but much harder to make the case that these 2SLS estimates are necessarily better than OLS. “Credible identification and estimation of structural equations almost always requires real creativity, and creativity cannot be reduced to a formula.” Stock and Watson (2003, p. 371) show that for the case of a single right hand side endogenous variable with no included exogenous variables and one weak instrument, the distribution of the 2SLS estimator is non-normal even for large samples, with the mean of the sampling distribution of the 2SLS estimator approximately equal to the true coefficient plus the asymptotic bias of the OLS estimator divided by $(E(F) - 1)$, where $F$ is the first-stage F-statistic. If $E(F) = 10$, then the large sample bias of 2SLS is $(1/9)$ that of the large sample bias of OLS. They argue that this rule of thumb is an acceptable cutoff for most empirical applications.
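The first-stage F-statistic used in this rule of thumb is an ordinary F-test for the exclusion of the instruments from the first-stage regression. A minimal sketch follows, with a hypothetical helper `first_stage_F` and simulated data in which the instruments are strong by construction; none of the numbers come from the text.

```python
import numpy as np

def first_stage_F(y2, X1, X2):
    """F-statistic for H0: all coefficients on X2 are zero in the regression of y2 on [X1, X2]."""
    T = len(y2)
    X = np.column_stack([X1, X2])
    # Restricted regression: y2 on the included exogenous variables X1 only.
    rss_r = np.sum((y2 - X1 @ np.linalg.lstsq(X1, y2, rcond=None)[0]) ** 2)
    # Unrestricted regression: y2 on all exogenous variables.
    rss_u = np.sum((y2 - X @ np.linalg.lstsq(X, y2, rcond=None)[0]) ** 2)
    q = X2.shape[1]                          # number of excluded instruments being tested
    return ((rss_r - rss_u) / q) / (rss_u / (T - X.shape[1]))

rng = np.random.default_rng(4)
T = 500
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = rng.normal(size=(T, 2))                 # excluded exogenous variables (instruments)
v = rng.normal(size=T)
y2_strong = X1 @ np.array([1.0, 0.5]) + X2 @ np.array([1.0, 1.0]) + v

F_strong = first_stage_F(y2_strong, X1, X2)
print("first-stage F:", F_strong, "| weak by the rule of thumb?", F_strong < 10)

# Approximate bias of 2SLS relative to OLS at the cutoff E(F) = 10, as quoted in the text.
relative_bias_at_cutoff = 1.0 / (10 - 1)
print("relative bias at E(F) = 10:", relative_bias_at_cutoff)
```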

2SLS is a single equation estimator. The focus is on a particular equation. $[y_1, Y_1, X_1]$ is specified, and therefore all that is needed to perform 2SLS is the matrix $X$ of all exogenous variables in the system. If a researcher is interested in a particular behavioral economic relationship which may be a part of a big model consisting of several equations, one need not specify the whole model to perform 2SLS on that equation; all that is needed is the matrix of all exogenous variables in that system. Empirical studies involving one structural equation specify which right hand side variables are endogenous and proceed by estimating this equation via an IV procedure that usually includes all the feasible exogenous variables available to the researcher. If this set of exogenous variables does not include all the $X$'s in the system, this estimation method is not 2SLS. However, it is a consistent IV method which we will call feasible 2SLS.

Substituting (11.34) in (11.36), we get

$$\hat\delta_{1,2SLS} = \delta_1 + (Z_1'P_XZ_1)^{-1}Z_1'P_Xu_1 \qquad (11.42)$$

with plim $\hat\delta_{1,2SLS} = \delta_1$ and an asymptotic variance covariance matrix given by $\sigma_{11}\, \text{plim}\, (Z_1'P_XZ_1/T)^{-1}$. $\sigma_{11}$ is estimated from the 2SLS residuals $\hat{u}_1 = y_1 - Z_1\hat\delta_{1,2SLS}$, by computing $s_{11} = \hat{u}_1'\hat{u}_1/(T - g_1 - k_1)$. It is important to emphasize that $s_{11}$ is obtained from the 2SLS residuals of the original equation (11.34), not (11.35). In other words, $s_{11}$ is not the mean squared error (i.e., $s^2$) of the second stage regression given in (11.35). The latter regression has $\hat{Y}_1$ in it and not $Y_1$. Therefore, the asymptotic variance covariance matrix of 2SLS can be estimated by $s_{11}(Z_1'P_XZ_1)^{-1} = s_{11}(\hat{Z}_1'\hat{Z}_1)^{-1}$. The t-statistics reported by 2SLS packages are based on the standard errors obtained from the square root of the diagonal elements of this matrix. These standard errors and t-statistics can be made robust for heteroskedasticity by computing $(\hat{Z}_1'\hat{Z}_1)^{-1}(\hat{Z}_1'\,\text{diag}[\hat{u}_i^2]\,\hat{Z}_1)(\hat{Z}_1'\hat{Z}_1)^{-1}$, where $\hat{u}_i$ denotes the $i$-th 2SLS residual. Wald type statistics for $H_0$: $R\delta_1 = r$ based on 2SLS estimates of $\delta_1$ can be obtained as in equation (7.41) with $\hat\delta_{1,2SLS}$ replacing $\hat\beta_{OLS}$ and $\text{var}(\hat\delta_{1,2SLS}) = s_{11}(\hat{Z}_1'\hat{Z}_1)^{-1}$ replacing $\text{var}(\hat\beta_{OLS}) = s_{11}(X'X)^{-1}$. This can be made robust for heteroskedasticity by using the robust variance covariance matrix of $\hat\delta_{1,2SLS}$ described above. The resulting Wald statistic is asymptotically distributed as $\chi_q^2$ under the null hypothesis, with $q$ being the number of restrictions imposed by $R\delta_1 = r$.
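The distinction between $s_{11}$ and the second-stage mean squared error is a common source of wrong standard errors when 2SLS is coded by hand. The sketch below computes both, along with the conventional and heteroskedasticity-robust covariance matrices described above. The simulated design is an assumption chosen so that $\sigma_{11} = 1$ while the second-stage error variance is much larger.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 5000
x = rng.normal(size=T)
z = rng.normal(size=(T, 2))                       # excluded exogenous variables
v = rng.normal(size=T)
u1 = 0.8 * v + 0.6 * rng.normal(size=T)           # var(u1) = 1 by construction
y2 = z @ np.array([1.0, 1.0]) + x + v             # endogenous regressor
y1 = 1.0 * y2 + 0.5 * x + u1

X1 = np.column_stack([np.ones(T), x])
X = np.column_stack([X1, z])
Z1 = np.column_stack([y2, X1])
Z1_hat = X @ np.linalg.lstsq(X, Z1, rcond=None)[0]    # P_X Z1

delta = np.linalg.lstsq(Z1_hat, y1, rcond=None)[0]    # 2SLS estimates
dof = T - Z1.shape[1]                                 # T - g1 - k1

u_hat = y1 - Z1 @ delta          # residuals from the ORIGINAL equation (11.34)
s11 = u_hat @ u_hat / dof
w_hat = y1 - Z1_hat @ delta      # second-stage residuals from (11.35): not the ones to use
s2_second_stage = w_hat @ w_hat / dof

A_inv = np.linalg.inv(Z1_hat.T @ Z1_hat)
cov_conventional = s11 * A_inv
meat = (Z1_hat * (u_hat ** 2)[:, None]).T @ Z1_hat    # Z1_hat' diag(u_hat_i^2) Z1_hat
cov_robust = A_inv @ meat @ A_inv

se = np.sqrt(np.diag(cov_conventional))
se_robust = np.sqrt(np.diag(cov_robust))
print("s11:", s11, "| second-stage s^2:", s2_second_stage)
print("conventional se:", se)
print("robust se:", se_robust)
```

Because the simulated errors are homoskedastic, the two sets of standard errors should be close here; they would diverge under heteroskedasticity.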

LM type tests for exclusion restrictions, like a subset of $\delta_1$ set equal to zero, can be performed by running the restricted 2SLS residuals on the matrix of unrestricted second stage regressors $\hat{Z}_1$. The test statistic is given by $TR_u^2$ where $R_u^2$ denotes the uncentered $R^2$. This is asymptotically distributed as $\chi_q^2$ under the null hypothesis, where $q$ is the number of coefficients in $\delta_1$ set equal to zero. Note that it does not matter whether the exclusion restrictions are imposed on $\beta_1$ or $\alpha_1$, i.e., whether the excluded variables to be tested are endogenous or exogenous. An F-test for these exclusion restrictions can be constructed based on the restricted and unrestricted residual sums of squares from the second stage regression. The denominator of this F-statistic, however, is based on the unrestricted 2SLS residual sum of squares as reported by the 2SLS package. Of course, one has to adjust the numerator and denominator by the appropriate degrees of freedom. Under the null, this is asymptotically distributed as $F(q, T - (g_1 + k_1))$. See Wooldridge (1990) for details. Also, see the over-identification test in Section 11.5.

Finite sample properties of 2SLS are model specific, see Mariano (2001) for a useful summary. One important result is that the absolute moments of positive order for 2SLS are finite up to the order of over-identification. So, for the 2SLS estimator to have a mean and variance, we need the degree of over-identification to be at least 2. This also means that for a just-identified model, no moments for 2SLS exist. For 2SLS, the absolute bias is an increasing function of the degree of over-identification. For the case of one right hand side included endogenous regressor, like equation (11.25), the size of OLS bias relative to 2SLS gets larger, the lower the degree of over-identification, the bigger the sample size, the higher the absolute value of the correlation between the disturbances and the endogenous regressor $y_2$, and the higher the concentration parameter $\mu^2$. The latter is defined as $\mu^2 = E(y_2)'(P_X - P_{X_1})E(y_2)/\omega^2$ and $\omega^2 = \text{var}(y_{2t})$. In terms of MSE, larger values of $\mu^2$ and large sample size favor 2SLS over OLS.

Another important single equation estimator is the Limited Information Maximum Likelihood (LIML) estimator, which, as the name suggests, maximizes the likelihood function pertaining to the endogenous variables appearing in the estimated equation only. Excluded exogenous variables from this equation as well as the identifiability restrictions on other equations in the system are disregarded in the likelihood maximization. For details, see Anderson and Rubin (1950). LIML is invariant to the normalization choice of the dependent variable whereas 2SLS is not. This invariance of LIML is in the spirit of a simultaneous equation model where normalization should not matter. Under just-identification 2SLS and LIML are equivalent. LIML is also known as the Least Variance Ratio (LVR) method, since the LIML estimates can be obtained by minimizing a ratio of two variances or, equivalently, the ratio of two residual sums of squares. Using equation (11.34), one can write

$$y_1^* = y_1 - Y_1\alpha_1 = X_1\beta_1 + u_1$$

For a choice of $\alpha_1$ one can compute $y_1^*$ and regress it on $X_1$ to get the residual sum of squares $RSS_1$. Now regress $y_1^*$ on $X_1$ and $X_2$ and compute the residual sum of squares $RSS_2$. Equation (11.34) states that $X_2$ does not enter the specification of that equation. In fact, this is where our identifying restrictions come from, and the excluded exogenous variables are the ones used as instrumental variables. If these identifying restrictions are true, adding $X_2$ to the regression of $y_1^*$ on $X_1$ should lead to minimal reduction in $RSS_1$. Therefore, the LVR method finds the $\alpha_1$ that will minimize the ratio $(RSS_1/RSS_2)$. After $\alpha_1$ is estimated, $\beta_1$ is obtained from regressing $y_1^*$ on $X_1$. In contrast, it can be shown that 2SLS minimizes $RSS_1 - RSS_2$. For details, see Johnston (1984) or Mariano (2001). Estimator bias is less of a problem for LIML than 2SLS. In fact, as the number of instruments increases with the sample size such that their ratio is a constant, Bekker (1994) shows that 2SLS becomes inconsistent while LIML remains consistent. Both estimators are special cases of the following estimator:

$$\hat\delta_1 = (Z_1'P_XZ_1 - \hat\theta Z_1'Z_1)^{-1}(Z_1'P_Xy_1 - \hat\theta Z_1'y_1)$$

with $\hat\theta = 0$ yielding 2SLS, and $\hat\theta$ = the smallest eigenvalue of $\{(D_1'D_1)^{-1}D_1'P_XD_1\}$ yielding LIML, where $D_1 = [y_1, Z_1]$.
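The estimator above is straightforward to compute with basic linear algebra. The sketch below implements it on simulated data, with hypothetical helper names `k_class` and `liml_theta`, and checks two things: that LIML coincides with 2SLS under just-identification, and that $\hat\theta$ is tied to the minimized least variance ratio. The relation $\hat\theta = 1 - 1/\kappa$, with $\kappa = \min(RSS_1/RSS_2)$, is not stated in the text; it is an identity derived here for the check, and should be read as an assumption of this sketch.

```python
import numpy as np

def k_class(y1, Z1, X, theta):
    """(Z1'P_X Z1 - theta Z1'Z1)^{-1} (Z1'P_X y1 - theta Z1'y1); theta = 0 gives 2SLS."""
    PZ = X @ np.linalg.lstsq(X, Z1, rcond=None)[0]     # P_X Z1
    A = Z1.T @ PZ - theta * (Z1.T @ Z1)
    b = PZ.T @ y1 - theta * (Z1.T @ y1)
    return np.linalg.solve(A, b)

def liml_theta(y1, Z1, X):
    """Smallest eigenvalue of (D1'D1)^{-1} D1'P_X D1 with D1 = [y1, Z1]."""
    D1 = np.column_stack([y1, Z1])
    PD = X @ np.linalg.lstsq(X, D1, rcond=None)[0]     # P_X D1
    B = D1.T @ D1
    A = D1.T @ PD
    # With B = L L', the symmetric matrix L^{-1} A L^{-T} has the same eigenvalues as B^{-1} A.
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T).T
    return np.linalg.eigvalsh((M + M.T) / 2)[0]        # eigvalsh sorts ascending

def rss(y, X):
    """Residual sum of squares from regressing y on X."""
    return np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)

rng = np.random.default_rng(6)
T = 400
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = rng.normal(size=(T, 3))
v = rng.normal(size=T)
u1 = 0.8 * v + 0.6 * rng.normal(size=T)
y2 = X1 @ np.array([1.0, 0.5]) + X2 @ np.array([1.0, 0.5, -0.5]) + v
y1 = 1.0 * y2 + X1 @ np.array([2.0, -1.0]) + u1
Z1 = np.column_stack([y2, X1])

X_over = np.column_stack([X1, X2])           # three excluded instruments: over-identified
X_just = np.column_stack([X1, X2[:, :1]])    # one excluded instrument: just-identified

theta_over = liml_theta(y1, Z1, X_over)
theta_just = liml_theta(y1, Z1, X_just)
liml_over = k_class(y1, Z1, X_over, theta_over)
tsls_over = k_class(y1, Z1, X_over, 0.0)
liml_just = k_class(y1, Z1, X_just, theta_just)
tsls_just = k_class(y1, Z1, X_just, 0.0)

# Least variance ratio RSS1/RSS2 evaluated at the LIML estimate of alpha_1.
y_star = y1 - liml_over[0] * y2
kappa = rss(y_star, X1) / rss(y_star, X_over)

print("theta (over, just):", theta_over, theta_just)
print("LIML vs 2SLS (over-identified):", liml_over, tsls_over)
```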

Example 3: Simple Keynesian Model

For the data from the Economic Report of the President, given in Table 5.3, consider the simple Keynesian model with no government

$$C_t = \alpha + \beta Y_t + u_t \qquad t = 1, 2, \ldots, T$$

with $Y_t = C_t + I_t$.

Table 11.1 Two-Stage Least Squares

| Setting | Value |
|---|---|
| Dependent Variable | CONSUMP |
| Method | Two-Stage Least Squares |
| Sample | 1959 2007 |
| Included observations | 49 |
| Instrument specification | INV (constant added to instrument list) |

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 4081.653 | 3194.839 | 1.277577 | 0.2077 |
| Y | 0.685609 | 0.172415 | 3.976513 | 0.0002 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.904339 | Mean dependent var | 16749.10 |
| Adjusted R-squared | 0.902304 | S.D. dependent var | 5447.060 |
| S.E. of regression | 1702.552 | Sum squared resid | 1.36E+08 |
| F-statistic | 15.81265 | Durbin-Watson stat | 0.014554 |
| Prob(F-statistic) | 0.000240 | Second-Stage SSR | 1.38E+09 |
| J-statistic | 0.000000 | Instrument rank | 2 |

The OLS estimates of the consumption function yield:

$$C_t = \underset{(219.56)}{-1343.31} + \underset{(0.011)}{0.979}\, Y_t + \text{residuals}$$

The 2SLS estimates assuming that It is exogenous and is the only instrument available, yield

$$C_t = \underset{(3194.8)}{4081.65} + \underset{(0.172)}{0.686}\, Y_t + \text{residuals}$$

Table 11.1 reports these 2SLS results using EViews. Note that the OLS estimate of the intercept is understated, while that of the slope estimate is overstated indicating positive correlation between Yt and the error as described in (11.6). The standard errors of 2SLS are bigger than those of OLS. This is always the case for an instrumental variable estimator as will be shown analytically for a simple regression in Example 4 below.

OLS on the reduced form equations yield

$$C_t = \underset{(3110.4)}{12982.72} + \underset{(1.74)}{2.18}\, I_t + \text{residuals} \quad \text{and} \quad Y_t = \underset{(3110.4)}{12982.72} + \underset{(1.74)}{3.18}\, I_t + \text{residuals}$$

From example (A.5) in the Appendix, we see that $\hat\beta = \hat\pi_{12}/\hat\pi_{22} = 2.18/3.18 = 0.686$ as described in (A.24). Also, $\hat\beta = (\hat\pi_{22} - 1)/\hat\pi_{22} = (3.18 - 1)/3.18 = 2.18/3.18 = 0.686$ as described in (A.25). Similarly, $\hat\alpha = \hat\pi_{11}/\hat\pi_{22} = \hat\pi_{21}/\hat\pi_{22} = 12982.72/3.18 = 4081.65$ as described in (A.22).

This confirms that under just-identification, the 2SLS estimates of the structural coefficients are identical to the Indirect Least Squares (ILS) estimates. The latter estimates uniquely solve for the structural parameter estimates from the reduced form estimates under just-identification. Note that in this case both 2SLS and ILS estimates of the consumption equation are identical to the simple IV estimator using $I_t$ as an instrument for $Y_t$; i.e., $\hat\beta_{IV} = m_{ci}/m_{yi}$ as shown in (A.24).
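The three-way equality between ILS, simple IV and 2SLS can be reproduced on artificial data. The sketch below does not use the Table 5.3 data; it simulates a hypothetical series from the same two-equation model, with made-up parameter values, and recomputes the three estimators.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 49
a_true, b_true = 1000.0, 0.7                 # hypothetical structural parameters
I = rng.uniform(2000.0, 6000.0, size=T)      # synthetic "investment", treated as exogenous
u = rng.normal(scale=300.0, size=T)
Y = (a_true + I + u) / (1.0 - b_true)        # reduced form implied by C = a + bY + u and Y = C + I
C = Y - I

def ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# OLS on the two reduced form equations.
pi11, pi12 = ols(C, I)
pi21, pi22 = ols(Y, I)

# Indirect least squares: solve for the structural parameters from the reduced form.
beta_ils = pi12 / pi22
alpha_ils = pi11 / pi22

# Simple IV with I as the instrument for Y: m_ci / m_yi.
m_ci = np.sum((C - C.mean()) * (I - I.mean()))
m_yi = np.sum((Y - Y.mean()) * (I - I.mean()))
beta_iv = m_ci / m_yi

# 2SLS: regress C on a constant and the first-stage fitted values of Y.
Y_hat = pi21 + pi22 * I
alpha_2sls, beta_2sls = ols(C, Y_hat)

print("beta:  ILS", beta_ils, "IV", beta_iv, "2SLS", beta_2sls)
print("alpha: ILS", alpha_ils, "2SLS", alpha_2sls)
```

The identity $Y_t = C_t + I_t$ forces the two reduced form intercepts to coincide and the slopes to differ by exactly one, which is the pattern seen in the estimates reported above.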