# The Identification Problem

In general, we can think of any structural equation, say the first, as having one left hand side endogenous variable $y_1$, $g_1$ right hand side endogenous variables, and $k_1$ right hand side exogenous variables. The right hand side endogenous variables are correlated with the error term, rendering OLS on this equation biased and inconsistent. Normally, for each endogenous variable, there exists a corresponding structural equation explaining its behavior in the model. We say that a system of simultaneous equations is complete if there are as many endogenous variables as there are equations. To correct for the simultaneous bias we need to replace the right hand side endogenous variables in this equation by variables which are highly correlated with the ones they are replacing but not correlated with the error term. Using the method of instrumental variable estimation, discussed below, we will see that these variables turn out to be the predictors obtained by regressing each right hand side endogenous variable on a subset of all the exogenous variables in the system. Let us assume that there are $K$ exogenous variables in the simultaneous system. What set of exogenous variables should we use that would lead to consistent estimates of this structural equation? A search for the minimum set needed for consistency leads us to the order condition for identification.

The Order Condition for Identification: A necessary condition for identification of any structural equation is that the number of exogenous variables excluded from this equation is greater than or equal to the number of right hand side included endogenous variables. Let $K$ be the number of exogenous variables in the system; then this condition requires $k_2 \geq g_1$, where $k_2 = K - k_1$.

Let us consider the demand and supply equations given in (11.13) and (11.14) but assume that the supply equation has in it an extra variable $W_t$ denoting weather conditions. In this case the demand equation has one right hand side endogenous variable $P_t$, i.e., $g_1 = 1$, and one excluded exogenous variable $W_t$, making $k_2 = 1$. Since $k_2 \geq g_1$, the order condition is satisfied; in other words, based on the order condition alone we cannot conclude that the demand equation is unidentified. The supply equation, however, has $g_1 = 1$ and $k_2 = 0$, making this equation unidentified, since it does not satisfy the order condition for identification. Note that this condition is only necessary but not sufficient for identification. In other words, it is conclusive only if it is not satisfied, in which case the equation in question is not identified. Note that any linear combination of the new supply and demand equations would have a constant, price and weather. This looks like the supply equation but not like demand. This is why the supply equation is not identified. In order to prove once and for all whether the demand equation is identified, we need the rank condition for identification, and this will be discussed in detail in the Appendix to this chapter. Adding a third variable to the supply equation, like the amount of fertilizer used $F_t$, will not help the supply equation any, since a linear combination of supply and demand will still look like supply. However, it does help the identification of the demand equation. Denote by $\ell = k_2 - g_1$ the degree of over-identification. In (11.13) and (11.14) both equations are unidentified (or under-identified) with $\ell = -1$. When $W_t$ is added to the supply equation, $\ell = 0$ for the demand equation, and it is just-identified. When both $W_t$ and $F_t$ are included in the supply equation, $\ell = 1$ and the demand equation is over-identified.
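The counting in the three cases above can be sketched in a few lines of Python. This is only an illustration: the function name `order_condition` and its labels are hypothetical, and it assumes the constant is counted as one of the exogenous variables in the system, which is what makes the counts match the text. A non-negative degree says only that the order condition holds; identification itself still depends on the rank condition.

```python
def order_condition(K, k1, g1):
    """Return (k2, ell, label) for one structural equation.

    K  : number of exogenous variables in the whole system (constant included)
    k1 : number of exogenous variables included in this equation
    g1 : number of right hand side endogenous variables in this equation
    """
    k2 = K - k1          # excluded exogenous variables
    ell = k2 - g1        # degree of over-identification
    if ell < 0:
        label = "under-identified (order condition fails)"
    elif ell == 0:
        label = "just-identified, if the rank condition also holds"
    else:
        label = "over-identified, if the rank condition also holds"
    return k2, ell, label


# Demand equation: includes only the constant, one endogenous regressor (price).
cases = {
    "demand, original model": order_condition(K=1, k1=1, g1=1),
    "demand, W in supply": order_condition(K=2, k1=1, g1=1),
    "demand, W and F in supply": order_condition(K=3, k1=1, g1=1),
    "supply, W in supply": order_condition(K=2, k1=2, g1=1),
}
for name, (k2, ell, label) in cases.items():
    print(f"{name}: k2={k2}, ell={ell}, {label}")
```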

Without the use of matrices, we can describe a two-stage least squares method that will estimate the demand equation consistently. First, we regress the right hand side endogenous variable $P_t$ on a constant and $W_t$ and get $\hat{P}_t$, then replace $P_t$ in the demand equation with $\hat{P}_t$ and perform this second stage regression. In other words, the first step regression is

$$P_t = \pi_{11} + \pi_{12} W_t + v_t \tag{11.22}$$

with $\hat{v}_t = P_t - \hat{P}_t$ satisfying the OLS normal equations $\sum_{t=1}^{T} \hat{v}_t = \sum_{t=1}^{T} \hat{v}_t W_t = 0$. The second stage regression is

$$Q_t = \alpha + \beta \hat{P}_t + \epsilon_t \tag{11.23}$$

with $\sum_{t=1}^{T} \hat{\epsilon}_t = \sum_{t=1}^{T} \hat{\epsilon}_t \hat{P}_t = 0$. Using (11.13) and (11.23), we can write

$$\epsilon_t = \beta (P_t - \hat{P}_t) + u_{1t} = \beta \hat{v}_t + u_{1t} \tag{11.24}$$

so that $\sum_{t=1}^{T} \epsilon_t = \sum_{t=1}^{T} u_{1t}$ and $\sum_{t=1}^{T} \epsilon_t \hat{P}_t = \sum_{t=1}^{T} u_{1t} \hat{P}_t$, using the fact that $\sum_{t=1}^{T} \hat{v}_t = \sum_{t=1}^{T} \hat{v}_t \hat{P}_t = 0$. So the new error $\epsilon_t$ behaves as the original disturbance $u_{1t}$. However, our right hand side variable is now $\hat{P}_t$, which is independent of $u_{1t}$ since it is a linear combination of exogenous variables only. We essentially decomposed $P_t$ into two parts. The first part, $\hat{P}_t$, is a linear combination of exogenous variables and is therefore independent of the $u_{1t}$'s. The second part is $\hat{v}_t$, which is correlated with $u_{1t}$. In fact, this is the source of simultaneous bias. The two parts $\hat{P}_t$ and $\hat{v}_t$ are orthogonal to each other by construction. Hence when the $\hat{v}_t$'s become part of the new error $\epsilon_t$, they are orthogonal to the new regressor $\hat{P}_t$. Furthermore, $\hat{P}_t$ is also independent of $u_{1t}$.
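The two-step procedure can be tried on simulated data. The sketch below is illustrative only: the parameter values, the sample size, and the linear supply equation with a weather term are assumptions made for the simulation, not values from the text. It checks the first-step normal equations, the identity behind (11.24), and compares OLS with the two-stage estimate of the demand slope.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20000

# Hypothetical structural parameters (assumed for this simulation only).
alpha, beta = 10.0, -1.0              # demand: Q = alpha + beta*P + u1
gamma, delta, theta = 2.0, 1.0, 1.5   # supply: Q = gamma + delta*P + theta*W + u2

W = rng.normal(size=T)                # exogenous weather variable
u1 = rng.normal(size=T)
u2 = rng.normal(size=T)

# Market clearing: solve demand = supply for the equilibrium price.
P = (alpha - gamma - theta * W + u1 - u2) / (delta - beta)
Q = alpha + beta * P + u1


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(T)

# First step: regress P on a constant and W, keep fitted values and residuals.
Z = np.column_stack([const, W])
P_hat = Z @ ols(Z, P)
v_hat = P - P_hat

# Second step: regress Q on a constant and P_hat.
alpha_2sls, beta_2sls = ols(np.column_stack([const, P_hat]), Q)

# OLS on the original demand equation, for comparison.
alpha_ols, beta_ols = ols(np.column_stack([const, P]), Q)

# Composite error of the second stage evaluated at the true parameters.
eps = Q - alpha - beta * P_hat        # equals beta*v_hat + u1

print("sum v_hat, sum v_hat*W, sum v_hat*P_hat:",
      v_hat.sum(), v_hat @ W, v_hat @ P_hat)
print("sum eps*P_hat vs sum u1*P_hat:", eps @ P_hat, u1 @ P_hat)
print("OLS slope:", beta_ols, " 2SLS slope:", beta_2sls, " true slope:", beta)
```

In this design the price is positively correlated with $u_{1t}$, so the OLS slope is pulled away from the true value while the two-stage slope stays close to it.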

Why would this procedure not work on the estimation of (11.13) if the model is given by equations (11.13) and (11.14)? The answer is that in (11.22) we will only have a constant, and no $W_t$, so the fitted value $\hat{P}_t$ is itself a constant. When we try to run the second-stage regression in (11.23), the regression will fail because of perfect multicollinearity between the constant and $\hat{P}_t$. This will happen whenever the order condition is not satisfied and the equation is not identified; see Kelejian and Oates (1989). Hence, in order for the second stage to succeed, we need at least one exogenous variable that is excluded from the demand equation and appears in the supply equation, i.e., variables like $W_t$ or $F_t$. Therefore, whenever the second-stage regression fails because of perfect multicollinearity between the right hand side regressors, this implies that the order condition of identification is not satisfied.
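A minimal numerical sketch of this failure, using made-up price data (the values are assumptions for illustration only): when the first step contains only a constant, the fitted price is the sample mean for every observation, and the second-stage regressor matrix loses a rank.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
P = 4.0 + rng.normal(size=T)          # hypothetical price series
const = np.ones(T)

# First step with a constant only (no excluded exogenous variable available).
Z = const.reshape(-1, 1)
P_hat = Z @ np.linalg.lstsq(Z, P, rcond=None)[0]   # every entry is the mean of P

# Second-stage regressors: a constant and P_hat, which is a multiple of the constant.
X_second = np.column_stack([const, P_hat])
rank = np.linalg.matrix_rank(X_second)

print("columns:", X_second.shape[1], " rank:", rank)
```

With two columns but rank one, the normal equations of the second stage have no unique solution.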

In general, if we are given an equation like

$$y_1 = \alpha_{12} y_2 + \beta_{11} X_1 + \beta_{12} X_2 + u_1 \tag{11.25}$$

the order condition requires the existence of at least one exogenous variable excluded from (11.25), say $X_3$. These extra exogenous variables like $X_3$ usually appear in other equations of our simultaneous equation model. In the first step regression we run

$$y_2 = \pi_{21} X_1 + \pi_{22} X_2 + \pi_{23} X_3 + v_2 \tag{11.26}$$

with the OLS residuals $\hat{v}_2$ satisfying

$$\sum_{t=1}^{T} \hat{v}_{2t} X_{1t} = 0; \quad \sum_{t=1}^{T} \hat{v}_{2t} X_{2t} = 0; \quad \sum_{t=1}^{T} \hat{v}_{2t} X_{3t} = 0 \tag{11.27}$$

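and in the second step, we run the regression

$$y_1 = \alpha_{12} \hat{y}_2 + \beta_{11} X_1 + \beta_{12} X_2 + \epsilon_1 \tag{11.28}$$

where $\epsilon_1 = \alpha_{12}(y_2 - \hat{y}_2) + u_1 = \alpha_{12} \hat{v}_2 + u_1$. This regression will lead to consistent estimates, because

$$\sum_{t=1}^{T} \hat{y}_{2t} \epsilon_{1t} = \sum_{t=1}^{T} \hat{y}_{2t} u_{1t}; \quad \sum_{t=1}^{T} X_{1t} \epsilon_{1t} = \sum_{t=1}^{T} X_{1t} u_{1t}; \quad \sum_{t=1}^{T} X_{2t} \epsilon_{1t} = \sum_{t=1}^{T} X_{2t} u_{1t} \tag{11.29}$$

and $u_{1t}$ is independent of the exogenous variables. In order to solve for the three structural parameters $\alpha_{12}$, $\beta_{11}$ and $\beta_{12}$, one needs three linearly independent OLS normal equations. The equation $\sum_{t=1}^{T} \hat{y}_{2t} \hat{\epsilon}_{1t} = 0$ is a new piece of information provided $y_2$ is regressed on at least one extra variable besides $X_1$ and $X_2$. Otherwise, $\sum_{t=1}^{T} X_{1t} \hat{\epsilon}_{1t} = \sum_{t=1}^{T} X_{2t} \hat{\epsilon}_{1t} = 0$ are the only two linearly independent normal equations in three structural parameters.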
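As a short worked step, suppose $y_2$ is regressed on $X_1$ and $X_2$ only, with no extra variable. Here $\hat{\pi}_{21}$ and $\hat{\pi}_{22}$ are notation assumed for the first-step OLS coefficients. The fitted value is then an exact linear combination of the included regressors, and the normal equation for $\hat{y}_2$ is implied by the other two:

$$
\begin{aligned}
\hat{y}_{2t} &= \hat{\pi}_{21} X_{1t} + \hat{\pi}_{22} X_{2t}, \\
\sum_{t=1}^{T} \hat{y}_{2t} \hat{\epsilon}_{1t} &= \hat{\pi}_{21} \sum_{t=1}^{T} X_{1t} \hat{\epsilon}_{1t} + \hat{\pi}_{22} \sum_{t=1}^{T} X_{2t} \hat{\epsilon}_{1t}.
\end{aligned}
$$

The left hand side vanishes whenever the two sums on the right do, so it carries no information beyond them. Only two linearly independent normal equations remain for three structural parameters.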

What happens if there is another right hand side endogenous variable, say $y_3$? In that case (11.25) becomes

$$y_1 = \alpha_{12} y_2 + \alpha_{13} y_3 + \beta_{11} X_1 + \beta_{12} X_2 + u_1 \tag{11.30}$$

Now we need at least two exogenous variables that are excluded from (11.30) for the order condition to be satisfied, and the second stage regression to run. Otherwise, we will have fewer linearly independent equations than there are structural parameters to estimate, and the second stage regression will fail. Also, $y_2$ and $y_3$ should be regressed on the same set of exogenous variables. Furthermore, this set of first-step regressors should always include the right hand side exogenous variables of (11.30). These two conditions will ensure consistency of the estimates. Let $X_3$ and $X_4$ be the excluded exogenous variables from (11.30). Our first step regression would regress $y_2$ and $y_3$ on $X_1$, $X_2$, $X_3$ and $X_4$ to get $\hat{y}_2$ and $\hat{y}_3$, respectively. The second stage regression would regress $y_1$ on $\hat{y}_2$, $\hat{y}_3$, $X_1$ and $X_2$. From the first step regressions we have

$$y_2 = \hat{y}_2 + \hat{v}_2 \quad \text{and} \quad y_3 = \hat{y}_3 + \hat{v}_3 \tag{11.31}$$

where $\hat{y}_2$ and $\hat{y}_3$ are linear combinations of the $X$'s, and $\hat{v}_2$ and $\hat{v}_3$ are the residuals. The second stage regression has the following normal equations

$$\sum_{t=1}^{T} \hat{y}_{2t} \hat{\epsilon}_{1t} = \sum_{t=1}^{T} \hat{y}_{3t} \hat{\epsilon}_{1t} = \sum_{t=1}^{T} X_{1t} \hat{\epsilon}_{1t} = \sum_{t=1}^{T} X_{2t} \hat{\epsilon}_{1t} = 0 \tag{11.32}$$

where $\hat{\epsilon}_1$ denotes the residuals from the second stage regression. In fact

$$\epsilon_1 = \alpha_{12} \hat{v}_2 + \alpha_{13} \hat{v}_3 + u_1 \tag{11.33}$$

Now $\sum_{t=1}^{T} \epsilon_{1t} \hat{y}_{2t} = \sum_{t=1}^{T} u_{1t} \hat{y}_{2t}$ because $\sum_{t=1}^{T} \hat{v}_{2t} \hat{y}_{2t} = \sum_{t=1}^{T} \hat{v}_{3t} \hat{y}_{2t} = 0$. The latter holds because $\hat{y}_2$, the predictor, is orthogonal to $\hat{v}_2$, the residual. Also, $\hat{y}_2$ is orthogonal to $\hat{v}_3$ if $y_2$ is regressed on a set of $X$'s that are a subset of the regressors included in the first step regression of $y_3$. Similarly, $\sum_{t=1}^{T} \epsilon_{1t} \hat{y}_{3t} = \sum_{t=1}^{T} u_{1t} \hat{y}_{3t}$ if $y_3$ is regressed on a set of exogenous variables that are a subset of the $X$'s included in the first step regression of $y_2$. Combining these two conditions leads to the following fact: $y_2$ and $y_3$ have to be regressed on the same set of exogenous variables for the composite error term to behave like the original error. Furthermore, these exogenous variables should include the included $X$'s on the right hand side of the equation to be estimated, i.e., $X_1$ and $X_2$; otherwise, $\sum_{t=1}^{T} \epsilon_{1t} X_{1t}$ is not necessarily equal to $\sum_{t=1}^{T} u_{1t} X_{1t}$, because $\sum_{t=1}^{T} \hat{v}_{2t} X_{1t}$ or $\sum_{t=1}^{T} \hat{v}_{3t} X_{1t}$ are not necessarily zero. For further analysis along these lines, see problem 2.
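The orthogonality argument can be checked numerically. The sketch below uses simulated data with made-up coefficients (all values and helper names are assumptions for illustration, and no separate constant is included). It compares three first-step choices: the same full set of $X$'s for both endogenous variables, different sets for each, and a set that omits the included $X_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
X1, X2, X3, X4 = rng.normal(size=(4, T))   # hypothetical exogenous variables

# Hypothetical reduced forms for the two right hand side endogenous variables.
y2 = 1.0 * X1 + 0.5 * X2 + 2.0 * X3 + 1.0 * X4 + rng.normal(size=T)
y3 = -1.0 * X1 + 1.0 * X2 + 1.0 * X3 + 2.0 * X4 + rng.normal(size=T)


def fit_resid(Z, y):
    """Return OLS fitted values and residuals from regressing y on Z."""
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    fitted = Z @ coef
    return fitted, y - fitted


# Case 1: both first steps use the same set X1..X4.
Z_all = np.column_stack([X1, X2, X3, X4])
y2_hat, v2_hat = fit_resid(Z_all, y2)
y3_hat, v3_hat = fit_resid(Z_all, y3)
cross_same = (y2_hat @ v3_hat, y3_hat @ v2_hat)

# Case 2: different sets, y2 on (X1, X2, X3) and y3 on (X1, X2, X4).
y2_hat_d, v2_hat_d = fit_resid(np.column_stack([X1, X2, X3]), y2)
y3_hat_d, v3_hat_d = fit_resid(np.column_stack([X1, X2, X4]), y3)
cross_diff = (y2_hat_d @ v3_hat_d, y3_hat_d @ v2_hat_d)

# Case 3: first step for y2 omits the included regressor X1.
_, v2_hat_no_x1 = fit_resid(np.column_stack([X2, X3, X4]), y2)
x1_cross = X1 @ v2_hat_no_x1

print("same set, cross products:", cross_same)
print("different sets, cross products:", cross_diff)
print("X1 omitted, sum of v2_hat * X1:", x1_cross)
```

Only in the first case do the cross products vanish up to rounding, which is what lets the composite error behave like the original one.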