Seemingly Unrelated Regressions
When asked “How did you get the idea for SUR?” Zellner responded: “On a rainy night in Seattle in about 1956 or 1957, I somehow got the idea of algebraically writing a multivariate regression model in single equation form. When I figured out how to do that, everything fell into place because then many univariate results could be carried over to apply to the multivariate system and the analysis of the multivariate system is much simplified notationally, algebraically and, conceptually.” Read the interview of Professor Arnold Zellner by Rossi (1989, p. 292).
10.1 Introduction
Consider two regression equations corresponding to two different firms
$$y_i = X_i \beta_i + u_i \qquad i = 1, 2 \qquad (10.1)$$

where $y_i$ and $u_i$ are $T \times 1$ and $X_i$ is $(T \times K_i)$ with $u_i \sim (0, \sigma_{ii} I_T)$. OLS is BLUE on each equation separately. Zellner's (1962) idea is to combine these Seemingly Unrelated Regressions in one stacked model, i.e.,
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \qquad (10.2)$$
which can be written as
$$y = X\beta + u \qquad (10.3)$$

where $y' = (y_1', y_2')$ and $X$ and $u$ are obtained similarly from (10.2). $y$ and $u$ are $2T \times 1$, $X$ is $2T \times (K_1 + K_2)$ and $\beta$ is $(K_1 + K_2) \times 1$. The stacked disturbances have variance-covariance matrix
$$\Omega = E(uu') = \Sigma \otimes I_T = \begin{pmatrix} \sigma_{11} I_T & \sigma_{12} I_T \\ \sigma_{21} I_T & \sigma_{22} I_T \end{pmatrix} \qquad (10.4)$$

where $\Sigma = [\sigma_{ij}]$ for $i, j = 1, 2$; with $\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$ measuring the extent of correlation between the two regression equations. The Kronecker product operator $\otimes$ is defined in the Appendix to Chapter 7. Some important applications of SUR models in economics include the estimation of a system of demand equations or a translog cost function along with its share equations, see Berndt (1991). Briefly, a system of demand equations explains household consumption of several commodities. The correlation among equations could be due to unobservable household-specific attributes that influence the consumption of these commodities. Similarly, in estimating a cost equation along with the corresponding input share equations based on firm-level data, the correlation among equations could be due to unobservable firm-specific effects that influence input choice and cost in production decisions.
B. H. Baltagi, Econometrics, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-20059-5_10, © Springer-Verlag Berlin Heidelberg 2011
Problem 1 asks the reader to verify that OLS on the system of two equations in (10.2) yields the same estimates as OLS on each equation in (10.1) taken separately. If $\rho$ is large, we expect a gain in efficiency from performing GLS rather than OLS on (10.3). In this case

$$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y \qquad (10.5)$$

where $\Omega^{-1} = \Sigma^{-1} \otimes I_T$. GLS will be BLUE for the system of two equations estimated jointly. Note that we only need to invert $\Sigma$ to obtain $\Omega^{-1}$: $\Sigma$ is of dimension $2 \times 2$, whereas $\Omega$ is of dimension $2T \times 2T$. In fact, if we denote $\Sigma^{-1} = [\sigma^{ij}]$, then

$$\hat{\beta}_{GLS} = \begin{pmatrix} \sigma^{11}X_1'X_1 & \sigma^{12}X_1'X_2 \\ \sigma^{21}X_2'X_1 & \sigma^{22}X_2'X_2 \end{pmatrix}^{-1} \begin{pmatrix} \sigma^{11}X_1'y_1 + \sigma^{12}X_1'y_2 \\ \sigma^{21}X_2'y_1 + \sigma^{22}X_2'y_2 \end{pmatrix} \qquad (10.6)$$
Zellner (1962) gave two sufficient conditions under which it does not pay to perform GLS, i.e., GLS on this system of equations turns out to be OLS on each equation separately. These are the following:

Case 1: Zero correlation among the disturbances of the ith and jth equations, i.e., $\sigma_{ij} = 0$ for $i \neq j$. This means that $\Sigma$ is diagonal, which in turn implies that $\Sigma^{-1}$ is diagonal with $\sigma^{ii} = 1/\sigma_{ii}$ for $i = 1, 2$, and $\sigma^{ij} = 0$ for $i \neq j$. Therefore, (10.6) reduces to

$$\hat{\beta}_{GLS} = \begin{pmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \end{pmatrix} = \hat{\beta}_{OLS} \qquad (10.7)$$
Case 2: Same regressors across all equations. This means that all the $X_i$'s are the same, i.e., $X_1 = X_2 = X^*$. This rules out a different number of regressors in each equation; all the $X_i$'s must have the same dimension, i.e., $K_1 = K_2 = K$. Hence, $X = I_2 \otimes X^*$ and (10.6) reduces to

$$\hat{\beta}_{GLS} = [(I_2 \otimes X^{*\prime})(\Sigma^{-1} \otimes I_T)(I_2 \otimes X^*)]^{-1}[(I_2 \otimes X^{*\prime})(\Sigma^{-1} \otimes I_T)y] \qquad (10.8)$$
$$= [\Sigma \otimes (X^{*\prime}X^*)^{-1}][(\Sigma^{-1} \otimes X^{*\prime})y] = [I_2 \otimes (X^{*\prime}X^*)^{-1}X^{*\prime}]y = \hat{\beta}_{OLS}$$
These results generalize to the case of M regression equations, but for simplicity of exposition we considered the case of two equations only.
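To make the two-equation algebra concrete, here is a minimal numerical sketch in Python with NumPy. The data-generating process and all variable names are illustrative (not from the text): it stacks two simulated equations as in (10.2)-(10.3) and computes the GLS estimator (10.5), inverting only the $2 \times 2$ matrix $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200

# Hypothetical simulated data: two equations, K1 = K2 = 2 regressors each.
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])

# Cross-equation disturbance covariance Sigma with sigma_12 != 0.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
u = rng.multivariate_normal([0.0, 0.0], Sigma, size=T)
beta1_true = np.array([1.0, 2.0])
beta2_true = np.array([-1.0, 0.5])
y1 = X1 @ beta1_true + u[:, 0]
y2 = X2 @ beta2_true + u[:, 1]

# Stack as in (10.2)-(10.3): y = X beta + u with block-diagonal X,
# and E(uu') = Sigma kron I_T as in (10.4).
y = np.concatenate([y1, y2])
X = np.block([[X1, np.zeros_like(X2)],
              [np.zeros_like(X1), X2]])

# GLS as in (10.5): only the 2 x 2 Sigma needs inverting, since
# Omega^{-1} = Sigma^{-1} kron I_T.
Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

In larger systems one would avoid forming the $2T \times 2T$ matrix explicitly, but the sketch mirrors the formulas as written.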
A necessary and sufficient condition for SUR (GLS) to be equivalent to OLS was derived by Dwivedi and Srivastava (1978). An alternative derivation, based on the Milliken and Albohali (1984) necessary and sufficient condition for OLS to be equivalent to GLS, is presented here; see Baltagi (1988). In Chapter 9, we saw that GLS is equivalent to OLS, for every $y$, if and only if
$$X'\Omega^{-1}\bar{P}_X = 0 \qquad (10.9)$$

In this case, $X = \mathrm{diag}[X_i]$, $\Omega^{-1} = \Sigma^{-1} \otimes I_T$, and $\bar{P}_X = \mathrm{diag}[\bar{P}_{X_i}]$ where $\bar{P}_{X_i} = I_T - X_i(X_i'X_i)^{-1}X_i'$. Hence, the typical element of (10.9), see problem 1, is

$$\sigma^{ij}X_i'\bar{P}_{X_j} = 0 \qquad (10.10)$$

This is automatically satisfied for $i = j$. For $i \neq j$, this holds if $\sigma^{ij} = 0$ or $X_i'\bar{P}_{X_j} = 0$. Note that $\sigma_{ij} = 0$ is the first sufficient condition provided by Zellner (1962). The latter condition $X_i'\bar{P}_{X_j} = 0$ implies that the regressors in the ith equation are a perfect linear combination of those in the jth equation. Since $X_j'\bar{P}_{X_i} = 0$ has to hold also, $X_j$ has to be a perfect linear combination of the regressors in the ith equation; $X_i$ and $X_j$ span the same space. Both $X_i$ and $X_j$ have full column rank for OLS to be feasible, hence they have to be of the same dimension for $X_i'\bar{P}_{X_j} = X_j'\bar{P}_{X_i} = 0$. In this case, $X_i = X_j C$, where $C$ is a nonsingular matrix, i.e., the regressors in the ith equation are a perfect linear combination of those in the jth equation. This includes the second sufficient condition derived by Zellner (1962). In practice, different economic behavioral equations contain different numbers of right hand side variables. In this case, one rearranges the SUR into blocks where each block has the same number of right hand side variables. For two equations $i$ and $j$ belonging to two different blocks, (10.10) is satisfied if the corresponding $\sigma^{ij}$ is zero, i.e., $\Sigma$ has to be block-diagonal. However, in this case, GLS performed on the whole system is equivalent to GLS performed on each block taken separately. Hence, (10.10) is satisfied for SUR if it is satisfied for each block taken separately.
Revankar (1974) considered the case where $X_2$ is a subset of $X_1$. In this case, there is no gain in using SUR for estimating $\beta_2$. In fact, problem 2 asks the reader to verify that $\hat{\beta}_{2,SUR} = \hat{\beta}_{2,OLS}$. However, this is not the case for $\beta_1$. It is easy to show that $\hat{\beta}_{1,SUR} = \hat{\beta}_{1,OLS} - A e_{2,OLS}$, where $A$ is a matrix defined in problem 2, and $e_{2,OLS}$ are the OLS residuals of the second equation.
Telser (1964) suggested an iterative least squares procedure for SUR equations. For the two equations model given in (10.1), this estimation method involves the following:
1. Compute the OLS residuals e1 and e2 from both equations.
2. Include e1 as an extra regressor in the second equation and e2 as an extra regressor in the first equation. Compute the new least squares residuals and iterate this step until convergence of the estimated coefficients. The resulting estimator has the same asymptotic distribution as Zellner’s (1962) SUR estimator.
Conniffe (1982) suggests stopping at the second step because in small samples this provides most of the improvement in precision. In fact, Conniffe (1982) argues that it may be unnecessary and even disadvantageous to calculate Zellner's estimator proper. The extension to multiple equations is simple. Step 1 is the same: one computes the least squares residuals of every equation. Step 2 adds the residuals of all other equations to the equation of interest; OLS is run and the new residuals are computed. One can stop at this second step or iterate until convergence.
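The two steps above can be sketched on simulated data as follows. This is one plausible reading of the iteration (the data-generating process, convergence tolerance, and variable names are ours, not Telser's):

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
T = 300
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=T)
y1 = X1 @ np.array([1.0, 2.0]) + u[:, 0]
y2 = X2 @ np.array([0.5, -1.0]) + u[:, 1]

# Step 1: OLS residuals from each equation.
e1 = y1 - X1 @ ols(X1, y1)
e2 = y2 - X2 @ ols(X2, y2)

# Step 2 (iterated): add the other equation's residuals as an extra
# regressor, re-estimate, recompute the structural residuals, repeat.
for _ in range(100):
    b1 = ols(np.column_stack([X1, e2]), y1)
    b2 = ols(np.column_stack([X2, e1]), y2)
    e1_new = y1 - X1 @ b1[:2]
    e2_new = y2 - X2 @ b2[:2]
    if max(np.abs(e1_new - e1).max(), np.abs(e2_new - e2).max()) < 1e-12:
        e1, e2 = e1_new, e2_new
        break
    e1, e2 = e1_new, e2_new

beta1_telser = b1[:2]   # coefficients on X1; b1[2] roughly estimates sigma_12/sigma_22
beta2_telser = b2[:2]
```

Stopping after the first pass through the loop corresponds to Conniffe's (1982) suggestion.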
In practice, $\Sigma$ is not known and has to be estimated. Zellner (1962) recommended the following feasible GLS estimation procedure:
$$s_{ii} = \sum_{t=1}^{T} e_{it}^2/(T - K_i) \quad \text{for } i = 1, 2 \qquad (10.11)$$

and

$$s_{ij} = \sum_{t=1}^{T} e_{it}e_{jt}/[(T - K_i)^{1/2}(T - K_j)^{1/2}] \quad \text{for } i, j = 1, 2 \text{ and } i \neq j \qquad (10.12)$$

where $e_{it}$ denotes the OLS residuals of the ith equation. $s_{ii}$ is the $s^2$ of the regression for the ith equation and is unbiased for $\sigma_{ii}$. However, $s_{ij}$ for $i \neq j$ is not unbiased for $\sigma_{ij}$. In fact, the unbiased estimate is

$$\tilde{s}_{ij} = \sum_{t=1}^{T} e_{it}e_{jt}/[T - K_i - K_j + \mathrm{tr}(B)] \quad \text{for } i, j = 1, 2 \qquad (10.13)$$

where $B = X_i(X_i'X_i)^{-1}X_i'X_j(X_j'X_j)^{-1}X_j' = P_{X_i}P_{X_j}$; see problem 4. Using this last estimator may lead to a variance-covariance matrix that is not positive definite. For consistency, however, all we need is a division by $T$, though this leaves us with a biased estimator:

$$\hat{s}_{ij} = \sum_{t=1}^{T} e_{it}e_{jt}/T \quad \text{for } i, j = 1, 2 \qquad (10.14)$$
Using this consistent estimator of $\Sigma$ will result in feasible GLS estimates that are asymptotically efficient. In fact, if one iterates this procedure, i.e., computes feasible GLS residuals, obtains second-round estimates of $\Sigma$ from these residuals using (10.14), and continues iterating until convergence, this leads to maximum likelihood estimates of the regression coefficients; see Oberhofer and Kmenta (1974).
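A minimal sketch of this feasible GLS procedure, including the iteration to convergence, again in Python/NumPy on simulated data (the data-generating process and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
Sigma_true = np.array([[1.0, 0.9],
                       [0.9, 1.5]])
u = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=T)
y1 = X1 @ np.array([1.0, -2.0]) + u[:, 0]
y2 = X2 @ np.array([3.0, 0.5]) + u[:, 1]

y = np.concatenate([y1, y2])
X = np.block([[X1, np.zeros_like(X2)],
              [np.zeros_like(X1), X2]])

# Step 1: OLS residuals equation by equation.
e1 = y1 - X1 @ np.linalg.lstsq(X1, y1, rcond=None)[0]
e2 = y2 - X2 @ np.linalg.lstsq(X2, y2, rcond=None)[0]

# Step 2: consistent (if biased) estimate of Sigma as in (10.14).
E = np.column_stack([e1, e2])
Sigma_hat = E.T @ E / T

# Step 3: feasible GLS, i.e., (10.5) with Sigma_hat in place of Sigma.
def fgls(Sigma_est):
    Omega_inv = np.kron(np.linalg.inv(Sigma_est), np.eye(T))
    return np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

beta_fgls = fgls(Sigma_hat)

# Iterating residuals -> Sigma -> FGLS to convergence gives the MLE
# (Oberhofer and Kmenta, 1974).
for _ in range(100):
    res = y - X @ beta_fgls
    R = np.column_stack([res[:T], res[T:]])
    Sigma_hat = R.T @ R / T
    beta_new = fgls(Sigma_hat)
    if np.abs(beta_new - beta_fgls).max() < 1e-12:
        beta_fgls = beta_new
        break
    beta_fgls = beta_new
```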
Relative Efficiency of OLS in the Case of Simple Regressions
To illustrate the gain in efficiency of Zellner's SUR compared to performing OLS on each equation separately, Kmenta (1986, pp. 641-643) considers the following two simple regression equations:

$$Y_{1t} = \beta_{11} + \beta_{12}X_{1t} + u_{1t} \qquad (10.15)$$
$$Y_{2t} = \beta_{21} + \beta_{22}X_{2t} + u_{2t} \qquad \text{for } t = 1, 2, \ldots, T;$$
and proves that
$$\mathrm{var}(\hat{\beta}_{12,GLS})/\mathrm{var}(\hat{\beta}_{12,OLS}) = (1 - \rho^2)/(1 - \rho^2 r^2) \qquad (10.16)$$

where $\rho$ is the correlation coefficient between $u_1$ and $u_2$, and $r$ is the sample correlation coefficient between $X_1$ and $X_2$. Problem 5 asks the reader to verify (10.16). In fact, the same relative efficiency ratio holds for $\beta_{22}$, i.e., $\mathrm{var}(\hat{\beta}_{22,GLS})/\mathrm{var}(\hat{\beta}_{22,OLS})$ is also given by (10.16). This confirms the two results obtained above: as $\rho$ increases, this relative efficiency ratio decreases and OLS becomes less efficient relative to GLS; as $r$ increases, this relative efficiency ratio increases and there is less gain in performing GLS rather than OLS. For $\rho = 0$ or $r = 1$, the efficiency ratio is 1, and OLS is equivalent to GLS. However, if $\rho$ is large, say 0.9, and $r$ is small, say 0.1, then (10.16) gives a relative efficiency of about 0.19. For a tabulation of (10.16) for various values of $\rho^2$ and $r^2$, see Table 12-1 of Kmenta (1986, p. 642).
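The ratio in (10.16) is easy to evaluate directly; a small sketch (the function name is ours):

```python
# Relative efficiency (10.16): var(beta_GLS)/var(beta_OLS)
# = (1 - rho^2)/(1 - rho^2 * r^2).
def efficiency_ratio(rho, r):
    return (1 - rho**2) / (1 - rho**2 * r**2)

no_corr = efficiency_ratio(0.0, 0.5)   # rho = 0: ratio is 1, OLS = GLS
same_x = efficiency_ratio(0.9, 1.0)    # r = 1: ratio is 1, OLS = GLS
big_gain = efficiency_ratio(0.9, 0.1)  # high rho, low r: large gain from GLS
```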
Relative Efficiency of OLS in the Case of Multiple Regressions
With more regressors in each equation, the relative efficiency story has to be modified, as indicated by Binkley and Nelson (1988). Consider the two-equation model in (10.2) with $K_1$ regressors $X_1$ in the first equation and $K_2$ regressors $X_2$ in the second equation. If we focus on the regression estimates of the first equation, we get

$$\mathrm{var}(\hat{\beta}_{1,GLS}) = A^{11} = [\sigma^{11}X_1'X_1 - \sigma^{12}X_1'X_2(\sigma^{22}X_2'X_2)^{-1}\sigma^{21}X_2'X_1]^{-1} \qquad (10.17)$$

see problem 6. Using the fact that
$$\Sigma^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1/\sigma_{11} & -\rho^2/\sigma_{12} \\ -\rho^2/\sigma_{21} & 1/\sigma_{22} \end{pmatrix}$$

where $\rho^2 = \sigma_{12}^2/\sigma_{11}\sigma_{22}$, one gets

$$\mathrm{var}(\hat{\beta}_{1,GLS}) = \sigma_{11}(1-\rho^2)\{X_1'X_1 - \rho^2(X_1'P_{X_2}X_1)\}^{-1} \qquad (10.18)$$
Adding and subtracting $\rho^2 X_1'X_1$ in the expression to be inverted, one gets

$$\mathrm{var}(\hat{\beta}_{1,GLS}) = \sigma_{11}\{X_1'X_1 + [\rho^2/(1-\rho^2)]E'E\}^{-1} \qquad (10.19)$$

where $E = \bar{P}_{X_2}X_1$ is the matrix whose columns are the OLS residuals of each variable in $X_1$ regressed on $X_2$. If $E = 0$, there is no gain in SUR over OLS for the estimation of $\beta_1$; $X_1 = X_2$, or $X_1$ a subset of $X_2$, are two such cases. One can easily verify that (10.19) is the variance-covariance matrix of an OLS regression with regressor matrix

$$W = \begin{pmatrix} X_1 \\ \theta E \end{pmatrix}$$

where $\theta^2 = \rho^2/(1-\rho^2)$. Now let us focus on the efficiency of the estimated coefficient of the qth variable, $X_q$, in $X_1$. Recall, from Chapter 4, that for the regression of $y$ on $X_1$
$$\mathrm{var}(\hat{\beta}_{q,OLS}) = \sigma_{11}/\textstyle\sum_{t=1}^{T} X_{tq}^2(1 - R_q^2) \qquad (10.20)$$

where the denominator is the residual sum of squares of $X_q$ on the other $(K_1 - 1)$ regressors in $X_1$, and $R_q^2$ is the corresponding $R^2$ of that regression. Similarly, from (10.19),

$$\mathrm{var}(\hat{\beta}_{q,SUR}) = \sigma_{11}/[\textstyle\sum_{t=1}^{T} X_{tq}^2 + \theta^2\sum_{t=1}^{T} e_{tq}^2](1 - R_q^{*2}) \qquad (10.21)$$

where $e_{tq}$ denotes the qth column of $E$ and $R_q^{*2}$ is the corresponding $R^2$ from the $W$ regression. This variance differs from $\mathrm{var}(\hat{\beta}_{q,OLS})$ in (10.20) by the two extra terms in the denominator. If $\rho = 0$, then $\theta^2 = 0$, so that $W' = [X_1', 0]$ and $R_q^{*2} = R_q^2$. In this case, (10.21) reduces to (10.20). If $X_q$ also appears in the second equation, or in general is spanned by the variables in $X_2$, then $e_{tq} = 0$, $\sum_{t=1}^{T} e_{tq}^2 = 0$, and from (10.21) there is gain in efficiency only if $R_q^2 > R_q^{*2}$. $R_q^2$ is a measure of multicollinearity of $X_q$ with the other $(K_1 - 1)$ regressors in the first equation, i.e., $X_1$. If this is high, then it is more likely that $R_q^2 > R_q^{*2}$. Therefore, the higher the multicollinearity within $X_1$, the greater the potential for a decrease in the variance of OLS by SUR. Note that $R_q^{*2} = R_q^2$ when $\theta E = 0$. This is true if $\theta = 0$ or $E = 0$; the latter occurs when $X_1$ lies in the space spanned by $X_2$. Problem 7 asks the reader to verify that $R_q^{*2} = R_q^2$ when $X_1$ is orthogonal to $X_2$. Therefore, with more regressors in each equation, one has to consider the correlation between the $X$'s within each equation as well as that across equations. Even when the $X$'s across equations are highly correlated, there may still be gains from joint estimation using SUR when there is high multicollinearity within each equation.
10.3 Testing Diagonality of the Variance-Covariance Matrix
Since the diagonality of $\Sigma$ is at the heart of using SUR estimation methods, it is important to look at tests for $H_0$: $\Sigma$ is diagonal. Breusch and Pagan (1980) derived a simple and easy to use Lagrange multiplier statistic for testing $H_0$. This is based upon the sample correlation coefficients of the OLS residuals:

$$\lambda_{LM} = T\sum_{i=2}^{M}\sum_{j=1}^{i-1} r_{ij}^2 \qquad (10.23)$$

where $M$ denotes the number of equations and $r_{ij} = \hat{s}_{ij}/(\hat{s}_{ii}\hat{s}_{jj})^{1/2}$. The $\hat{s}_{ij}$'s are computed from OLS residuals as in (10.14). Under the null hypothesis, $\lambda_{LM}$ has an asymptotic $\chi^2_{M(M-1)/2}$ distribution. Note that the $\hat{s}_{ij}$'s are needed for feasible GLS estimation anyway. Therefore, it is easy to compute the $r_{ij}$'s and $\lambda_{LM}$ by summing the squares of half the number of off-diagonal elements of $R = [r_{ij}]$ and multiplying the sum by $T$. For example, for the two equations case, $\lambda_{LM} = Tr_{21}^2$, which is asymptotically distributed as $\chi_1^2$ under $H_0$. For the three equations case, $\lambda_{LM} = T(r_{21}^2 + r_{31}^2 + r_{32}^2)$, which is asymptotically distributed as $\chi_3^2$ under $H_0$.
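A sketch of the LM statistic (10.23) in Python/NumPy; for brevity, simulated disturbances stand in for OLS residuals, and the function name is ours:

```python
import numpy as np

def breusch_pagan_lm(resids):
    """LM statistic (10.23) for H0: Sigma diagonal, from a T x M matrix
    of residuals (one column per equation)."""
    T, M = resids.shape
    S = resids.T @ resids / T               # s_ij as in (10.14)
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                  # correlation matrix [r_ij]
    lm = T * np.sum(np.tril(R, k=-1)**2)    # sum of r_ij^2 over i > j
    df = M * (M - 1) // 2
    return lm, df

# Three correlated equations: the statistic should be large relative to
# a chi-squared with M(M-1)/2 = 3 degrees of freedom.
rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.0]])
resids = rng.multivariate_normal(np.zeros(3), Sigma, size=400)
lm, df = breusch_pagan_lm(resids)
```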
Alternatively, the likelihood ratio test can also be used to test for diagonality of $\Sigma$. This is based on the determinants of the variance-covariance matrices estimated by MLE for the restricted and unrestricted models:

$$\lambda_{LR} = T\left(\sum_{i=1}^{M}\log\hat{s}_{ii} - \log|\hat{\Sigma}|\right) \qquad (10.24)$$

where $\hat{s}_{ii}$ is the restricted MLE of $\sigma_{ii}$ obtained from the OLS residuals as in (10.14), and $\hat{\Sigma}$ denotes the unrestricted MLE of $\Sigma$. The latter may be adequately approximated with an estimator based on the feasible GLS estimates; see Judge et al. (1982). Under $H_0$, $\lambda_{LR}$ has an asymptotic $\chi^2_{M(M-1)/2}$ distribution.
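A corresponding sketch for the LR statistic (10.24). As a simplification, the restricted variances are taken from the diagonal of the supplied estimate of $\Sigma$ rather than recomputed from OLS residuals as in the text, and the function name is ours:

```python
import numpy as np

def lr_diagonality(Sigma_hat, T):
    """LR statistic (10.24): T * (sum_i log s_ii - log|Sigma_hat|).
    Simplification: the restricted (diagonal) variance estimates are
    taken to be the diagonal of Sigma_hat."""
    s_ii = np.diag(Sigma_hat)
    _, logdet = np.linalg.slogdet(Sigma_hat)
    return T * (np.sum(np.log(s_ii)) - logdet)

# A diagonal Sigma_hat gives a statistic of zero; off-diagonal
# correlation pushes it up (by Hadamard's inequality, the determinant
# of a positive definite matrix is at most the product of its diagonal).
lr_diag = lr_diagonality(np.diag([1.0, 2.0, 0.5]), T=100)
lr_corr = lr_diagonality(np.array([[1.0, 0.6],
                                   [0.6, 1.0]]), T=100)
```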