# The Fixed Effects Model

If the $\mu_i$'s are thought of as fixed parameters to be estimated, then equation (12.1) becomes

$$y_{it} = \alpha + X_{it}'\beta + \sum_{i=1}^{N} \mu_i D_i + \nu_{it} \qquad (12.5)$$

where $D_i$ is a dummy variable for the $i$-th household. Not all the dummies are included, so as not to fall into the dummy variable trap. One is usually dropped or, equivalently, we can say that there is a restriction on the $\mu$'s given by $\sum_{i=1}^{N}\mu_i = 0$. The $\nu_{it}$'s are the usual classical IID random variables with 0 mean and variance $\sigma_\nu^2$. OLS on equation (12.5) is BLUE, but we have two problems: the first is the loss of degrees of freedom, since in this case we are estimating $N + K$ parameters. Also, with a lot of dummies we could be running into multicollinearity problems and a large $X'X$ matrix to invert. For example, if $N = 50$ states, $T = 10$ years and we have two explanatory variables, then with 500 observations we are estimating 52 parameters. Alternatively, we can think of this in an analysis of variance context and rearrange our observations, say, on $y$ in an $(N \times T)$ matrix where rows denote firms and columns denote time periods.

| $i \backslash t$ | 1 | 2 | $\cdots$ | $T$ | |
|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | $\cdots$ | $y_{1T}$ | $y_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | $\cdots$ | $y_{2T}$ | $y_{2.}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ | $\vdots$ |
| $N$ | $y_{N1}$ | $y_{N2}$ | $\cdots$ | $y_{NT}$ | $y_{N.}$ |

where $y_{i.} = \sum_{t=1}^{T} y_{it}$ and $\bar{y}_{i.} = y_{i.}/T$. For the simple regression with one regressor, the model given in (12.1) becomes

$$y_{it} = \alpha + \beta x_{it} + \mu_i + \nu_{it} \qquad (12.6)$$

averaging over time gives

$$\bar{y}_{i.} = \alpha + \beta \bar{x}_{i.} + \mu_i + \bar{\nu}_{i.} \qquad (12.7)$$

and averaging over all observations gives

$$\bar{y}_{..} = \alpha + \beta \bar{x}_{..} + \bar{\nu}_{..} \qquad (12.8)$$

where $\bar{y}_{..} = \sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}/NT$. Equation (12.8) follows because the $\mu_i$'s sum to zero. Defining $\tilde{y}_{it} = (y_{it} - \bar{y}_{i.})$, and $\tilde{x}_{it}$ and $\tilde{\nu}_{it}$ similarly, we get

$$y_{it} - \bar{y}_{i.} = \beta(x_{it} - \bar{x}_{i.}) + (\nu_{it} - \bar{\nu}_{i.})$$

or

$$\tilde{y}_{it} = \beta \tilde{x}_{it} + \tilde{\nu}_{it} \qquad (12.9)$$

Running OLS on equation (12.9) leads to the same estimator of $\beta$ as that obtained from equation (12.5). This is called the least squares dummy variable (LSDV) estimator, or $\tilde{\beta}$ in our notation. It is also known as the Within estimator, since $\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}^2$ is the within sum of squares in an analysis of variance framework. One can then retrieve an estimate of $\alpha$ from equation (12.8) as $\hat{\alpha} = \bar{y}_{..} - \tilde{\beta}\bar{x}_{..}$. Similarly, if we are interested in the $\mu_i$'s, those can also be retrieved from (12.7) and (12.8) as follows:

$$\hat{\mu}_i = (\bar{y}_{i.} - \bar{y}_{..}) - \tilde{\beta}(\bar{x}_{i.} - \bar{x}_{..}) \qquad (12.10)$$
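As a minimal numerical sketch of (12.6)-(12.10), the following simulates a balanced panel (all sizes and parameter values are illustrative assumptions), applies the within transformation, and recovers $\tilde{\beta}$, $\hat{\alpha}$ and the $\hat{\mu}_i$:

```python
import numpy as np

# Minimal sketch of the Within estimator for the one-regressor model (12.6);
# all data here are simulated for illustration.
rng = np.random.default_rng(0)
N, T = 50, 10
alpha, beta = 1.0, 2.0
mu = rng.normal(0, 1, N)
mu -= mu.mean()                        # impose the restriction sum(mu) = 0
x = rng.normal(0, 1, (N, T))
v = rng.normal(0, 0.5, (N, T))
y = alpha + beta * x + mu[:, None] + v

# Within transformation: deviations from individual means, eq. (12.9)
y_dev = y - y.mean(axis=1, keepdims=True)
x_dev = x - x.mean(axis=1, keepdims=True)
beta_w = (x_dev * y_dev).sum() / (x_dev ** 2).sum()

# Recover alpha from (12.8) and the mu_i from (12.10)
alpha_hat = y.mean() - beta_w * x.mean()
mu_hat = (y.mean(axis=1) - y.mean()) - beta_w * (x.mean(axis=1) - x.mean())

print(beta_w, alpha_hat)               # close to the true beta = 2, alpha = 1
```

Note that the recovered $\hat{\mu}_i$ sum to zero by construction, mirroring the restriction imposed on the $\mu$'s.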

In matrix form, one can substitute the disturbances given by (12.4) into (12.3) to get

$$y = \alpha \iota_{NT} + X\beta + Z_\mu \mu + \nu = Z\delta + Z_\mu \mu + \nu \qquad (12.11)$$

and then perform OLS on (12.11) to get estimates of $\alpha$, $\beta$ and $\mu$. Note that $Z$ is $NT \times (K+1)$ and $Z_\mu$, the matrix of individual dummies, is $NT \times N$. If $N$ is large, (12.11) will include too many individual dummies, and the matrix to be inverted by OLS is large and of dimension $(N+K)$. In fact, since $\alpha$ and $\beta$ are the parameters of interest, one can obtain the least squares dummy variables (LSDV) estimator from (12.11) by residualing out the individual dummies, i.e., by premultiplying the model by $Q$, the projection orthogonal to $Z_\mu$, and performing OLS

$$Qy = QX\beta + Q\nu \qquad (12.12)$$

This uses the fact that $QZ_\mu = Q\iota_{NT} = 0$, since $PZ_\mu = Z_\mu$. In other words, the $Q$ matrix wipes out the individual effects. Recall the FWL Theorem in Chapter 7. This is a regression of $\tilde{y} = Qy$, with typical element $(y_{it} - \bar{y}_{i.})$, on $\tilde{X} = QX$, with typical element $(X_{it,k} - \bar{X}_{i.,k})$ for the $k$-th regressor, $k = 1, 2, \ldots, K$. This involves the inversion of a $(K \times K)$ matrix rather than the $(N+K) \times (N+K)$ matrix in (12.11). The resulting OLS estimator is

$$\tilde{\beta} = (X'QX)^{-1}X'Qy \qquad (12.13)$$

with $\mathrm{var}(\tilde{\beta}) = \sigma_\nu^2(X'QX)^{-1} = \sigma_\nu^2(\tilde{X}'\tilde{X})^{-1}$.
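The FWL equivalence behind (12.12)-(12.13) can be checked numerically. The sketch below (dimensions and data are illustrative assumptions) builds $Q = I_{NT} - I_N \otimes \bar{J}_T$, computes $\tilde{\beta}$ from (12.13), and confirms it matches the slope coefficients of the full LSDV regression:

```python
import numpy as np

# Sketch of (12.12)-(12.13): Q = I_N (x) (I_T - Jbar_T) wipes out individual
# means, so OLS on the Q-transformed data equals the LSDV slope estimates.
rng = np.random.default_rng(1)
N, T, K = 6, 5, 2
P = np.kron(np.eye(N), np.ones((T, T)) / T)   # projection on individual means
Q = np.eye(N * T) - P                         # deviations from those means

X = rng.normal(size=(N * T, K))
mu = np.repeat(rng.normal(size=N), T)
y = X @ np.array([1.0, -2.0]) + mu + rng.normal(size=N * T)

beta_fe = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)   # eq. (12.13)

# LSDV: regress y on X plus a full set of N individual dummies
D = np.kron(np.eye(N), np.ones((T, 1)))
Z = np.hstack([X, D])
coef_lsdv = np.linalg.lstsq(Z, y, rcond=None)[0]
print(np.allclose(beta_fe, coef_lsdv[:K]))    # True
```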

Note that this fixed effects (FE) estimator cannot estimate the effect of any time-invariant variable like sex, race, religion, schooling, or union participation. These time-invariant variables are wiped out by the $Q$ transformation, the deviations from means transformation. Alternatively, one can see that these time-invariant variables are spanned by the individual dummies in (12.5), and therefore any regression package attempting (12.5) will fail, signaling perfect multicollinearity. If (12.5) is the true model, LSDV is BLUE as long as $\nu_{it}$ is the standard classical disturbance with mean 0 and variance-covariance matrix $\sigma_\nu^2 I_{NT}$. Note that as $T \to \infty$, the FE estimator is consistent. However, if $T$ is fixed and $N \to \infty$, as is typical in short labor panels, then only the FE estimator of $\beta$ is consistent; the FE estimators of the individual effects $(\alpha + \mu_i)$ are not consistent, since the number of these parameters increases as $N$ increases.

Testing for Fixed Effects: One could test the joint significance of these dummies, i.e., $H_0: \mu_1 = \mu_2 = \cdots = \mu_{N-1} = 0$, by performing an F-test. This is a simple Chow test given in (4.17), with the restricted residual sums of squares (RRSS) being that of OLS on the pooled model and the unrestricted residual sums of squares (URSS) being that of the LSDV regression. If $N$ is large, one can perform the within transformation and use that residual sum of squares as the URSS. In this case

$$F = \frac{(RRSS - URSS)/(N-1)}{URSS/(N(T-1)-K)} \qquad (12.14)$$

which is distributed as $F_{N-1,\,N(T-1)-K}$ under $H_0$.
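A hedged sketch of this F-test on simulated data (all sizes and variances are illustrative assumptions): RRSS comes from pooled OLS, URSS from the within regression, and a large statistic rejects the null of no individual effects.

```python
import numpy as np

# F-test for H0: mu_1 = ... = mu_{N-1} = 0, comparing pooled OLS
# (restricted) with the within regression (unrestricted). Simulated data.
rng = np.random.default_rng(2)
N, T, K = 20, 8, 1
x = rng.normal(size=(N, T))
mu = rng.normal(0, 1.0, size=N)
y = 0.5 + 1.5 * x + mu[:, None] + rng.normal(0, 1.0, size=(N, T))

# Restricted: pooled OLS of y on [1, x]
Xp = np.column_stack([np.ones(N * T), x.ravel()])
b = np.linalg.lstsq(Xp, y.ravel(), rcond=None)[0]
rrss = ((y.ravel() - Xp @ b) ** 2).sum()

# Unrestricted: within regression (same residual sum of squares as LSDV)
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
urss = ((yd - (xd @ yd) / (xd @ xd) * xd) ** 2).sum()

F = ((rrss - urss) / (N - 1)) / (urss / (N * (T - 1) - K))
print(F)   # large F rejects H0 when individual effects matter
```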

Computational Warning: One computational caution for those using the Within regression given by (12.12). The $s^2$ of this regression as obtained from a typical regression package divides the residual sum of squares by $NT - K$, since the intercept and the dummies are not included. The proper $s^2$, say $s^{*2}$ from the LSDV regression in (12.5), would divide the same residual sum of squares by $N(T-1) - K$. Therefore, one has to adjust the variances obtained from the within regression (12.12) by multiplying the variance-covariance matrix by $(s^{*2}/s^2)$, or simply by multiplying by $[NT - K]/[N(T-1) - K]$.
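For the chapter's earlier example ($N = 50$, $T = 10$, $K = 2$), this correction factor works out as follows:

```python
# Variance correction factor from the computational warning, for the
# chapter's example of N = 50, T = 10 and K = 2 regressors.
N, T, K = 50, 10, 2
factor = (N * T - K) / (N * (T - 1) - K)
print(round(factor, 4))   # 498/448, roughly 1.1116
```

So package-reported variances from the within regression are understated by about 11% in this case.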

## 12.2.1 The Random Effects Model

There are too many parameters in the fixed effects model, and the loss of degrees of freedom can be avoided if the $\mu_i$'s can be assumed random. In this case $\mu_i \sim \mathrm{IID}(0, \sigma_\mu^2)$, $\nu_{it} \sim \mathrm{IID}(0, \sigma_\nu^2)$, and the $\mu_i$'s are independent of the $\nu_{it}$'s. In addition, the $X_{it}$'s are independent of the $\mu_i$'s and $\nu_{it}$'s for all $i$ and $t$. The random effects model is an appropriate specification if we are drawing $N$ individuals randomly from a large population.

This specification implies a homoskedastic variance $\mathrm{var}(u_{it}) = \sigma_\mu^2 + \sigma_\nu^2$ for all $i$ and $t$, and an equi-correlated block-diagonal covariance matrix which exhibits serial correlation over time only between the disturbances of the same individual. In fact,

$$\mathrm{cov}(u_{it}, u_{js}) = \begin{cases} \sigma_\mu^2 + \sigma_\nu^2 & \text{for } i = j,\ t = s \\ \sigma_\mu^2 & \text{for } i = j,\ t \neq s \end{cases} \qquad (12.15)$$

and zero otherwise. This also means that the correlation coefficient between $u_{it}$ and $u_{js}$ is

$$\rho = \mathrm{correl}(u_{it}, u_{js}) = \begin{cases} 1 & \text{for } i = j,\ t = s \\ \sigma_\mu^2/(\sigma_\mu^2 + \sigma_\nu^2) & \text{for } i = j,\ t \neq s \end{cases} \qquad (12.16)$$

and zero otherwise. From (12.4), one can compute the variance-covariance matrix

$$\Omega = E(uu') = Z_\mu E(\mu\mu')Z_\mu' + E(\nu\nu') = \sigma_\mu^2(I_N \otimes J_T) + \sigma_\nu^2(I_N \otimes I_T) \qquad (12.17)$$

In order to obtain the GLS estimator of the regression coefficients, we need $\Omega^{-1}$. This is a huge matrix for typical panels, of dimension $(NT \times NT)$. No brute force inversion should be attempted even if the researcher's application has a small $N$ and $T$. For example, if we observe $N = 20$ firms over $T = 5$ time periods, $\Omega$ will be 100 by 100. We will follow a simple trick devised by Wansbeek and Kapteyn (1982) that allows the derivation of $\Omega^{-1}$ and $\Omega^{-1/2}$. Essentially, one replaces $J_T$ by $T\bar{J}_T$, and $I_T$ by $(E_T + \bar{J}_T)$, where $\bar{J}_T = J_T/T$ and $E_T$ is by definition $(I_T - \bar{J}_T)$. In this case:

$$\Omega = T\sigma_\mu^2(I_N \otimes \bar{J}_T) + \sigma_\nu^2(I_N \otimes E_T) + \sigma_\nu^2(I_N \otimes \bar{J}_T)$$

collecting terms with the same matrices, we get

$$\Omega = (T\sigma_\mu^2 + \sigma_\nu^2)(I_N \otimes \bar{J}_T) + \sigma_\nu^2(I_N \otimes E_T) = \sigma_1^2 P + \sigma_\nu^2 Q \qquad (12.18)$$

where $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$, $P = I_N \otimes \bar{J}_T$ and $Q = I_{NT} - P$. (12.18) is the spectral decomposition representation of $\Omega$, with $\sigma_1^2$ being the first unique characteristic root of $\Omega$, of multiplicity $N$, and $\sigma_\nu^2$ the second unique characteristic root of $\Omega$, of multiplicity $N(T-1)$. It is easy to verify, using the properties of $P$ and $Q$, that

$$\Omega^{-1} = \frac{1}{\sigma_1^2}P + \frac{1}{\sigma_\nu^2}Q \qquad (12.19)$$

and

$$\Omega^{-1/2} = \frac{1}{\sigma_1}P + \frac{1}{\sigma_\nu}Q \qquad (12.20)$$

In fact, $\Omega^r = (\sigma_1^2)^r P + (\sigma_\nu^2)^r Q$, where $r$ is an arbitrary scalar. Now we can obtain GLS as a weighted least squares. Fuller and Battese (1974) suggested premultiplying the regression equation given in (12.3) by $\sigma_\nu \Omega^{-1/2} = Q + (\sigma_\nu/\sigma_1)P$ and performing OLS on the resulting transformed regression. In this case, $y^* = \sigma_\nu \Omega^{-1/2}y$ has a typical element $y_{it} - \theta\bar{y}_{i.}$, where $\theta = 1 - (\sigma_\nu/\sigma_1)$. This transformed regression inverts a matrix of dimension $(K+1)$ and can easily be implemented using any regression package.
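The spectral decomposition and the quasi-demeaning interpretation of $\sigma_\nu \Omega^{-1/2}$ can both be checked numerically. The sketch below (sizes and variance values are illustrative assumptions) builds $\Omega$, verifies (12.18)-(12.19), and confirms that the Fuller-Battese transformation subtracts $\theta\bar{y}_{i.}$ from each $y_{it}$:

```python
import numpy as np

# Numerical check of (12.18)-(12.20): with P = I_N (x) Jbar_T and
# Q = I_NT - P, Omega^r = (sigma_1^2)^r P + (sigma_v^2)^r Q.
N, T = 4, 3
s_mu2, s_v2 = 2.0, 1.0
s1_2 = T * s_mu2 + s_v2

Jbar = np.ones((T, T)) / T
P = np.kron(np.eye(N), Jbar)
Q = np.eye(N * T) - P
Omega = s_mu2 * np.kron(np.eye(N), np.ones((T, T))) + s_v2 * np.eye(N * T)

assert np.allclose(Omega, s1_2 * P + s_v2 * Q)                  # (12.18)
assert np.allclose(np.linalg.inv(Omega), P / s1_2 + Q / s_v2)   # (12.19)

# sigma_v * Omega^{-1/2} quasi-demeans: y_it - theta * ybar_i.
rng = np.random.default_rng(3)
y = rng.normal(size=N * T)
theta = 1 - np.sqrt(s_v2 / s1_2)
y_star = np.sqrt(s_v2) * (P / np.sqrt(s1_2) + Q / np.sqrt(s_v2)) @ y
ybar = np.repeat(y.reshape(N, T).mean(axis=1), T)
assert np.allclose(y_star, y - theta * ybar)
print("spectral decomposition verified")
```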

The Best Quadratic Unbiased (BQU) estimators of the variance components arise naturally from the spectral decomposition of $\Omega$. In fact, $Pu \sim (0, \sigma_1^2 P)$ and $Qu \sim (0, \sigma_\nu^2 Q)$, and

$$\hat{\sigma}_1^2 = \frac{u'Pu}{\mathrm{tr}(P)} = T\sum_{i=1}^{N}\bar{u}_{i.}^2/N \qquad (12.21)$$

$$\hat{\sigma}_\nu^2 = \frac{u'Qu}{\mathrm{tr}(Q)} = \sum_{i=1}^{N}\sum_{t=1}^{T}(u_{it} - \bar{u}_{i.})^2/N(T-1) \qquad (12.22)$$

provide the BQU estimators of $\sigma_1^2$ and $\sigma_\nu^2$, respectively; see Balestra (1973).

These are analysis of variance type estimators of the variance components and are MVU under normality of the disturbances; see Graybill (1961). The true disturbances are not known, and therefore (12.21) and (12.22) are not feasible. Wallace and Hussain (1969) suggest substituting OLS residuals $\hat{u}_{OLS}$ for the true $u$'s. After all, the OLS estimates are still unbiased and consistent, but no longer efficient. Amemiya (1971) shows that these estimators of the variance components have a different asymptotic distribution from that knowing the true disturbances. He suggests using the LSDV residuals instead of the OLS residuals. In this case $\tilde{u} = y - \tilde{\alpha}\iota_{NT} - X\tilde{\beta}$, where $\tilde{\alpha} = \bar{y}_{..} - \bar{X}'_{..}\tilde{\beta}$ and $\bar{X}'_{..}$ is a $1 \times K$ vector of averages of all regressors. Substituting these $\tilde{u}$'s for $u$ in (12.21) and (12.22), we get the Amemiya-type estimators of the variance components. The resulting estimates of the variance components have the same asymptotic distribution as that knowing the true disturbances.

Swamy and Arora (1972) suggest running two regressions to get estimates of the variance components from the corresponding mean square errors of these regressions. The first regression is the Within regression, given in (12.12), which yields the following $\hat{\sigma}_\nu^2$:

$$\hat{\sigma}_\nu^2 = [y'Qy - y'QX(X'QX)^{-1}X'Qy]/[N(T-1) - K] \qquad (12.23)$$

The second regression is the Between regression, which runs the regression of averages across time, i.e.,

$$\bar{y}_{i.} = \alpha + \bar{X}'_{i.}\beta + \bar{u}_{i.} \qquad i = 1, \ldots, N \qquad (12.24)$$

This is equivalent to premultiplying the model in (12.11) by $P$ and running OLS. The only caution is that the latter regression has $NT$ observations, because it repeats the averages $T$ times for each individual, while the cross-section regression in (12.24) is based on $N$ observations. To remedy this, one can run the cross-section regression

$$\sqrt{T}\,\bar{y}_{i.} = \alpha\sqrt{T} + \sqrt{T}\,\bar{X}'_{i.}\beta + \sqrt{T}\,\bar{u}_{i.} \qquad (12.25)$$

where one can easily verify that $\mathrm{var}(\sqrt{T}\,\bar{u}_{i.}) = \sigma_1^2$. This regression will yield an $s^2$ given by

$$\hat{\sigma}_1^2 = (y'Py - y'PZ(Z'PZ)^{-1}Z'Py)/(N - K - 1) \qquad (12.26)$$
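The two Swamy-Arora regressions can be sketched as follows on simulated data (all sizes, coefficients and variance values are illustrative assumptions); the mean square errors recover $\sigma_\nu^2$ and $\sigma_1^2$, from which $\hat{\sigma}_\mu^2 = (\hat{\sigma}_1^2 - \hat{\sigma}_\nu^2)/T$:

```python
import numpy as np

# Sketch of the Swamy-Arora variance component estimates: sigma_v^2 from
# the Within residuals, sigma_1^2 from the Between regression on
# sqrt(T)-scaled averages. Simulated data.
rng = np.random.default_rng(4)
N, T, K = 200, 5, 1
s_mu2, s_v2 = 4.0, 1.0
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(s_mu2), (N, 1)) \
    + rng.normal(0, np.sqrt(s_v2), (N, T))

# Within regression mean square error -> sigma_v^2
yd = (y - y.mean(1, keepdims=True)).ravel()
xd = (x - x.mean(1, keepdims=True)).ravel()
rss_w = ((yd - (xd @ yd) / (xd @ xd) * xd) ** 2).sum()
s_v2_hat = rss_w / (N * (T - 1) - K)

# Between regression on sqrt(T)-scaled individual averages -> sigma_1^2
Zb = np.sqrt(T) * np.column_stack([np.ones(N), x.mean(1)])
yb = np.sqrt(T) * y.mean(1)
resid_b = yb - Zb @ np.linalg.lstsq(Zb, yb, rcond=None)[0]
s1_2_hat = (resid_b ** 2).sum() / (N - K - 1)

s_mu2_hat = (s1_2_hat - s_v2_hat) / T
print(s_v2_hat, s_mu2_hat)   # close to the true 1.0 and 4.0
```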

Note that stacking the two transformed regressions we just performed yields

$$\begin{pmatrix} Qy \\ Py \end{pmatrix} = \begin{pmatrix} QZ \\ PZ \end{pmatrix}\delta + \begin{pmatrix} Qu \\ Pu \end{pmatrix} \qquad (12.27)$$

and the transformed error has mean 0 and variance-covariance matrix given by

$$\begin{pmatrix} \sigma_\nu^2 Q & 0 \\ 0 & \sigma_1^2 P \end{pmatrix}$$

Problem 6 asks the reader to verify that OLS on this system of $2NT$ observations yields OLS on the pooled model (12.3). Also, GLS on this system yields GLS on (12.3). Alternatively, one could get rid of the constant $\alpha$ by running the following stacked regressions:

$$\begin{pmatrix} Qy \\ (P - \bar{J}_{NT})y \end{pmatrix} = \begin{pmatrix} QX \\ (P - \bar{J}_{NT})X \end{pmatrix}\beta + \begin{pmatrix} Qu \\ (P - \bar{J}_{NT})u \end{pmatrix} \qquad (12.28)$$

This follows from the fact that $Q\iota_{NT} = 0$ and $(P - \bar{J}_{NT})\iota_{NT} = 0$. The transformed error has zero mean and variance-covariance matrix

$$\begin{pmatrix} \sigma_\nu^2 Q & 0 \\ 0 & \sigma_1^2(P - \bar{J}_{NT}) \end{pmatrix} \qquad (12.29)$$

OLS on this system yields OLS on (12.3), and GLS on (12.28) yields GLS on (12.3). In fact,

$$\hat{\beta}_{GLS} = [(X'QX/\sigma_\nu^2) + X'(P - \bar{J}_{NT})X/\sigma_1^2]^{-1}[(X'Qy/\sigma_\nu^2) + X'(P - \bar{J}_{NT})y/\sigma_1^2]$$
$$= [W_{XX} + \phi^2 B_{XX}]^{-1}[W_{Xy} + \phi^2 B_{Xy}] \qquad (12.30)$$

with $\mathrm{var}(\hat{\beta}_{GLS}) = \sigma_\nu^2[W_{XX} + \phi^2 B_{XX}]^{-1}$. Note that $W_{XX} = X'QX$, $B_{XX} = X'(P - \bar{J}_{NT})X$ and $\phi^2 = \sigma_\nu^2/\sigma_1^2$. Also, the Within estimator of $\beta$ is $\hat{\beta}_{Within} = W_{XX}^{-1}W_{Xy}$ and the Between estimator is $\hat{\beta}_{Between} = B_{XX}^{-1}B_{Xy}$. This shows that $\hat{\beta}_{GLS}$ is a matrix-weighted average of $\hat{\beta}_{Within}$ and $\hat{\beta}_{Between}$, weighing each estimate by the inverse of its corresponding variance. In fact,

$$\hat{\beta}_{GLS} = W_1\hat{\beta}_{Within} + W_2\hat{\beta}_{Between} \qquad (12.31)$$

where $W_1 = [W_{XX} + \phi^2 B_{XX}]^{-1}W_{XX}$ and $W_2 = [W_{XX} + \phi^2 B_{XX}]^{-1}(\phi^2 B_{XX}) = I - W_1$. This was demonstrated by Maddala (1971). Note that (i) if $\sigma_\mu^2 = 0$, then $\phi^2 = 1$ and $\hat{\beta}_{GLS}$ reduces to $\hat{\beta}_{OLS}$; (ii) if $T \to \infty$, then $\phi^2 \to 0$ and $\hat{\beta}_{GLS}$ tends to $\hat{\beta}_{Within}$; (iii) if $\phi^2 \to \infty$, then $\hat{\beta}_{GLS}$ tends to $\hat{\beta}_{Between}$. In other words, the Within estimator ignores the between variation, and the Between estimator ignores the within variation. The OLS estimator gives equal weight to the between and within variations. From (12.30), it is clear that $\mathrm{var}(\hat{\beta}_{Within}) - \mathrm{var}(\hat{\beta}_{GLS})$ is a positive semi-definite matrix, since $\phi^2$ is positive. However, as $T \to \infty$ for any fixed $N$, $\phi^2 \to 0$ and both $\hat{\beta}_{GLS}$ and $\hat{\beta}_{Within}$ have the same asymptotic variance.
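The matrix-weighted-average identity (12.31) is exact and easy to verify numerically. In the sketch below (dimensions, coefficients and variance components are illustrative assumptions), GLS with known variance components is computed from (12.30) and matched against $W_1\hat{\beta}_{Within} + W_2\hat{\beta}_{Between}$:

```python
import numpy as np

# Numerical check of (12.30)-(12.31): with known variance components, GLS
# is a matrix-weighted average of the Within and Between estimators.
rng = np.random.default_rng(5)
N, T, K = 30, 6, 2
s_mu2, s_v2 = 2.0, 1.0
s1_2 = T * s_mu2 + s_v2
phi2 = s_v2 / s1_2

P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P
B = P - np.ones((N * T, N * T)) / (N * T)     # P - Jbar_NT

X = rng.normal(size=(N * T, K))
mu = np.repeat(rng.normal(0, np.sqrt(s_mu2), N), T)
y = 1.0 + X @ np.array([1.0, -1.0]) + mu + rng.normal(0, 1.0, N * T)

Wxx, Wxy = X.T @ Q @ X, X.T @ Q @ y
Bxx, Bxy = X.T @ B @ X, X.T @ B @ y

beta_within = np.linalg.solve(Wxx, Wxy)
beta_between = np.linalg.solve(Bxx, Bxy)
beta_gls = np.linalg.solve(Wxx + phi2 * Bxx, Wxy + phi2 * Bxy)   # (12.30)

W1 = np.linalg.solve(Wxx + phi2 * Bxx, Wxx)                      # (12.31)
W2 = np.eye(K) - W1
print(np.allclose(beta_gls, W1 @ beta_within + W2 @ beta_between))  # True
```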

Another estimator of the variance components was suggested by Nerlove (1971). His suggestion is to estimate $\hat{\sigma}_\mu^2 = \sum_{i=1}^{N}(\hat{\mu}_i - \bar{\hat{\mu}})^2/(N-1)$, where the $\hat{\mu}_i$ are the dummy coefficient estimates from the LSDV regression, while $\sigma_\nu^2$ is estimated from the within residual sum of squares divided by $NT$, without correction for degrees of freedom.

Note that, except for Nerlove's (1971) method, one has to retrieve $\hat{\sigma}_\mu^2$ as $(\hat{\sigma}_1^2 - \hat{\sigma}_\nu^2)/T$. In this case, there is no guarantee that the estimate of $\sigma_\mu^2$ would be non-negative. Searle (1971) has an extensive discussion of the problem of negative estimates of the variance components in the biometrics literature. One solution is to replace these negative estimates by zero. This is in fact the suggestion of the Monte Carlo study by Maddala and Mount (1973). This study finds that negative estimates occurred only when the true $\sigma_\mu^2$ was small and close to zero. In these cases OLS is still a viable estimator. Therefore, replacing a negative $\hat{\sigma}_\mu^2$ by zero is not a bad sin after all, and the problem is dismissed as not being serious.

Under the random effects model, GLS based on the true variance components is BLUE, and all the feasible GLS estimators considered are asymptotically efficient as either $N$ or $T \to \infty$. Maddala and Mount (1973) compared OLS, Within, Between, feasible GLS methods, true GLS and MLE using their Monte Carlo study. They found little to choose among the various feasible GLS estimators in small samples and argued in favor of methods that were easier to compute.

Taylor (1980) derived exact finite sample results for the one-way error components model. He compared the Within estimator with the Swamy-Arora feasible GLS estimator. He found the following important results: (1) Feasible GLS is more efficient than FE for all but the fewest degrees of freedom. (2) The variance of feasible GLS is never more than 17% above the Cramér-Rao lower bound. (3) More efficient estimators of the variance components do not necessarily yield more efficient feasible GLS estimators. These finite sample results are confirmed by the Monte Carlo experiments carried out by Maddala and Mount (1973) and Baltagi (1981).
