# Individual Fixed Effects

One of the oldest questions in labor economics is the connection between union membership and wages. Do workers whose wages are set by collective bargaining earn more because of this, or would they earn more anyway (perhaps because they are more experienced or skilled)? To set this question up, let $Y_{it}$ equal the (log) earnings of worker $i$ at time $t$ and let $D_{it}$ denote his union status. The observed $Y_{it}$ is either $Y_{0it}$ or $Y_{1it}$, depending on union status. Suppose further that

$$E(Y_{0it} \mid A_i, X_{it}, t, D_{it}) = E(Y_{0it} \mid A_i, X_{it}, t),$$

i.e., union status is as good as randomly assigned conditional on unobserved worker ability, $A_i$, and other observed covariates $X_{it}$, like age and schooling.

The key to fixed-effects estimation is the assumption that the unobserved $A_i$ appears without a time subscript in a linear model for $E(Y_{0it} \mid A_i, X_{it}, t)$:

$$E(Y_{0it} \mid A_i, X_{it}, t) = \alpha + \lambda_t + A_i'\gamma + X_{it}'\delta.$$

Finally, we assume that the causal effect of union membership is additive and constant:

$$E(Y_{1it} \mid A_i, X_{it}, t) = E(Y_{0it} \mid A_i, X_{it}, t) + \rho.$$

This implies

$$E(Y_{it} \mid A_i, X_{it}, t, D_{it}) = \alpha + \lambda_t + \rho D_{it} + A_i'\gamma + X_{it}'\delta, \tag{5.1.2}$$

where $\rho$ is the causal effect of interest. The set of assumptions leading to (5.1.2) is more restrictive than those we used to motivate regression in Chapter 3; we need the linear, additive functional form to make headway on the problem of unobserved confounders using panel data with no instruments. Equation (5.1.2) implies

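$$Y_{it} = \alpha_i + \lambda_t + \rho D_{it} + X_{it}'\delta + \varepsilon_{it}, \tag{5.1.3}$$

where

$$\alpha_i = \alpha + A_i'\gamma$$

and $\varepsilon_{it}$ is the residual $Y_{it} - E(Y_{it} \mid A_i, X_{it}, t, D_{it})$.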

This is a fixed-effects model. Given panel data, i.e., repeated observations on individuals, the causal effect of union status on wages can be estimated by treating $\alpha_i$, the fixed effect, as a parameter to be estimated. The year effect, $\lambda_t$, is also treated as a parameter to be estimated. The unobserved individual effects are coefficients on dummies for each individual, while the year effects are coefficients on time dummies.

It might seem like there are an awful lot of parameters to be estimated in the fixed-effects model. For example, the Panel Study of Income Dynamics, a widely used panel data set, includes about 5,000 working-age men observed for about 20 years. So there are roughly 5,000 fixed effects. In practice, however, this doesn’t matter. Treating the individual effects as parameters to be estimated is algebraically the same as estimation in deviations from means. In other words, first we calculate the individual averages

$$\bar{Y}_i = \alpha_i + \bar{\lambda} + \rho \bar{D}_i + \bar{X}_i'\delta + \bar{\varepsilon}_i.$$

Subtracting this from (5.1.3) gives

$$Y_{it} - \bar{Y}_i = \lambda_t - \bar{\lambda} + \rho (D_{it} - \bar{D}_i) + (X_{it} - \bar{X}_i)'\delta + (\varepsilon_{it} - \bar{\varepsilon}_i), \tag{5.1.4}$$

so taking deviations from means kills the unobserved individual effects.
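The algebraic equivalence between estimating the individual effects as dummy coefficients and estimating in deviations from means can be checked numerically. The following is a minimal sketch on simulated data, not on any of the data sets discussed here: the sample sizes, the "true" effect of 0.15, the ability coefficient, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_years = 200, 5
rho_true = 0.15  # assumed union effect, chosen only for this illustration

worker = np.repeat(np.arange(n_workers), n_years)   # worker index i
year = np.tile(np.arange(n_years), n_workers)       # year index t

ability = rng.normal(size=n_workers)                # unobserved, time-invariant A_i
# Union status depends on ability, so the cross-section comparison is confounded.
d = (ability[worker] + rng.normal(size=worker.size) > 0).astype(float)
y = (1.0 + 0.05 * year + rho_true * d + 0.5 * ability[worker]
     + rng.normal(scale=0.3, size=worker.size))

year_dummies = np.eye(n_years)[year][:, 1:]         # first year is the base
worker_dummies = np.eye(n_workers)[worker]          # one dummy per worker


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def demean_by_worker(v):
    """Subtract each worker's time average (rows are sorted by worker)."""
    v = v.reshape(n_workers, n_years, -1)
    return (v - v.mean(axis=1, keepdims=True)).reshape(n_workers * n_years, -1)


# Cross-section (pooled) OLS: constant, union status, year dummies.
rho_ols = ols(np.column_stack([np.ones_like(d), d, year_dummies]), y)[1]

# Fixed effects as dummy coefficients (the worker dummies absorb the constant).
rho_lsdv = ols(np.column_stack([d, worker_dummies, year_dummies]), y)[0]

# Deviations from worker means applied to the outcome and every regressor.
X_within = demean_by_worker(np.column_stack([d, year_dummies]))
y_within = demean_by_worker(y).ravel()
rho_within = ols(X_within, y_within)[0]

print(rho_ols, rho_lsdv, rho_within)
```

In this simulation the dummy-variable and deviations-from-means coefficients on union status agree up to floating-point error, while the pooled estimate is pushed upward by the ability term that the fixed effects remove.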

An alternative to deviations from means is differencing. In other words, we estimate,

$$\Delta Y_{it} = \Delta\lambda_t + \rho \Delta D_{it} + \Delta X_{it}'\delta + \Delta\varepsilon_{it}, \tag{5.1.5}$$

where the $\Delta$ prefix denotes the change from one year to the next. For example, $\Delta Y_{it} = Y_{it} - Y_{it-1}$. With two periods, differencing is algebraically the same as deviations from means, but not otherwise. Both should work, although with homoskedastic and serially uncorrelated $\varepsilon_{it}$, deviations from means is more efficient. You might find differencing more convenient if you have to do it by hand, though the differenced standard errors should be adjusted for the fact that the differenced residuals are serially correlated.
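The claim that differencing and deviations from means coincide with two periods, but not with more, can be illustrated on simulated balanced panels. This is a sketch under assumed parameter values (effect size, year effects, noise scale); the function names are hypothetical helpers, and covariates other than union status are omitted for brevity.

```python
import numpy as np


def simulate(n_workers, n_years, rng):
    """Balanced panel with a worker effect, year effects, and a union effect of 0.15."""
    a = rng.normal(size=(n_workers, 1))                          # fixed effect
    d = (a + rng.normal(size=(n_workers, n_years)) > 0) * 1.0    # union status
    lam = 0.05 * np.arange(n_years)                              # year effects
    y = a + lam + 0.15 * d + rng.normal(scale=0.3, size=(n_workers, n_years))
    return y, d


def within_estimate(y, d):
    """Deviations from means; two-way demeaning is exact for a balanced panel."""
    def two_way(z):
        return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()
    y_t, d_t = two_way(y), two_way(d)
    return (d_t * y_t).sum() / (d_t ** 2).sum()


def first_difference_estimate(y, d):
    """Year-to-year changes; period means of the changes absorb the year effects."""
    dy, dd = np.diff(y, axis=1), np.diff(d, axis=1)
    dy = dy - dy.mean(axis=0, keepdims=True)
    dd = dd - dd.mean(axis=0, keepdims=True)
    return (dd * dy).sum() / (dd ** 2).sum()


rng = np.random.default_rng(1)
y2, d2 = simulate(500, 2, rng)
y3, d3 = simulate(500, 3, rng)

gap_two_periods = abs(within_estimate(y2, d2) - first_difference_estimate(y2, d2))
gap_three_periods = abs(within_estimate(y3, d3) - first_difference_estimate(y3, d3))
print(gap_two_periods, gap_three_periods)
```

With two periods the gap is zero up to rounding; with three periods the two estimators use the within-worker variation differently and generally give different numbers.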

Some regression packages automate the deviations-from-means estimator, with an appropriate standard-error adjustment for the degrees of freedom lost in estimating $N$ individual means. This is all that’s needed to get the standard errors right with a homoskedastic, serially uncorrelated residual. The deviations-from-means estimator has many names, including the "within estimator" and "analysis of covariance". Estimation in deviations-from-means form is also called absorbing the fixed effects.
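The degrees-of-freedom adjustment can be seen by running the demeaned regression by hand. The sketch below, on simulated data with assumed parameter values and a single regressor, compares the conventional homoskedastic standard error from the dummy-variable regression with the naive standard error from OLS on demeaned data, which divides the residual sum of squares by too many degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
n_workers, n_years = 100, 4
worker = np.repeat(np.arange(n_workers), n_years)

a = rng.normal(size=n_workers)                                   # fixed effects
d = (a[worker] + rng.normal(size=worker.size) > 0).astype(float)
y = a[worker] + 0.15 * d + rng.normal(scale=0.3, size=worker.size)
n_obs = y.size

# Dummy-variable regression: union status plus one dummy per worker.
X = np.column_stack([d, np.eye(n_workers)[worker]])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ coef
s2 = resid @ resid / (n_obs - X.shape[1])         # df = NT - N - 1
se_lsdv = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])


def demean(v):
    """Subtract worker means."""
    return v - (np.bincount(worker, weights=v) / n_years)[worker]


# Same coefficient from demeaned data, but naive OLS uses df = NT - 1.
d_w, y_w = demean(d), demean(y)
rho_w = (d_w @ y_w) / (d_w @ d_w)
resid_w = y_w - rho_w * d_w
se_naive = np.sqrt(resid_w @ resid_w / (n_obs - 1) / (d_w @ d_w))

# Rescale for the N means that were estimated before running the regression.
se_adjusted = se_naive * np.sqrt((n_obs - 1) / (n_obs - n_workers - 1))
print(se_lsdv, se_naive, se_adjusted)
```

The naive standard error is too small, and the rescaled one matches the dummy-variable regression. This covers only the homoskedastic, serially uncorrelated case described in the text.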

Freeman (1984) uses four data sets to estimate union wage effects under the assumption that selection into union status is based on unobserved-but-fixed individual characteristics. Table 5.1.1 displays some of his estimates. For each data set, the table displays results from a fixed-effects estimator and the corresponding cross-section estimates. The cross-section estimates are typically higher (ranging from 0.15 to 0.25) than the fixed-effects estimates (ranging from 0.10 to 0.20). This may indicate positive selection bias in the cross-section estimates, though selection bias is not the only explanation for the lower fixed-effects estimates.

Table 5.1.1: Estimated effects of union status on log wages

Notes: Adapted from Freeman (1984). The table reports cross-section and panel estimates of the union relative wage effect. The estimates were calculated using the surveys listed at left. The cross-section estimates include controls for demographic and human capital variables.

Although they control for a certain type of omitted variable, fixed-effects estimates are notoriously susceptible to attenuation bias from measurement error. On the one hand, economic variables like union status tend to be persistent (a worker who is a union member this year is most likely a union member next year). On the other hand, measurement error often changes from year to year (union status may be misreported or miscoded this year but not next year). Therefore, while union status may be misreported or miscoded for only a few workers in any single year, the observed year-to-year changes in union status may be mostly noise. In other words, there is more measurement error in the regressors in an equation like (5.1.5) or (5.1.4) than in the levels of the regressors. This fact may account for smaller fixed-effects estimates.
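A small simulation makes the attenuation argument concrete. It is only a sketch: the switching probability, the misreporting rate, and the effect size are assumed values chosen for illustration, and there is deliberately no ability confounding, so any shortfall in the estimates comes from misreported union status alone.

```python
import numpy as np

rng = np.random.default_rng(3)
n_workers = 50_000
rho_true = 0.15         # assumed effect, for illustration only
switch_prob = 0.05      # assumed: few workers truly change union status
misreport_prob = 0.03   # assumed: small, independent reporting error each year

# True union status in two years: persistent, with occasional switches.
d1 = rng.random(n_workers) < 0.3
d2 = d1 ^ (rng.random(n_workers) < switch_prob)
d_true = np.column_stack([d1, d2]).astype(float)

# Observed status: flipped where it is misreported in that year.
flips = (rng.random((n_workers, 2)) < misreport_prob).astype(float)
d_obs = np.abs(d_true - flips)

y = 1.0 + rho_true * d_true + rng.normal(scale=0.3, size=(n_workers, 2))


def slope(x, y):
    """Bivariate OLS slope of y on x (with a constant)."""
    x = x - x.mean()
    return (x * (y - y.mean())).sum() / (x ** 2).sum()


# Levels: pool both years.
rho_levels_true = slope(d_true.ravel(), y.ravel())
rho_levels_obs = slope(d_obs.ravel(), y.ravel())

# Differences: year-two value minus year-one value.
dy = y[:, 1] - y[:, 0]
rho_fd_true = slope(d_true[:, 1] - d_true[:, 0], dy)
rho_fd_obs = slope(d_obs[:, 1] - d_obs[:, 0], dy)

print(rho_levels_true, rho_levels_obs, rho_fd_true, rho_fd_obs)
```

Under these assumptions, misreporting shrinks the levels estimate only modestly, while the differenced estimate falls much further below the assumed effect, because a large share of the observed changes in status are reporting errors rather than true changes.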

A variant on the measurement-error problem arises from the fact that the differencing and deviations-from-means estimators used to control for fixed effects typically remove both good and bad variation. In other words, these transformations may kill some of the omitted-variables-bias bathwater, but they also remove much of the useful information in the baby, the variable of interest. An example is the use of twins to estimate the causal effect of schooling on wages. Although there is no time dimension to this problem, the basic idea is the same as the union problem discussed above: twins have similar but largely unobserved family and genetic backgrounds. We can therefore control for their common family background by including a family fixed effect in samples of pairs of twins.

Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998) estimate the returns to schooling using samples of twins, controlling for family fixed effects. Because there are two twins from each family, this is the same as regressing differences in earnings within twin pairs on differences in their schooling. Surprisingly, the within-family estimates come out larger than OLS. But how do differences in schooling come about between individuals who are otherwise so much alike? Bound and Solon (1999) point out that there are small differences between twins, with first-borns typically having higher birth weight and higher IQ scores (here differences in birth timing are measured in minutes). While these within-twin differences are not large, neither is the difference in their schooling. Hence, a small amount of unobserved ability differences among twins could be responsible for substantial bias in the resulting estimates.
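The last point can be written as a standard omitted-variables calculation. The expressions below are an illustrative sketch, not a formula taken from the papers cited. They assume a linear wage equation for twin $j$ in family $f$ with a family effect $\alpha_f$, schooling $S_{jf}$, an unobserved individual ability component $a_{jf}$ with coefficient $\gamma$, a return to schooling written here as $\beta$, and a residual whose within-pair difference is uncorrelated with the schooling difference. All of this notation is introduced only for this example.

$$
\begin{aligned}
Y_{jf} &= \alpha_f + \beta S_{jf} + \gamma a_{jf} + \varepsilon_{jf}, \\
\Delta Y_f &= \beta \Delta S_f + \gamma \Delta a_f + \Delta\varepsilon_f, \\
\operatorname{plim} \hat{\beta}_{\text{within}} &= \beta + \gamma \, \frac{\operatorname{Cov}(\Delta S_f, \Delta a_f)}{\operatorname{Var}(\Delta S_f)}.
\end{aligned}
$$

Differencing within pairs removes $\alpha_f$, but it also leaves little variation in schooling, so $\operatorname{Var}(\Delta S_f)$ in the denominator is small. Even a small covariance between ability differences and schooling differences can then produce a sizable bias term.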
