Pooling Time-Series of Cross-Section Data
In this chapter, we consider pooling time-series of cross-sections. The data may be a panel of households or firms, or simply countries or states followed over time. Two well-known examples of panel data in the U.S. are the Panel Study of Income Dynamics (PSID) and the National Longitudinal Survey (NLS). The PSID began in 1968 with 4802 families, including an over-sampling of poor households. Annual interviews were conducted, and socioeconomic characteristics were recorded for each of the families and for roughly 31000 individuals who have been in these or derivative families. More than 5000 variables are collected. The NLS followed five distinct segments of the labor force. The original samples include 5020 older men, 5225 young men, 5083 mature women, 5159 young women and 12686 youths. There was an over-sampling of blacks, Hispanics, poor whites and military in the youths survey. The list of variables collected runs into the thousands. An inventory of national studies using panel data is given at http://www.isr.umich.edu/src/psid/panelstudies.html.
Pooling these data gives a richer source of variation, which allows for more efficient estimation of the parameters. With additional, more informative data, one can get more reliable estimates and test more sophisticated behavioral models with less restrictive assumptions. Another advantage of panel data sets is their ability to control for individual heterogeneity. Not controlling for these unobserved individual-specific effects leads to bias in the resulting estimates. Panel data sets are also better able to identify and estimate effects that are simply not detectable in pure cross-section or pure time-series data. In particular, panel data sets are better able to study complex issues of dynamic behavior. For example, with a cross-section data set one can estimate the rate of unemployment at a particular point in time. Repeated cross-sections can show how this proportion changes over time. Only panel data sets can estimate what proportion of those who are unemployed in one period remain unemployed in another period. Some of the benefits and limitations of using panel data sets are listed in Hsiao (2003) and Baltagi (2008).
Section 12.2 studies the error components model, focusing on fixed effects, random effects and maximum likelihood estimation. Section 12.3 considers the question of prediction in a random effects model, while Section 12.4 illustrates the estimation methods using an empirical example. Section 12.5 considers testing the poolability assumption, the existence of random individual effects, and the consistency of the random effects estimator using a Hausman test. Section 12.6 studies the dynamic panel data model and illustrates the methods used with an empirical example. Section 12.7 concludes with a short presentation of program evaluation and the difference-in-differences estimator.
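The unemployment example given earlier in this introduction can be made concrete with a small numerical sketch. The two data sets below are hypothetical and made up purely for illustration; the function names are also our own. Each record is one individual's unemployment status in two periods. Both data sets imply the same unemployment rate in each period, so repeated cross-sections could not tell them apart, yet the proportion of the unemployed who remain unemployed differs sharply. Computing that proportion requires following the same individuals over time.

```python
# Each record is (unemployed in period 1, unemployed in period 2) for one person.
# Both panels are hypothetical toy data with 10 individuals.
panel_high = ([(True, True)] * 2 + [(True, False)]
              + [(False, True)] * 2 + [(False, False)] * 5)
panel_low = ([(True, False)] * 3 + [(False, True)] * 4
             + [(False, False)] * 3)


def cross_section_rates(panel):
    """Unemployment rate in each period, using no link across periods."""
    n = len(panel)
    rate_1 = sum(first for first, _ in panel) / n
    rate_2 = sum(second for _, second in panel) / n
    return rate_1, rate_2


def persistence(panel):
    """Share of period-1 unemployed who are still unemployed in period 2."""
    second_status = [second for first, second in panel if first]
    if not second_status:
        raise ValueError("no one is unemployed in period 1")
    return sum(second_status) / len(second_status)


for name, panel in [("high persistence", panel_high),
                    ("low persistence", panel_low)]:
    r1, r2 = cross_section_rates(panel)
    print(f"{name}: rate_1={r1:.2f}, rate_2={r2:.2f}, "
          f"persistence={persistence(panel):.2f}")
```

The cross-section rates use only the marginal distribution of status in each period, whereas the persistence measure uses the joint distribution across periods, which is observed only when individuals are tracked.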