Time Series, Multivariate and Panel Data
In this section we very briefly present extensions from cross-section data to other types of count data (see Cameron and Trivedi, 1998, for further detail). For time series and multivariate count data many models have been proposed, but preferred methods have not yet been established. For panel data there is more agreement in the econometrics literature on which methods to use, though a wider range of models is considered in the statistics literature.
If a time series of count data is generated by a Poisson point process then event occurrences in successive time intervals are independent. Independence is a reasonable assumption when the underlying stochastic process for events, conditional on covariates, has no memory. Then there is no need for special time series models. For example, the number of deaths (or births) in a region may be uncorrelated over time. At the same time the population, which cumulates births and deaths, will be very highly correlated over time.
The first step for time series count data is therefore to test for serial correlation. A simple test first estimates a count regression such as Poisson, obtains the residual, usually $y_t - \exp(x_t'\hat{\beta})$, where $x_t$ may include time trends, and tests for zero correlation between current and lagged residuals, allowing for the complication that the residuals will certainly be heteroskedastic.
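The residual-based test described above can be sketched as follows. This is a minimal illustration, not a definitive procedure: it assumes the fitted means have already been obtained from some count regression, and it uses one possible heteroskedasticity-robust statistic, the sum of cross-products of current and lagged residuals divided by the square root of the sum of their squares, which is approximately standard normal under the null of no serial correlation. The function name `lagged_residual_stat` is hypothetical, and the sketch ignores the sampling error in the estimated coefficients.

```python
import math

def lagged_residual_stat(y, mu_hat, lag=1):
    """Robust test statistic for zero correlation between u_t and u_{t-lag}.

    Assumed form: sum(u_t * u_{t-lag}) / sqrt(sum(u_t^2 * u_{t-lag}^2)),
    roughly N(0, 1) under the null, without assuming a constant variance.
    """
    u = [yt - mt for yt, mt in zip(y, mu_hat)]  # raw residuals y_t - fitted mean
    pairs = [(u[t], u[t - lag]) for t in range(lag, len(u))]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum((a * b) ** 2 for a, b in pairs))
    if den == 0.0:
        raise ValueError("all residual cross-products are zero")
    return num / den

# Toy usage: an intercept-only Poisson fit, whose fitted mean is the sample mean.
y = [1, 3, 1, 3, 1, 3]
mu_hat = [sum(y) / len(y)] * len(y)
stat = lagged_residual_stat(y, mu_hat)  # negative: the series alternates
```

A large absolute value of the statistic, compared with standard normal critical values, would suggest that a time series model is needed.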
Upon establishing that the data are indeed serially correlated, there are several models to choose from. An aesthetically appealing model is the INAR(1) model (integer autoregressive model of order one), together with its generalizations to the negative binomial and to higher orders of serial correlation. This model specifies $y_t = \rho_t \circ y_{t-1} + \varepsilon_t$, where $\rho_t$ is a correlation parameter with $0 < \rho_t < 1$, for example $\rho_t = 1/[1 + \exp(-z_t'\gamma)]$. The symbol $\circ$ denotes the binomial thinning operator, whereby $\rho_t \circ y_{t-1}$ is the realized value of a binomial random variable with probability of success $\rho_t$ in each of $y_{t-1}$ trials. One may think of each event as having a replication or survival probability of $\rho_t$ in the following period. As in a linear first-order Markov model, this probability decays geometrically. A Poisson INAR(1) model, with a Poisson marginal distribution for $y_t$, arises when $\varepsilon_t$ is Poisson distributed with mean, say, $\exp(x_t'\beta)$. A negative binomial INAR(1) model arises if $\varepsilon_t$ is negative binomial distributed.
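To make binomial thinning concrete, here is a small simulation sketch of a Poisson INAR(1) process using only the Python standard library. The names `poisson_draw`, `binomial_thin` and `simulate_inar1` are hypothetical, and the thinning probabilities and innovation means are passed in directly as sequences rather than computed from covariates. The Poisson sampler is Knuth's simple multiplication method, which is only suitable for small means.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) by Knuth's method (small lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def binomial_thin(count, rho, rng):
    """Binomial thinning: successes in `count` trials, each with probability rho."""
    return sum(1 for _ in range(count) if rng.random() < rho)

def simulate_inar1(rho, lam, y0=0, seed=0):
    """Simulate y_t = rho_t o y_{t-1} + e_t with e_t ~ Poisson(lam_t)."""
    rng = random.Random(seed)
    path, prev = [], y0
    for rho_t, lam_t in zip(rho, lam):
        survivors = binomial_thin(prev, rho_t, rng)  # events carried over
        arrivals = poisson_draw(lam_t, rng)          # new events this period
        prev = survivors + arrivals
        path.append(prev)
    return path

# Constant parameters: the stationary mean is lam / (1 - rho) = 4.
T = 5000
y = simulate_inar1([0.5] * T, [2.0] * T, seed=1)
```

With constant parameters, taking expectations of both sides of the defining equation gives a stationary mean of lam / (1 - rho), so the sample mean of the simulated path should settle near 4 after a short burn-in.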
An autoregressive model, or Markov model, is a simple adjustment to the earlier cross-section count models that directly enters lagged values of $y$ into the formula for the conditional mean of current $y$. For example, we might suppose $y_t$ conditional on current and past $x_t$ and past $y_t$ is Poisson distributed with mean $\exp(x_t'\beta + \rho \ln y^*_{t-1})$, where $y^*_{t-1}$ is an adjustment to ensure a nonzero lagged value, such as $y^*_{t-1} = y_{t-1} + 0.5$ or $y^*_{t-1} = \max(0.5, y_{t-1})$.
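As a rough sketch of this specification, the conditional mean and the corresponding Poisson log-likelihood (conditioning on the first observation) might be coded as below. The function names, the `rule` switch between the two adjustments, and the constant `c` are illustrative assumptions; the coefficients are simply supplied rather than estimated.

```python
import math

def conditional_mean(x_t, beta, rho, y_lag, c=0.5, rule="max"):
    """exp(x_t'beta + rho * ln y*), with y* the adjusted lagged count."""
    y_star = max(c, y_lag) if rule == "max" else y_lag + c
    xb = sum(b * x for b, x in zip(beta, x_t))
    return math.exp(xb + rho * math.log(y_star))

def poisson_loglik(y, X, beta, rho, c=0.5, rule="max"):
    """Poisson log-likelihood of y_1..y_T given y_0 (first value is conditioned on)."""
    ll = 0.0
    for t in range(1, len(y)):
        mu = conditional_mean(X[t], beta, rho, y[t - 1], c, rule)
        ll += y[t] * math.log(mu) - mu - math.lgamma(y[t] + 1)
    return ll

# Toy usage with an intercept only: the mean is exp(beta0) * (y*)^rho.
mu_after_zero = conditional_mean([1.0], [0.0], 1.0, 0)           # lag of 0 is replaced by 0.5
mu_after_four = conditional_mean([1.0], [math.log(2)], 0.5, 4)   # 2 * sqrt(4)
```

Writing the mean as exp(x'beta) times the adjusted lag raised to the power rho makes clear why a zero lag needs adjusting: without it the logarithm is undefined and the mean would collapse to zero whenever rho is positive.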
Serially correlated error models induce time series correlation by introducing unobserved heterogeneity (see Section 3.1) and allowing it to be serially correlated. For example, $y_t$ is Poisson distributed with mean $\exp(x_t'\beta)\nu_t$, where $\nu_t$ is a serially correlated random variable (Zeger, 1988).
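One way to see what this model implies for the counts themselves is to work out their correlation from that of the heterogeneity term. The sketch below rests on added assumptions that are not stated in the text: the heterogeneity term has mean 1, constant variance `sigma2` and correlation `rho_v` between the two periods, and the counts are independent Poisson draws given the heterogeneity. Under those assumptions the law of total covariance gives Var(y_t) = mu_t + sigma2 * mu_t^2 and Cov(y_t, y_s) = mu_t * mu_s * sigma2 * rho_v. The function name is hypothetical.

```python
import math

def implied_count_corr(mu_t, mu_s, sigma2, rho_v):
    """Correlation of two counts implied by correlated multiplicative heterogeneity.

    mu_t, mu_s : exp(x'beta) in the two periods (must be positive)
    sigma2     : variance of the heterogeneity term (mean assumed to be 1)
    rho_v      : correlation of the heterogeneity term across the two periods
    """
    shrink = math.sqrt((1 + 1 / (sigma2 * mu_t)) * (1 + 1 / (sigma2 * mu_s)))
    return rho_v / shrink

low_mean = implied_count_corr(1.0, 1.0, 1.0, 0.8)      # Poisson noise dominates
high_mean = implied_count_corr(100.0, 100.0, 1.0, 0.8)  # heterogeneity dominates
```

The correlation in the counts is always smaller in magnitude than that of the heterogeneity term, and the attenuation is most severe when the mean is small, because the Poisson noise then accounts for most of the variance.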
State space models or time-varying parameter models allow the conditional mean to be a random variable drawn from a distribution whose parameters evolve over time. For example, $y_t$ is Poisson distributed with mean $\mu_t$, where $\mu_t$ is a draw from a gamma distribution (Harvey and Fernandes, 1989).
Hidden Markov models specify different parametric models in different regimes, and induce serial correlation by specifying the stochastic process determining which regime currently applies to be an unobserved Markov process (MacDonald and Zucchini, 1997).
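As an illustration, suppose each regime is a Poisson model with its own constant rate. The log-likelihood can then be evaluated with the standard scaled forward recursion, sketched below. The names `poisson_pmf` and `hmm_loglik`, the constant regime rates (no covariates), and the example parameter values are all assumptions made for the sketch.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of count y with mean lam > 0."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def hmm_loglik(y, rates, trans, init):
    """Log-likelihood of counts y under a Poisson hidden Markov model.

    rates : Poisson mean in each regime
    trans : trans[i][j] = P(regime j at t | regime i at t-1)
    init  : probabilities of each regime in the first period
    """
    m = len(rates)
    alpha = [init[j] * poisson_pmf(y[0], rates[j]) for j in range(m)]
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            # Propagate regime probabilities, then weight by the new observation.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(m))
                * poisson_pmf(y[t], rates[j])
                for j in range(m)
            ]
        scale = sum(alpha)            # P(y_t | y_1..y_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

y = [0, 1, 0, 7, 9, 8, 1, 0]
trans = [[0.9, 0.1], [0.2, 0.8]]
ll_two_regimes = hmm_loglik(y, [0.5, 8.0], trans, [0.5, 0.5])
ll_one_regime = hmm_loglik(y, [3.25, 3.25], trans, [0.5, 0.5])
```

When both regimes share the same rate the recursion reduces to an ordinary independent Poisson log-likelihood, which gives a convenient check on the code.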