# Overdispersion

The Poisson regression model is usually too restrictive for count data, leading to the alternative models presented in Sections 3 and 4. The fundamental problem is that the distribution is parameterized in terms of a single scalar parameter ($\mu$), so that all moments of $y$ are functions of $\mu$. By contrast, the normal distribution has separate parameters for location ($\mu$) and scale ($\sigma^2$). (For the same reason, the one-parameter exponential is too restrictive for duration data, and more general two-parameter distributions such as the Weibull are superior. Note that this complication does not arise with binary data. There the distribution is clearly the one-parameter Bernoulli, since if the probability of success is $p$, then the probability of failure must be $1 - p$. For binary data the issue is instead how to parameterize $p$ in terms of regressors.)

One way this restrictiveness manifests itself is that, in many applications, a Poisson density predicts a probability of a zero count considerably lower than the proportion of zeros actually observed in the sample. This is termed the excess zeros problem, as there are more zeros in the data than the Poisson predicts.

A second and more obvious deficiency of the Poisson is that for count data the variance usually exceeds the mean, a feature called overdispersion. The Poisson instead implies equality of variance and mean, see (15.2), a property called equidispersion.
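As a rough illustration of both symptoms, the sketch below compares simulated counts with what a Poisson of the same mean implies. It assumes NumPy, uses a hypothetical helper `poisson_diagnostics`, and draws the overdispersed sample from a negative binomial with mean $\mu$ and variance $\mu + \alpha\mu^2$ (via NumPy's `negative_binomial(n, p)` with $n = 1/\alpha$ and $p = n/(n+\mu)$). The parameter values are arbitrary choices, and the comparison is unconditional, ignoring regressors.

```python
import numpy as np


def poisson_diagnostics(y):
    """Compare sample moments and zero share of counts y with a Poisson of the same mean."""
    y = np.asarray(y)
    mean = y.mean()
    var = y.var(ddof=1)
    return {
        "mean": mean,
        "variance": var,
        "variance_to_mean": var / mean,        # equals 1 under equidispersion
        "observed_zero_share": np.mean(y == 0),
        "poisson_zero_prob": np.exp(-mean),    # Poisson Pr[y = 0] = exp(-mu)
    }


rng = np.random.default_rng(0)
mu, alpha = 2.0, 1.0                 # hypothetical mean and overdispersion
n = 1.0 / alpha
p = n / (n + mu)                     # gives mean mu and variance mu + alpha * mu**2
y_nb = rng.negative_binomial(n, p, size=100_000)
y_pois = rng.poisson(mu, size=100_000)

for name, y in [("Poisson", y_pois), ("NB2", y_nb)]:
    d = poisson_diagnostics(y)
    print(name, {k: round(float(v), 3) for k, v in d.items()})
```

With $\mu = 2$ and $\alpha = 1$, the negative binomial draws should show a variance-to-mean ratio near 3 and a zero share near $1/3$, against a Poisson-implied $e^{-2} \approx 0.135$, while the Poisson draws should show a ratio near 1 and matching zero shares. With regressors, the natural comparison would instead be against the sample average of $\exp(-\hat{\mu}_i)$.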

Overdispersion has qualitatively similar consequences to the failure of the homoskedasticity assumption in the linear regression model. Provided the conditional mean is correctly specified, that is, (15.3) holds, the Poisson MLE is still consistent. This is clear from inspection of the first-order conditions (15.5), since the left-hand side of (15.5) has an expected value of zero if $\mathrm{E}[y_i \mid \mathbf{x}_i] = \exp(\mathbf{x}_i'\boldsymbol{\beta})$. (This consistency property applies more generally to the quasi-MLE when the specified density is in the linear exponential family (LEF). Both the Poisson and the normal are members of the LEF.) It is nonetheless important to control for overdispersion, for two reasons. First, in more complicated settings such as truncation and censoring, overdispersion leads to the more fundamental problem of inconsistency. Second, even in the simplest settings, large overdispersion leads to grossly deflated standard errors and grossly inflated $t$-statistics in the usual ML output.
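For the Poisson, the first-order conditions take the standard form $\sum_{i=1}^{N} (y_i - \exp(\mathbf{x}_i'\boldsymbol{\beta}))\mathbf{x}_i = \mathbf{0}$, which is presumably what (15.5) refers to; each term has conditional expectation zero whenever the mean is correct, whatever the variance. The minimal sketch below, assuming NumPy and hypothetical parameter values, fits a Poisson regression by Newton-Raphson to overdispersed negative binomial data and contrasts the default ML standard errors, which impose $\mathrm{V}[y_i \mid \mathbf{x}_i] = \mu_i$, with Eicker-Huber-White sandwich standard errors. The sandwich correction is a standard remedy swapped in here for illustration, not something prescribed above.

```python
import numpy as np


def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Poisson MLE by Newton-Raphson; returns beta, default SEs and sandwich SEs."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                # first-order conditions
        info = X.T @ (X * mu[:, None])        # minus the Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    a_inv = np.linalg.inv(X.T @ (X * mu[:, None]))
    b = X.T @ (X * ((y - mu) ** 2)[:, None])          # outer product of scores
    se_default = np.sqrt(np.diag(a_inv))              # valid only if V[y|x] = mu
    se_robust = np.sqrt(np.diag(a_inv @ b @ a_inv))   # sandwich
    return beta, se_default, se_robust


rng = np.random.default_rng(1)
n_obs = 20_000
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
beta_true = np.array([0.5, 0.5])     # hypothetical true coefficients
mu_true = np.exp(X @ beta_true)
alpha = 1.0                          # hypothetical NB2 overdispersion
r = 1.0 / alpha
y = rng.negative_binomial(r, r / (r + mu_true))   # mean mu_true, var mu + alpha * mu^2

beta_hat, se_def, se_rob = fit_poisson(X, y)
print("beta_hat:  ", beta_hat.round(3))
print("default SE:", se_def.round(4))
print("robust SE: ", se_rob.round(4))
```

The coefficient estimates should land near the true values, reflecting consistency, while the default standard errors should come out noticeably smaller than the robust ones, which is the deflation described above.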

A statistical test of overdispersion is therefore highly desirable after running a Poisson regression. Most count models with overdispersion specify overdispersion to be of the form

$$\mathrm{V}[y_i \mid \mathbf{x}_i] = \mu_i + \alpha g(\mu_i), \qquad (15.10)$$

where $\alpha$ is an unknown parameter and $g(\cdot)$ is a known function, most commonly $g(\mu) = \mu^2$ or $g(\mu) = \mu$. It is assumed that under both the null and alternative hypotheses the mean is correctly specified as, for example, $\exp(\mathbf{x}_i'\boldsymbol{\beta})$, while under the null hypothesis $\alpha = 0$, so that $\mathrm{V}[y_i \mid \mathbf{x}_i] = \mu_i$. A simple test statistic for $H_0: \alpha = 0$ versus $H_1: \alpha \neq 0$ or $H_1: \alpha > 0$ can be computed by estimating the Poisson model, constructing fitted values $\hat{\mu}_i = \exp(\mathbf{x}_i'\hat{\boldsymbol{\beta}})$, and running the auxiliary OLS regression (without constant)

$$\frac{(y_i - \hat{\mu}_i)^2 - y_i}{\hat{\mu}_i} = \alpha \frac{g(\hat{\mu}_i)}{\hat{\mu}_i} + u_i,$$

where $u_i$ is an error term. The reported $t$-statistic for $\alpha$ is asymptotically normal under the null hypothesis of no overdispersion. This test can also be used for underdispersion, in which case the conditional variance is less than the conditional mean.
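The auxiliary regression works because, under (15.10) and a correct mean, $\mathrm{E}[(y_i - \mu_i)^2 - y_i \mid \mathbf{x}_i] = \mathrm{V}[y_i \mid \mathbf{x}_i] - \mu_i = \alpha g(\mu_i)$; dividing through by $\mu_i$ gives the regression above. A minimal NumPy sketch follows. The function names `poisson_fitted` and `overdispersion_test`, the `g` switch, and the simulation settings are all hypothetical, and the $t$-statistic uses the ordinary OLS standard error.

```python
import numpy as np


def poisson_fitted(X, y, n_iter=50):
    """Fitted Poisson means exp(x_i'beta_hat) via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        beta = beta + np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    return np.exp(X @ beta)


def overdispersion_test(y, mu_hat, g="mu2"):
    """Regress ((y - mu)^2 - y)/mu on g(mu)/mu without constant; return (alpha_hat, t)."""
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    if g == "mu2":
        g_mu = mu_hat ** 2          # g(mu) = mu^2
    elif g == "mu":
        g_mu = mu_hat               # g(mu) = mu
    else:
        raise ValueError("g must be 'mu2' or 'mu'")
    lhs = ((y - mu_hat) ** 2 - y) / mu_hat
    z = g_mu / mu_hat               # mu_hat for g = mu^2, a column of ones for g = mu
    alpha_hat = (z @ lhs) / (z @ z)       # OLS slope, no intercept
    resid = lhs - alpha_hat * z
    s2 = (resid @ resid) / (len(y) - 1)   # one estimated coefficient
    t_stat = alpha_hat / np.sqrt(s2 / (z @ z))
    return alpha_hat, t_stat


rng = np.random.default_rng(2)
n_obs = 20_000
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
mu_true = np.exp(X @ np.array([0.5, 0.5]))   # hypothetical true mean
alpha_true = 0.5                             # hypothetical NB2 overdispersion
r = 1.0 / alpha_true
y_pois = rng.poisson(mu_true)
y_nb2 = rng.negative_binomial(r, r / (r + mu_true))

for name, y in [("Poisson", y_pois), ("NB2", y_nb2)]:
    a_hat, t = overdispersion_test(y, poisson_fitted(X, y))
    print(f"{name}: alpha_hat = {a_hat:.3f}, t = {t:.2f}")
```

Under the Poisson draws the $t$-statistic should be of moderate size, as expected under the null, while the negative binomial draws should give $\hat{\alpha}$ near 0.5 and a large positive $t$-statistic. Passing `g="mu"` regresses on a constant instead, which targets the $g(\mu) = \mu$ alternative.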
