# Truncation and censoring

In some studies, inclusion in the sample requires that sampled individuals have been engaged in the activity of interest. Then the count data are truncated, as the data are observed only over part of the range of the response variable. Examples of truncated counts include the number of bus trips made per week in surveys taken on buses, the number of shopping trips made by individuals sampled at a mall, and the number of unemployment spells among a pool of unemployed. In all these cases we do not observe zero counts, so the data are said to be zero-truncated, or more generally left-truncated. Right truncation results from loss of observations greater than some specified value.

Truncation leads to inconsistent parameter estimates unless the likelihood function is suitably modified. Consider the case of zero truncation. Let $f(y \mid \theta)$ denote the density function and $F(y \mid \theta) = \Pr[Y \le y]$ denote the cumulative distribution function of the discrete random variable, where $\theta$ is a parameter vector. If realizations of $y$ less than 1 are omitted, the ensuing zero-truncated density is given by

$$f(y \mid \theta, y \ge 1) = \frac{f(y \mid \theta)}{1 - F(0 \mid \theta)}, \qquad y = 1, 2, \ldots \tag{15.8}$$

This specializes in the zero-truncated Poisson case, for example, to $f(y \mid \mu, y \ge 1) = e^{-\mu}\mu^{y} / [\,y!\,(1 - e^{-\mu})\,]$. It is straightforward to construct a loglikelihood based on this density and to obtain maximum likelihood estimates.
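As a minimal sketch of that construction, the following Python code evaluates the zero-truncated Poisson loglikelihood and maximizes it for the simplest case of a single mean parameter with no regressors. The function names (`ztp_logpmf`, `ztp_loglik`, `ztp_mle`), the symbol `mu` for the Poisson mean, and the sample data are illustrative assumptions, not part of the original text. The estimate is found by bisection on the first-order condition, which in this case reduces to setting the sample mean equal to $\mu / (1 - e^{-\mu})$.

```python
import math


def ztp_logpmf(y, mu):
    """Log density of a zero-truncated Poisson at y = 1, 2, ... (mu > 0)."""
    if y < 1:
        raise ValueError("zero-truncated counts must be >= 1")
    # log of e^{-mu} mu^y / y!  minus  log of (1 - e^{-mu})
    return (-mu + y * math.log(mu) - math.lgamma(y + 1)
            - math.log(-math.expm1(-mu)))


def ztp_loglik(ys, mu):
    """Loglikelihood of an i.i.d. zero-truncated sample ys."""
    return sum(ztp_logpmf(y, mu) for y in ys)


def ztp_mle(ys, tol=1e-10):
    """Solve mean(ys) = mu / (1 - exp(-mu)) for mu by bisection."""
    ybar = sum(ys) / len(ys)
    if ybar <= 1.0:
        raise ValueError("sample mean must exceed 1 for an interior solution")
    # The truncated mean is increasing in mu and exceeds mu,
    # so the root lies strictly between 0 and ybar.
    lo, hi = 1e-12, ybar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        truncated_mean = mid / (-math.expm1(-mid))
        if truncated_mean < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical zero-truncated sample (no zeros are observed).
sample = [1, 2, 3, 2, 1, 4]
mu_hat = ztp_mle(sample)
print(mu_hat, ztp_loglik(sample, mu_hat))
```

The estimate is always below the sample mean, because the observed mean of a zero-truncated sample overstates the mean of the untruncated distribution. With regressors, the same log density would be summed with a mean that depends on covariates and maximized numerically.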

Censored counts most commonly arise from aggregation of counts greater than some value. This is often done in survey design when the total probability mass over the aggregated values is relatively small. Censoring, like truncation, leads to inconsistent parameter estimates if the uncensored likelihood is mistakenly used.

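For example, counts equal to or greater than some known value $c$ might be aggregated into a single category. Then some values of $y$ are incompletely observed; the precise value is unknown, but it is known to equal or exceed $c$. The observed data have density

$$g(y \mid \theta) = \begin{cases} f(y \mid \theta) & \text{if } y < c, \\ \Pr[Y \ge c] = 1 - \sum_{j=0}^{c-1} f(j \mid \theta) & \text{if } y \ge c, \end{cases}$$

where $c$ is known. Specialization to the Poisson, for example, is straightforward.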
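To illustrate the Poisson specialization, here is a rough Python sketch of the loglikelihood for counts that are recorded exactly below $c$ and lumped into a single category at or above $c$. The function names, the symbol `mu` for the Poisson mean, and the sample data are assumptions made for illustration only. The censored category contributes the upper-tail probability, computed as one minus the sum of the Poisson probabilities for 0 through $c - 1$.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of the count y with mean mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def censored_logpmf(y, mu, c):
    """Log probability of a recorded count when all counts >= c are pooled."""
    if y < c:
        # Fully observed count: ordinary Poisson log density.
        return -mu + y * math.log(mu) - math.lgamma(y + 1)
    # Censored count: Pr[Y >= c] = 1 - sum of f(j | mu) for j = 0, ..., c - 1.
    tail = 1.0 - sum(poisson_pmf(j, mu) for j in range(c))
    return math.log(tail)


def censored_loglik(ys, mu, c):
    """Loglikelihood of an i.i.d. sample ys right-censored at c."""
    return sum(censored_logpmf(y, mu, c) for y in ys)


# Hypothetical survey data in which any count of 3 or more is recorded as 3.
sample = [0, 1, 1, 2, 3, 0, 3, 1]
print(censored_loglik(sample, mu=1.5, c=3))
```

Subtracting the lower probabilities from one is simple but can lose precision when $c$ is far above the mean and the tail probability is very small; a survival-function routine from a statistics library would be a more robust choice in that situation.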

A related complication is that of sample selection (Terza, 1998). Then the count y is observed only when another random variable, potentially correlated with y, crosses a threshold. For example, to see a medical specialist one may first need to see a general practitioner. Treatment of count data with sample selection is a current topic of research.
