Truncation and censoring

Econometric data used in duration analysis are often panel data comprising a number of individuals observed over a fixed interval of time. Suppose that the survey concerns unemployment durations, that the sampling period runs from January 2000 to December 2000, and that the individuals also provided information on their job history prior to January 2000. We can consider two different sampling schemes, which lead to censoring and truncation, respectively.

Censoring

Let us first consider a sample drawn from the population of both employed and unemployed people, and assume at most one unemployment spell per individual. Within this sample we find persons who:

1. are unemployed in January and remain unemployed in December too;

2. are unemployed in January and find a job before December;

3. are employed in January, lose their job before December and are still unemployed at this date;

4. are employed in January, next lose their job and find new employment before December.

Due to the labor force dynamics, unemployment durations of some individuals are only partially observed. For groups (2) and (4) the unemployment spells are complete, whereas they are right censored for groups (1) and (3).

To identify the right censored durations, we introduce an indicator variable $d_i$, which takes the value 1 if the observed unemployment spell of individual $i$ is complete and 0 if it is right censored. We also denote by $T_i$ the date of entry into unemployment, by $y_i^*$ the total unemployment duration, and by $y_i$ the observed unemployment duration, given that the sampling period ends at $T$.

The model involves two latent variables, $T_i$ and $y_i^*$. The observed variables $d_i$ and $y_i$ are related to the latent variables by:

$$
y_i = \min\left(y_i^*,\; T - T_i\right), \qquad
d_i = \begin{cases} 1 & \text{if } y_i^* \le T - T_i, \\ 0 & \text{otherwise.} \end{cases}
$$

Figure 21.1 Censoring scheme: unemployment spells
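The observation rule that maps the latent entry date $T_i$ and total duration (written $y_i^*$) to the observed pair $(y_i, d_i)$ can be sketched in Python. This is a minimal illustration, assuming time is measured in months with January 2000 at 0 and the end of December 2000 at $T = 12$. The names `observe`, `ObservedSpell` and `group` are hypothetical, and `group` assumes the spell overlaps the sampling period, as it does for the four groups listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedSpell:
    y: float  # observed duration y_i
    d: int    # 1 if the spell is complete, 0 if right censored


def observe(entry_date: float, total_duration: float, end_of_sample: float) -> ObservedSpell:
    """Map the latent pair (T_i, y_i^*) to the observed pair (y_i, d_i)."""
    time_available = end_of_sample - entry_date  # T - T_i
    if total_duration <= time_available:
        return ObservedSpell(y=total_duration, d=1)
    # The spell is still in progress at T: only T - T_i is observed.
    return ObservedSpell(y=time_available, d=0)


def group(entry_date: float, spell: ObservedSpell) -> int:
    """Return the group (1)-(4) of the text; assumes the spell overlaps [0, T]."""
    unemployed_in_january = entry_date < 0.0
    if unemployed_in_january:
        return 2 if spell.d == 1 else 1
    return 4 if spell.d == 1 else 3


if __name__ == "__main__":
    T = 12.0
    # Hypothetical (entry date, total duration) pairs, one per group.
    for entry, duration in [(-6.0, 30.0), (-3.0, 5.0), (4.0, 20.0), (2.0, 4.0)]:
        spell = observe(entry, duration, T)
        print(group(entry, spell), spell)
```

The four hypothetical individuals fall into groups (1) to (4) in that order; the two censored ones are recorded with $y_i = T - T_i$ rather than their unobserved total duration.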

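Conditional on $T_i$, the density of the observed pair $(y_i, d_i)$ is:

$$
l_i(y_i, d_i) = f_i(y_i)^{d_i}\, S_i(y_i)^{1-d_i} = \lambda_i(y_i)^{d_i}\, S_i(y_i),
$$

where the second equality follows by substituting the hazard expression into equation (21.17). Assuming that individual durations are independent conditional on the explanatory variables, the log-likelihood function for this model can be written as: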

$$
\log L(y; d) = \sum_{i=1}^{N} \log l_i(y_i, d_i)
= \sum_{i=1}^{N} d_i \log \lambda_i(y_i) + \sum_{i=1}^{N} \log S_i(y_i).
$$
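As a concrete instance of the censored log-likelihood, the sketch below assumes a constant hazard $\lambda_i(y) = \theta$ (the exponential model), a restriction not imposed by the text. Then $S_i(y) = e^{-\theta y}$, the log-likelihood is $\log\theta \sum_i d_i - \theta \sum_i y_i$, and setting its derivative to zero gives $\hat\theta = \sum_i d_i / \sum_i y_i$: completed spells enter the numerator, while censored spells contribute only their observed exposure time. The function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def censored_loglik(y: Sequence[float], d: Sequence[int],
                    log_hazard: Callable[[float], float],
                    log_survivor: Callable[[float], float]) -> float:
    """log L = sum_i d_i log lambda(y_i) + sum_i log S(y_i)."""
    return sum(di * log_hazard(yi) + log_survivor(yi) for yi, di in zip(y, d))


def exponential_loglik(theta: float, y: Sequence[float], d: Sequence[int]) -> float:
    # Constant hazard: log lambda(y) = log theta and log S(y) = -theta * y.
    return censored_loglik(y, d, lambda t: math.log(theta), lambda t: -theta * t)


def exponential_mle(y: Sequence[float], d: Sequence[int]) -> float:
    # The score sum(d) / theta - sum(y) vanishes at theta = sum(d) / sum(y).
    return sum(d) / sum(y)


if __name__ == "__main__":
    y = [18.0, 5.0, 8.0, 4.0]  # hypothetical observed durations (months)
    d = [0, 1, 0, 1]           # censored, complete, censored, complete
    theta_hat = exponential_mle(y, d)
    print(theta_hat, exponential_loglik(theta_hat, y, d))
```

With these hypothetical data $\hat\theta = 2/35$; treating the two censored spells as complete would instead give $4/35$ and overstate the exit rate from unemployment.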

Note that the duration distributions are conditioned on the date $T_i$. This information generally has to be included among the explanatory variables to correct for the so-called cohort effect.

Truncation

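We can also draw the sample from the subpopulation of people who are unemployed in January 2000 (date $T_0$, say). Within this sample we find persons who:

1. are unemployed in January and remain unemployed in December too;

2. are unemployed in January and find a job before December.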

Figure 21.2 Truncation scheme (observed and unobserved unemployment spells)

However, we now need to take into account the endogenous selection of the sample, which contains only people unemployed at $T_0$ (see Lung-Fei Lee, Chapter 16, in this volume). This sampling scheme is said to be left truncated since, compared with the previous scheme, we have retained only the individuals whose total unemployment duration exceeds $T_0 - T_i$. Conditional on $T_i$, the density of the pair $(y_i, d_i)$ becomes:

$$
l_i(y_i, d_i) = \frac{f_i(y_i)^{d_i}\, S_i(y_i)^{1-d_i}}{S_i(T_0 - T_i)}.
$$
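Under left truncation each contribution is the censored-data term $f_i(y_i)^{d_i} S_i(y_i)^{1-d_i}$ divided by $S_i(T_0 - T_i)$, the probability of still being unemployed at $T_0$. The sketch below assumes a Weibull duration distribution, $S(t) = \exp[-(t/s)^k]$ with shape $k$ and scale $s$, chosen only for illustration; with $k = 1$ it reduces to the exponential model, for which the correction amounts to measuring durations from $T_0$. The helper `in_truncated_sample` mimics the selection rule (unemployed at $T_0$), and all names are hypothetical.

```python
import math
from typing import Sequence


def in_truncated_sample(entry_date: float, total_duration: float, t0: float) -> bool:
    # Retained only if unemployed at T0: T_i <= T0 < T_i + y_i^*.
    return entry_date <= t0 < entry_date + total_duration


def weibull_log_survivor(t: float, shape: float, scale: float) -> float:
    # log S(t) = -(t / scale) ** shape
    return -((t / scale) ** shape)


def weibull_log_density(t: float, shape: float, scale: float) -> float:
    # log f(t) = log lambda(t) + log S(t), lambda(t) = (shape/scale) * (t/scale)**(shape-1)
    log_hazard = math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)
    return log_hazard + weibull_log_survivor(t, shape, scale)


def left_truncated_loglik(y: Sequence[float], d: Sequence[int],
                          elapsed_at_t0: Sequence[float],
                          shape: float, scale: float) -> float:
    """sum_i [d_i log f(y_i) + (1 - d_i) log S(y_i) - log S(T0 - T_i)]."""
    total = 0.0
    for yi, di, ai in zip(y, d, elapsed_at_t0):
        if di == 1:
            total += weibull_log_density(yi, shape, scale)
        else:
            total += weibull_log_survivor(yi, shape, scale)
        total -= weibull_log_survivor(ai, shape, scale)  # divide by S_i(T0 - T_i)
    return total


if __name__ == "__main__":
    t0 = 0.0                                      # January 2000
    entries, y, d = [-3.0, -6.0], [5.0, 18.0], [1, 0]  # hypothetical data
    elapsed = [t0 - e for e in entries]           # T0 - T_i
    print(left_truncated_loglik(y, d, elapsed, shape=1.5, scale=10.0))
```

Dropping the last line of the loop would treat the retained spells as a random sample of all spells and ignore the selection at $T_0$.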