DURATION MODEL
The duration model purports to explain the distribution function of a duration variable as a function of independent variables. The duration variable may be human life, how long a patient lives after an operation, the life of a machine, or the duration of unemployment. As is evident from these examples, the duration model is useful in many disciplines, including medicine, engineering, and economics. Introductory books on duration analysis emphasizing each of the areas of application mentioned above are Kalbfleisch and Prentice (1980), Miller (1981), and Lancaster (1990).
We shall initially explain the basic facts about the duration model in the setting of the i.i.d. sample, then later introduce the independent variables.
Denoting the duration variable by $T$, we can completely characterize the duration model in the i.i.d. case by specifying the distribution function
(13.7.1) $F(t) = P(T < t)$.
In duration analysis the concept known as hazard plays an important role. We define
(13.7.2) $\text{Hazard}(t, t + \Delta t) = P(t < T < t + \Delta t \mid T > t)$
and call it the hazard of the interval $(t, t + \Delta t)$. If $T$ refers to the life of a person, the above signifies the probability that she dies in the time interval $(t, t + \Delta t)$, given that she has lived up to time $t$. Assuming, to simplify the analysis, that the density function $f(t)$ exists, we have from (13.7.2)
(13.7.3) $\text{Hazard}(t, t + \Delta t) = \dfrac{F(t + \Delta t) - F(t)}{1 - F(t)} \cong \dfrac{f(t)\,\Delta t}{1 - F(t)}$,
where the approximation gets better as $\Delta t$ gets smaller. We define the hazard function, denoted $\lambda(t)$, by
(13.7.4) $\lambda(t) = \dfrac{f(t)}{1 - F(t)}$.
There is a one-to-one correspondence between $F(t)$ and $\lambda(t)$. Since $f(t) = dF(t)/dt$, (13.7.4) shows that $\lambda(t)$ is known once $F(t)$ is known. The next equation shows the converse:
(13.7.5) $F(t) = 1 - \exp\left[-\int_0^t \lambda(s)\,ds\right]$.
Therefore $\lambda(t)$ contains no new information beyond what is contained in $F(t)$. Nevertheless, it is useful to define this concept because sometimes the researcher has a better feel for the hazard function than for the distribution function; hence it is easier for him to specify the former than the latter.
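The correspondence between (13.7.4) and (13.7.5) can be checked numerically: compute the hazard from $F$ and $f$, then rebuild $F$ from the hazard alone. The following is a minimal sketch, not part of the original text; the helper names, the trapezoidal integration, and the test case $F(t) = 1 - \exp(-\lambda t^2)$ are all illustrative assumptions.

```python
import math

def hazard_from_cdf(F, f, t):
    """Hazard lambda(t) = f(t) / [1 - F(t)], as in (13.7.4)."""
    return f(t) / (1.0 - F(t))

def cdf_from_hazard(hazard, t, n_steps=10_000):
    """F(t) = 1 - exp[-integral_0^t lambda(s) ds], as in (13.7.5).

    The integral is approximated by the trapezoidal rule with n_steps intervals.
    """
    h = t / n_steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(k * h) for k in range(1, n_steps))
    return 1.0 - math.exp(-h * total)

# Hypothetical test case: F(t) = 1 - exp(-lam * t**alpha) with alpha = 2
lam, alpha = 0.3, 2.0

def F(t):
    return 1.0 - math.exp(-lam * t ** alpha)

def f(t):
    return lam * alpha * t ** (alpha - 1) * math.exp(-lam * t ** alpha)

def haz(t):
    # Hazard obtained from F and f only
    return hazard_from_cdf(F, f, t)

for t in (0.5, 1.0, 2.0):
    # The last two columns should agree: F -> lambda -> F recovers F
    print(t, F(t), cdf_from_hazard(haz, t))
```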
The simplest duration model is the one for which the hazard function is constant:
(13.7.6) $\lambda(t) = \lambda$.
This is called the exponential model. From (13.7.5) we have for this model $F(t) = 1 - e^{-\lambda t}$ and $f(t) = \lambda e^{-\lambda t}$. This model would not be realistic to use for human life, for it would imply that the probability a person dies within the next minute, say, is the same for persons of every age. The exponential model for the life of a machine implies that the machine is always like new, regardless of how old it may be. A more realistic model for human life would be one in which $\lambda(t)$ has a U shape, remaining high for ages 0 to 1, attaining a minimum in youth, and then rising again with age. For some other applications (for example, the duration of a marriage) an inverted U shape may be more realistic.
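The "always like new" property of the exponential model can be seen by evaluating the conditional probability in (13.7.2) at different ages. A minimal sketch; the hazard value 0.02 per year and the function names are hypothetical.

```python
import math

def conditional_death_prob(F, t, dt):
    """P(t < T < t + dt | T > t) = [F(t + dt) - F(t)] / [1 - F(t)], cf. (13.7.2)."""
    return (F(t + dt) - F(t)) / (1.0 - F(t))

lam = 0.02  # hypothetical constant hazard (per year)

def F_exp(t):
    # Exponential distribution function F(t) = 1 - exp(-lam * t)
    return 1.0 - math.exp(-lam * t)

# Probability of dying within the next year at several ages
probs = {age: conditional_death_prob(F_exp, age, 1.0) for age in (1.0, 20.0, 80.0)}
print(probs)
```

Every entry equals $1 - e^{-\lambda}$, whatever the age, which is exactly why the model is unrealistic for human life.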
The simplest generalization of the exponential model is the Weibull model, in which the hazard function is specified as
(13.7.7) $\lambda(t) = \lambda \alpha t^{\alpha - 1}$.
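When $\alpha = 1$, the Weibull model reduces to the exponential model. Therefore, the researcher can test exponential versus Weibull by testing $\alpha = 1$ in the Weibull model. Differentiating (13.7.7) with respect to $t$, we obtain $d\lambda/dt = \lambda\alpha(\alpha - 1)t^{\alpha - 2}$, so that, for $\lambda > 0$ and $t > 0$,
(13.7.8) $\alpha > 1 \iff \dfrac{d\lambda}{dt} > 0, \quad \alpha = 1 \iff \dfrac{d\lambda}{dt} = 0, \quad \alpha < 1 \iff \dfrac{d\lambda}{dt} < 0$.
Thus the Weibull model can accommodate an increasing or decreasing hazard function, but neither a U-shaped nor an inverted U-shaped hazard function.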
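A quick numerical illustration of (13.7.8): evaluate the Weibull hazard on a grid of $t$ values and classify its shape. This is a sketch; the grid and the helper `shape` are illustrative choices, not from the text.

```python
def weibull_hazard(t, lam, alpha):
    """Weibull hazard lambda(t) = lam * alpha * t**(alpha - 1), as in (13.7.7)."""
    return lam * alpha * t ** (alpha - 1)

def shape(values):
    """Classify a sequence of hazard values evaluated on an increasing grid of t."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    if all(d == 0 for d in diffs):
        return "constant"
    return "non-monotone"

grid = [0.25 * i for i in range(1, 41)]  # t = 0.25, 0.5, ..., 10
for alpha in (0.5, 1.0, 2.0):
    print(alpha, shape([weibull_hazard(t, 1.0, alpha) for t in grid]))
```

Whatever value of $\alpha$ is tried, the result is never "non-monotone", consistent with the statement that the Weibull model cannot produce a U-shaped hazard.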
Lancaster (1979) estimated a Weibull model of unemployment duration. He introduced independent variables into the model by specifying the hazard function of the ith unemployed worker as
(13.7.9) $\lambda_i(t) = \exp(x_i'\beta)\,\alpha t^{\alpha - 1}$.
The vector $x_i$ contains log age, log unemployment rate of the area, and log replacement ratio (unemployment benefit divided by earnings from the last job). Lancaster was interested in testing $\alpha = 1$, because economic theory does not clearly indicate whether $\alpha$ should be larger or smaller than 1. He found, curiously, that his maximum likelihood estimator of $\alpha$ approached 1 from below as he kept adding independent variables, starting with the constant term only.
As Lancaster showed, this phenomenon is due to the fact that even if the hazard function is constant over time for each individual, if different individuals are associated with different levels of the hazard function, an aggregate estimate of the hazard function obtained by treating all the individuals homogeneously will exhibit a declining hazard function (that is, $d\lambda/dt < 0$). We explain this fact by the illustrative example in Table 13.1. In this example three groups of individuals are associated with three levels of the hazard rate: 0.5, 0.2, and 0.1. Initially there are 1000 people in each group. The first row shows, for example, that 500 people remain at the end of period 1 and the beginning of period 2, and so on. The last row indicates the ratio of the aggregate number of people who die in each period to the number of people who remain at the beginning of the period.
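The arithmetic behind such a table is easy to reproduce. The sketch below uses the three hazard rates and the initial group size of 1000 given above; the number of periods (four) is an assumption, and the printed figures are computed here rather than copied from Table 13.1.

```python
def aggregate_hazard(rates, n0=1000.0, periods=4):
    """Survivors of each group at the start of each period, and the aggregate hazard.

    Each group dies at its own constant rate; the aggregate hazard is total deaths in
    the period divided by the total number alive at the start of the period.
    """
    alive = [n0] * len(rates)
    rows, agg = [], []
    for _ in range(periods):
        rows.append(list(alive))
        deaths = [n * r for n, r in zip(alive, rates)]
        agg.append(sum(deaths) / sum(alive))
        alive = [n - d for n, d in zip(alive, deaths)]
    return rows, agg

rows, agg = aggregate_hazard([0.5, 0.2, 0.1])
for period, (counts, h) in enumerate(zip(rows, agg), start=1):
    print(period, [round(c) for c in counts], round(h, 4))
```

Although every individual hazard is constant, the aggregate hazard falls period by period because the high-hazard group is depleted first.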
The heterogeneity of the sample may not be totally explained by all the independent variables that the researcher can observe. In such a case it would be advisable to introduce into the model an unobservable random variable, known as the unobserved heterogeneity, which acts as a surrogate for the omitted independent variables.
In one of his models Lancaster (1979) specified the hazard function as
(13.7.10) $\lambda_i(t) = \exp(x_i'\beta + v_i)\,\alpha t^{\alpha - 1}$,
where $\{v_i\}$ are i.i.d. gamma. If $L_i(v_i)$ denotes the conditional likelihood function for the $i$th person, given $v_i$, the likelihood function of the model with the unobserved heterogeneity is given by
(13.7.11) $L = \prod_{i=1}^{n} E\,L_i(v_i)$,
where the expectation is taken with respect to the distribution of $v_i$. (The likelihood function of the model without the unobserved heterogeneity will be given later.) As Lancaster introduced the unobserved heterogeneity, his estimate of $\alpha$ further approached 1. The unobserved heterogeneity can be used with a model more general than Weibull. Heckman and Singer (1984) studied the properties of the maximum likelihood estimator of the distribution of the unobserved heterogeneity without parametrically specifying it in a general duration model. They showed that the maximum likelihood estimator of the distribution is discrete.
Table 13.1 An illustrative example of a declining aggregate hazard
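To make the expectation in (13.7.11) concrete, the sketch below computes $E\,L_i(v_i)$ for one completed spell under (13.7.10), where the conditional likelihood is the density $f(t \mid v) = \exp(x_i'\beta + v)\,\alpha t^{\alpha-1}\exp[-\exp(x_i'\beta + v)\,t^\alpha]$. It is only an illustration: the gamma shape and scale, the value of $x_i'\beta$, and the function names are hypothetical, and the expectation is computed twice (trapezoidal quadrature and Monte Carlo with Python's `random.gammavariate`) as a cross-check.

```python
import math
import random

def cond_lik(t, xb, v, alpha):
    """L_i(v): density of a completed spell of length t given v under (13.7.10),
    where xb stands for x_i'beta."""
    scale = math.exp(xb + v)
    return scale * alpha * t ** (alpha - 1) * math.exp(-scale * t ** alpha)

def gamma_pdf(v, shape, scale):
    """Gamma density with the given shape and scale parameters (zero for v <= 0)."""
    if v <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(v) - v / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def expected_lik_quadrature(t, xb, alpha, shape, scale, upper=40.0, n=40_000):
    """E L_i(v_i) in (13.7.11) by the trapezoidal rule on [0, upper]."""
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        v = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cond_lik(t, xb, v, alpha) * gamma_pdf(v, shape, scale)
    return h * total

def expected_lik_monte_carlo(t, xb, alpha, shape, scale, draws=100_000, seed=1):
    """The same expectation by simulation; gammavariate(shape, scale) draws v."""
    rng = random.Random(seed)
    return sum(cond_lik(t, xb, rng.gammavariate(shape, scale), alpha)
               for _ in range(draws)) / draws

# Hypothetical values: t = 2, x_i'beta = -1, alpha = 1.2, v ~ Gamma(shape=2, scale=0.5)
q = expected_lik_quadrature(2.0, -1.0, 1.2, 2.0, 0.5)
mc = expected_lik_monte_carlo(2.0, -1.0, 1.2, 2.0, 0.5)
print(q, mc)
```

The sample log likelihood is then the sum of the logarithms of such expectations over individuals, which would be maximized with respect to $\beta$, $\alpha$, and the gamma parameters.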
A hazard function with independent variables may be written as
(13.7.12) $\lambda_i(t) = \lambda_0(t) \exp(x_{it}'\beta)$,
where $\lambda_0(t)$ is referred to as the baseline hazard function. This formulation is more general than (13.7.9), first, in the sense that $x$ depends on time $t$ as well as on individual $i$, and, second, in the sense that the baseline hazard function is general. Some examples of baseline hazard functions that have been used in econometric applications are as follows:
Flinn and Heckman (1982)
(13.7.14) $\lambda_0(t) = \dfrac{\theta k t^{k-1}}{1 + \rho t^{k}}$  Gritz (1993)
(13.7.15) $\lambda_0(t) = \lambda \exp(\gamma_1 t + \gamma_2 t^2)$  Sturm (1991)
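Unlike the Weibull hazard, baseline hazards of the forms (13.7.14) and (13.7.15) can rise and then fall. The sketch below assumes (13.7.14) is $\theta k t^{k-1}/(1 + \rho t^k)$ and (13.7.15) is $\lambda\exp(\gamma_1 t + \gamma_2 t^2)$; all parameter values are hypothetical, chosen so that each curve has an interior peak (at $t = 1$ and $t = 2.5$ respectively).

```python
import math

def gritz_baseline(t, theta, k, rho):
    """Baseline hazard of the form (13.7.14): theta*k*t**(k-1) / (1 + rho*t**k)."""
    return theta * k * t ** (k - 1) / (1.0 + rho * t ** k)

def sturm_baseline(t, lam, g1, g2):
    """Baseline hazard of the form (13.7.15): lam*exp(g1*t + g2*t**2)."""
    return lam * math.exp(g1 * t + g2 * t * t)

def peak_index(values):
    """Index of the largest value; an interior peak signals an inverted U shape."""
    return max(range(len(values)), key=values.__getitem__)

grid = [0.25 * i for i in range(1, 41)]  # t = 0.25, 0.5, ..., 10
g = [gritz_baseline(t, 1.0, 2.0, 1.0) for t in grid]    # 2t / (1 + t^2)
s = [sturm_baseline(t, 1.0, 1.0, -0.2) for t in grid]   # exp(t - 0.2 t^2)
print(grid[peak_index(g)], grid[peak_index(s)])
```

With $k \le 1$ in (13.7.14), or $\gamma_2 \ge 0$ and $\gamma_1 \ge 0$ in (13.7.15), the peak disappears and the hazard is monotone.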
Next we consider the derivation of the likelihood function of the duration model with the hazard function of the form (13.7.12). The first step is to obtain the distribution function by the formula (13.7.5) as
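(13.7.16) $F(t) = 1 - \exp\left[-\int_0^t \lambda_0(s) \exp(x_s'\beta)\,ds\right]$
and then the density function, by differentiating the above, as
(13.7.17) $f(t) = \lambda_0(t) \exp(x_t'\beta) \exp\left[-\int_0^t \lambda_0(s) \exp(x_s'\beta)\,ds\right]$.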
The computation of the integral in the above two formulae presents a problem in that we must specify the independent variable vector $x_s$ as a continuous function of $s$. It is customary in practice to divide the sample period into intervals and assume that $x_s$ remains constant within each interval. This assumption simplifies the integral considerably.
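With piecewise-constant covariates the integral in (13.7.16) and (13.7.17) becomes a sum of closed-form pieces. The sketch assumes a Weibull baseline $\lambda_0(s) = \lambda\alpha s^{\alpha-1}$, whose integral over $[a, b]$ is $\lambda(b^\alpha - a^\alpha)$; the break points, covariate values, and function names are hypothetical.

```python
import math

def cumulative_hazard(t, breaks, xbetas, lam, alpha):
    """Integral_0^t lambda_0(s) exp(x_s'beta) ds for a Weibull baseline
    lambda_0(s) = lam * alpha * s**(alpha - 1) when x_s'beta is constant on intervals.

    Interval j runs from breaks[j-1] (0 for j = 0) to breaks[j]; xbetas[j] is the
    value of x_s'beta there. Use math.inf as the last break point.
    """
    total, start = 0.0, 0.0
    for end, xb in zip(breaks, xbetas):
        stop = min(t, end)
        if stop <= start:
            break
        # Exact integral of the baseline over [start, stop], times the constant exp(xb)
        total += math.exp(xb) * lam * (stop ** alpha - start ** alpha)
        start = end
    return total

def xbeta_at(t, breaks, xbetas):
    """Value of x_t'beta in the interval containing t."""
    for end, xb in zip(breaks, xbetas):
        if t < end:
            return xb
    return xbetas[-1]

def duration_cdf(t, breaks, xbetas, lam, alpha):
    """(13.7.16) with piecewise-constant covariates."""
    return 1.0 - math.exp(-cumulative_hazard(t, breaks, xbetas, lam, alpha))

def duration_pdf(t, breaks, xbetas, lam, alpha):
    """(13.7.17) with piecewise-constant covariates."""
    baseline = lam * alpha * t ** (alpha - 1)
    return (baseline * math.exp(xbeta_at(t, breaks, xbetas))
            * math.exp(-cumulative_hazard(t, breaks, xbetas, lam, alpha)))

# Hypothetical covariate path: x_s'beta = 0 on [0, 1), 0.5 on [1, 3), -0.2 afterwards
breaks, xbetas = [1.0, 3.0, math.inf], [0.0, 0.5, -0.2]
print(duration_cdf(4.0, breaks, xbetas, 0.4, 1.5), duration_pdf(4.0, breaks, xbetas, 0.4, 1.5))
```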
The likelihood function depends on a sampling scheme. As an illustration, let us assume that our data consist of the survival durations of all those who had heart transplant operations at Stanford University from the day of the first such operation there until December 31, 1992. There are two categories of data: those who died before December 31, 1992, and those who were still living on that date. The contribution of a patient in the first category to the likelihood function is the density function evaluated at the observed survival duration, and the contribution of a patient in the second category is the probability that he lived at least until December 31, 1992. Thus the likelihood function is given by
(13.7.18) $L = \prod_0 f(t_i) \prod_1 [1 - F(t_i)]$,
where $\prod_0$ is the product over those individuals who died before December 31, 1992, and $\prod_1$ is the product over those individuals who were still living on that date. Note that for patients of the first category $t_i$ refers to the time from the operation to the death, whereas for patients of the second category $t_i$ refers to the time from the operation to December 31, 1992. The survival durations of the patients still living on the last day of observation (in this example December 31, 1992) are said to be right censored.
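For the exponential model, $f(t) = \lambda e^{-\lambda t}$ and $1 - F(t) = e^{-\lambda t}$, so the log of (13.7.18) is $D\log\lambda - \lambda\sum_i t_i$, where $D$ is the number of deaths; setting its derivative to zero gives $\hat\lambda = D/\sum_i t_i$. A minimal sketch with hypothetical data:

```python
import math

def censored_exponential_loglik(lam, durations, died):
    """Log of (13.7.18) for the exponential model:
    died -> log f(t) = log(lam) - lam*t; censored -> log[1 - F(t)] = -lam*t."""
    return sum((math.log(lam) if d else 0.0) - lam * t for t, d in zip(durations, died))

def censored_exponential_mle(durations, died):
    """Maximizer of the above: number of deaths divided by total time observed."""
    return sum(died) / sum(durations)

durations = [2.0, 5.5, 1.2, 8.0, 3.3, 8.0]  # hypothetical years since operation
died = [1, 1, 1, 0, 1, 0]                   # 0 = still alive on the last day (right censored)
lam_hat = censored_exponential_mle(durations, died)
print(lam_hat, censored_exponential_loglik(lam_hat, durations, died))
```

Censored spells add to the denominator but not the numerator; simply discarding them would bias $\hat\lambda$ upward.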
Note a similarity between the above likelihood function and the likelihood function of the Tobit model given in (13.6.3). In fact, the two models are mathematically equivalent.
Now consider another sampling scheme with the same heart transplant data. Suppose we observe only those patients who either had their operations between January 1, 1980, and December 31, 1992, or had their operations before January 1, 1980, but were still living on that date. Then (13.7.18) is no longer the correct likelihood function. Maximizing it would overestimate the survival duration, because this sampling scheme tends to include more long-surviving patients than short-surviving patients among those who had their operations before January 1, 1980. The survival durations of the patients who had their operations before the first day of observation (in this example January 1, 1980) and were still living on that date are said to be left censored. In order to obtain consistent estimates of the parameters of this model, we must either maximize the correct likelihood function or eliminate from the sample all the patients who had their operations before January 1, 1980. For the correct likelihood function of the second sampling scheme with left censoring, see Amemiya (1991).
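The overestimation can be seen in a small simulation. The sketch below assumes exponential survival with hazard 0.2 per year and operation dates spread uniformly over a hypothetical 32-year calendar whose observation window covers the last 12 years. It compares the exponential estimate from (13.7.18) applied to the whole sample with the estimate after dropping the patients operated on before the window opened; the correct left-censored likelihood (Amemiya 1991) is not implemented here.

```python
import random

def simulate_sample(lam, n, start=20.0, end=32.0, seed=0):
    """Hypothetical calendar in years: operations occur uniformly on [0, end] and
    survival is exponential with hazard lam. Only patients alive at some point of the
    observation window [start, end] are sampled (think of start as January 1, 1980 and
    end as December 31, 1992). Returns (t_i, died, operated_before_start) tuples, where
    t_i runs from the operation to death or, if still alive, to the end date."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        op = rng.uniform(0.0, end)
        death = op + rng.expovariate(lam)
        if op < start and death <= start:
            continue  # died before the window opened, so never observed
        died = death <= end
        t_i = (death if died else end) - op
        sample.append((t_i, died, op < start))
    return sample

def exponential_mle(records):
    """Deaths divided by total time, the maximizer of (13.7.18) for the exponential model."""
    deaths = sum(1 for _, died, _ in records if died)
    return deaths / sum(t for t, _, _ in records)

sample = simulate_sample(lam=0.2, n=100_000)
naive = exponential_mle(sample)                              # (13.7.18) applied to everyone
dropped = exponential_mle([r for r in sample if not r[2]])  # pre-window patients removed
print(naive, dropped)
```

The naive estimate of the hazard falls well below 0.2 (so the implied survival duration is too long), while the estimate from the reduced sample is close to 0.2.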
We have deliberately chosen the heart transplant example to illustrate two sampling schemes. With data such as unemployment spells, the first sampling scheme is practically impossible because the history of unemployment goes back very far.
We mentioned earlier a problem of computing the integral in (13.7.16) or (13.7.17), which arises when we specify the hazard function generally as (13.7.12). The problem does not arise if we assume
(13.7.19) $\lambda_i(t) = \lambda_0(t) \exp(x_i'\beta)$.
The duration model with a hazard function that can be written as the product of a term that depends only on $t$ and a term that depends only on $i$, as above, is called the proportional hazard model. Note that Lancaster's model (13.7.9) is a special case of such a model. Cox (1972) showed that in the proportional hazard model $\beta$ can be estimated without specifying the baseline hazard $\lambda_0(t)$. This estimator of $\beta$ is called the partial maximum likelihood estimator. The baseline hazard $\lambda_0(t)$ can be nonparametrically estimated by the Kaplan-Meier estimator (1958). For an econometric application of these estimators, see Lehrer (1988).
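The two estimators mentioned above can be sketched in a few lines. The Kaplan-Meier function computes the product-limit estimate of a survivor function from right-censored data; the Cox function computes the partial log likelihood for one covariate, assuming no tied death times. The data and names are hypothetical. Because the partial likelihood uses only the ordering of the durations (through the risk sets), any increasing transformation of time leaves it unchanged, which is one way to see that $\lambda_0(t)$ drops out.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier survivor estimate S(t) at each distinct death time.
    events[i] = 1 if spell i ended (death), 0 if right censored.
    Spells censored at a death time are counted as still at risk at that time."""
    data = sorted(zip(times, events))
    at_risk, surv, out = len(data), 1.0, {}
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            out[t] = surv
        at_risk -= leaving
    return out

def cox_partial_loglik(beta, times, events, x):
    """Cox (1972) partial log likelihood for one covariate, assuming no tied death times:
    sum over deaths i of [x_i*beta - log sum_{j: t_j >= t_i} exp(x_j*beta)].
    The baseline hazard lambda_0(t) never appears."""
    ll = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i:
            risk = sum(math.exp(x_j * beta) for t_j, x_j in zip(times, x) if t_j >= t_i)
            ll += x_i * beta - math.log(risk)
    return ll

times = [1, 2, 2, 3, 4, 5]   # hypothetical durations
events = [1, 1, 0, 1, 0, 1]  # 1 = death observed, 0 = right censored
x = [1, 0, 1, 1, 0, 0]       # hypothetical binary covariate
print(kaplan_meier(times, events))
beta_hat = max((b / 100 for b in range(-300, 301)),
               key=lambda b: cox_partial_loglik(b, times, events, x))
print(beta_hat)  # grid-search maximizer of the partial likelihood
```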
The general model with the hazard function (13.7.12) may be estimated by a discrete approximation. In this case $\lambda_i(t)$ must be interpreted as the probability that the spell of the $i$th person ends in the interval $(t, t + 1)$. The contribution to the likelihood function of the spell that ends after $k$ periods is $\prod_{t=1}^{k-1}[1 - \lambda_i(t)]\,\lambda_i(k)$, whereas the contribution of the spell that lasts at least $k$ periods is $\prod_{t=1}^{k}[1 - \lambda_i(t)]$. See Moffitt (1985) for the maximum likelihood estimator of a duration model using a discrete approximation.
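The two likelihood contributions of the discrete approximation translate directly into code. A minimal sketch; the period hazards below are hypothetical and must lie in $[0, 1]$.

```python
def discrete_spell_likelihood(hazards, k, completed):
    """Contribution of one spell under the discrete approximation.

    hazards[t - 1] holds lambda_i(t), the probability that the spell ends in period t
    given that it has not ended before (t = 1, 2, ...).
    completed=True:  the spell ends after k periods,
                     prod_{t=1}^{k-1} [1 - lambda_i(t)] * lambda_i(k)
    completed=False: the spell lasts at least k periods (right censored),
                     prod_{t=1}^{k} [1 - lambda_i(t)]
    """
    survive = 1.0
    for t in range(1, k):
        survive *= 1.0 - hazards[t - 1]
    last = hazards[k - 1]
    return survive * last if completed else survive * (1.0 - last)

hazards = [0.10, 0.15, 0.20, 0.25]  # hypothetical period hazards for one person
print(discrete_spell_likelihood(hazards, 3, True), discrete_spell_likelihood(hazards, 4, False))
```

Summing the completed-spell contributions for $k = 1, \dots, K$ and adding the censored contribution at $K$ gives exactly 1, a useful check on any implementation.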
Next we demonstrate how the exponential duration model can be derived from utility maximization in a simple job-search model. We do so first in the case of discrete time, and second in the case of continuous time.
Consider a particular unemployed worker. In every period there is a probability $\lambda$ that a wage offer will arrive, and if it does arrive, its size is distributed i.i.d. as $G$. If the worker accepts the offer, he will receive the same wage forever. If he rejects it, he incurs the search cost $c$ until he is employed. The discount rate is $\delta$. Let $V(t)$ be the maximum utility at time $t$. Then the Bellman equation is
(13.7.20) $V(t) = \lambda \max\left[\delta^{-1} W(t),\ (1 - \delta) E V(t + 1) - c\right] + (1 - \lambda)\left[(1 - \delta) E V(t + 1) - c\right]$.
Taking the expectation of both sides and setting EV(t) = V because of stationarity,
(13.7.21) $V = \delta^{-1}\lambda E[\max(W, R)] + \delta^{-1}(1 - \lambda) R$,
where $R = \delta[(1 - \delta)V - c]$ and $W(t)$ has been written simply as $W$ because of our i.i.d. assumption. Note that
(13.7.22) $E[\max(W, R)] = \int_R^\infty w\,dG(w) + R\,G(R)$.
Note further that $V$ appears on both sides of (13.7.21). Solve for $V$, call the solution $V^*$, and define $R^* = \delta[(1 - \delta)V^* - c]$, the reservation wage. The worker should accept the wage offer if and only if $W > R^*$. Define $P = P(W > R^*)$. Then the likelihood function of the worker who accepted the wage in the $(t + 1)$st period is
(13.7.23) $L = (1 - \lambda P)^t \lambda P$.
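Equation (13.7.21) can be solved by fixed-point iteration, since the right-hand side changes by at most a factor $1 - \delta$ times any change in $V$. The sketch assumes, for illustration only, that $G$ puts equal probability on the wages 1, ..., 10, with $\lambda = 0.5$, $\delta = 0.05$, and $c = 0.5$.

```python
def solve_discrete_search(wages, probs, lam, delta, c, tol=1e-10, max_iter=100_000):
    """Solve (13.7.21) for V* by fixed-point iteration, with G the discrete
    distribution that puts probability probs[j] on wages[j] (an assumption)."""
    V = 0.0
    for _ in range(max_iter):
        R = delta * ((1.0 - delta) * V - c)
        e_max = sum(p * max(w, R) for w, p in zip(wages, probs))  # E[max(W, R)], (13.7.22)
        V_new = (lam * e_max + (1.0 - lam) * R) / delta
        if abs(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    else:
        raise RuntimeError("fixed-point iteration did not converge")
    R_star = delta * ((1.0 - delta) * V - c)                # reservation wage
    P = sum(p for w, p in zip(wages, probs) if w > R_star)  # P(W > R*)
    return V, R_star, P

def duration_likelihood(t, lam, P):
    """(13.7.23): the offer is accepted in period t + 1."""
    return (1.0 - lam * P) ** t * lam * P

# Hypothetical parameters: wages 1, ..., 10 equally likely
wages = [float(w) for w in range(1, 11)]
probs = [0.1] * 10
V_star, R_star, P = solve_discrete_search(wages, probs, lam=0.5, delta=0.05, c=0.5)
print(R_star, P, duration_likelihood(3, 0.5, P))
```

Substituting $E[\max(W, R)] = R + E[\max(W - R, 0)]$ into (13.7.21) gives $R^* = -c + (1 - \delta)\lambda\delta^{-1}E[\max(W - R^*, 0)]$; with these hypothetical numbers the solution is $R^* = 6.625$ and $P = 0.4$.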
Many extensions of this basic model have been estimated in econometric applications, of which we mention only two. The model of Wolpin (1987) introduces the following extensions: first, the planning horizon is finite; second, the wage is observed with an error. A new feature in the model of Pakes (1986), in which W is the net return from the renewal of a patent, is that W(t) is serially correlated. This feature makes solution of the Bellman equation considerably more cumbersome.
The next model we consider is the continuous-time version of the previous model. A fuller discussion of the model can be found, for example, in Lippman and McCall (1976). The duration $T$ until the wage offer arrives is distributed exponentially with the rate $\lambda$: that is, $P(T > t) = \exp(-\lambda t)$. When it arrives, the wage is distributed i.i.d. as $G$. We define $c$ and $\delta$ as before. The Bellman equation is given by
(13.7.24) $V(t) = \max\left[\delta^{-1} W(t),\ K\right]$, where
(13.7.25) $K = \int_t^\infty \left\{\exp[-\delta(s - t)]\,E V(s) - c \int_0^{s - t} \exp(-\delta \tau)\,d\tau\right\} \lambda \exp[-\lambda(s - t)]\,ds$.
Taking the expectation of both sides and putting EV(t) = V because of stationarity, we have
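(13.7.26) $V = \delta^{-1} E[\max(W, R)]$,
where $R = \delta K$; with $EV(s) = V$, (13.7.25) evaluates to $K = (\lambda V - c)/(\lambda + \delta)$. Solve (13.7.26) for $V$, call the solution $V^*$, and define $R^*$ accordingly. It is easy to show that $R^*$ satisfies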
(13.7.27) $R^* = -c + \lambda\delta^{-1} \int_{R^*}^\infty (w - R^*)\,dG(w)$.
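Let $f(t)$ be the density function of the unemployment duration. Since offers arrive at rate $\lambda$ and each is accepted with probability $P = P(W > R^*)$, acceptable offers arrive at rate $\lambda P$, and we have
(13.7.28) $f(t) = \lambda P \exp(-\lambda P t)$.
Thus we have obtained the exponential model. For a small value of $\lambda P$, (13.7.28) is approximately equal to (13.7.23).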
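The reservation wage in (13.7.27) can be found by bisection, because $R + c - (\lambda/\delta)\int_R^\infty (w - R)\,dG(w)$ is increasing in $R$. The sketch assumes, purely for illustration, that $G$ is uniform on $[0, 10]$ with $\lambda = 0.5$, $\delta = 0.05$, and $c = 0.5$; in that case (13.7.27) reduces to a quadratic with root $R^* = 11 - \sqrt{22} \approx 6.31$.

```python
import math

def uniform_excess(R, a, b):
    """Integral_R^inf (w - R) dG(w) when G is uniform on [a, b] (an assumption)."""
    if R <= a:
        return (a + b) / 2.0 - R
    if R >= b:
        return 0.0
    return (b - R) ** 2 / (2.0 * (b - a))

def reservation_wage(lam, delta, c, a, b, tol=1e-12):
    """Solve (13.7.27), R = -c + (lam / delta) * Integral_R^inf (w - R) dG(w), by bisection.

    g(R) = R + c - (lam / delta) * excess(R) is increasing in R, with g(-c) <= 0 and,
    for c >= 0 and b > 0, g(b) = b + c > 0, so the root lies in [-c, b].
    """
    def g(R):
        return R + c - lam / delta * uniform_excess(R, a, b)
    lo, hi = -c, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def acceptance_prob(R, a, b):
    """P = P(W > R) for W uniform on [a, b]."""
    return min(1.0, max(0.0, (b - R) / (b - a)))

def duration_density(t, lam, P):
    """(13.7.28): exponential density with rate lam * P."""
    return lam * P * math.exp(-lam * P * t)

# Hypothetical parameters: offers arrive at rate 0.5, delta = 0.05, c = 0.5, W ~ U[0, 10]
lam, delta, c, a, b = 0.5, 0.05, 0.5, 0.0, 10.0
R_star = reservation_wage(lam, delta, c, a, b)
P = acceptance_prob(R_star, a, b)
print(R_star, P, 1.0 / (lam * P))  # reservation wage, acceptance probability, mean duration
```

The mean unemployment duration is $1/(\lambda P)$; raising the search cost $c$ lowers $R^*$, raises $P$, and shortens expected unemployment.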