# Multistate Models with Exogenous Variables

Theoretically, not much need be said about this model beyond what we have discussed in Section 11.1.1 for the general case and in Section 11.1.3 for the two-state case. The likelihood function can be derived from (11.1.4) by specifying $P_{jk}(t)$ as a function of exogenous variables and parameters. The equivalence of the NLWLS to the method-of-scoring iteration was discussed for the general case in Section 11.1.1, and the minimum chi-square estimator defined for the two-state case in Section 11.1.3 can be straightforwardly generalized to the multistate case. Therefore it should be sufficient to discuss an empirical article by Toikka (1976) as an illustration of the NLWLS (which in his case is a linear WLS estimator because of his linear probability specification).

Toikka’s model is a three-state Markov model of labor market decisions in which the three states (corresponding to $j = 1$, 2, and 3) are being employed, being unemployed but in the labor force (that is, actively looking for a job), and being out of the labor force.

The exogenous variables used by Toikka consist of average (over individuals) income, average wage rate, and seasonal dummies, all of which depend on time (months) but not on individuals. Thus Toikka’s model is a homogeneous and nonstationary Markov model. Moreover, Toikka assumed that transition probabilities depend linearly on the exogenous variables.2 Thus, in his model, Eq. (11.1.7) can be written as

$$\mathbf{y}^i(t)' = [\mathbf{y}^i(t-1)' \otimes \mathbf{x}_t']\Pi + \mathbf{u}^i(t)', \qquad (11.1.67)$$
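To make this specification concrete, the following sketch simulates a small homogeneous, nonstationary three-state chain whose transition matrix is linear in a time-varying regressor. The coefficient array `B`, the seasonal regressor, and the sample sizes are all hypothetical choices for illustration, not Toikka’s data or estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N, T = 3, 2, 500, 24          # states, exogenous variables, individuals, months

# Hypothetical coefficients: row j of P(t) is B[j] @ x_t, chosen so that
# each row is a proper probability vector for the x_t used below.
B = np.array([[[0.8, 0.0], [0.1, 0.5], [0.1, -0.5]],   # from state 1
              [[0.3, 0.0], [0.5, 0.0], [0.2, 0.0]],    # from state 2
              [[0.2, 0.0], [0.2, 0.0], [0.6, 0.0]]])

# x_t = (1, seasonal component); depends on t but not on i (homogeneous model)
x = np.column_stack([np.ones(T), 0.1 * np.sin(2 * np.pi * np.arange(T) / 12)])

state = rng.integers(0, M, size=N)                      # initial states
for t in range(T):
    P = B @ x[t]                                        # M x M transition matrix at month t
    assert np.allclose(P.sum(axis=1), 1.0) and (P >= 0).all()
    # draw each individual's next state from row state[i] of P(t);
    # np.minimum guards against rounding in the cumulative sum
    u = rng.random(N)
    state = np.minimum((u[:, None] > np.cumsum(P[state], axis=1)).sum(axis=1), M - 1)
```

Because the $\mathbf{x}_t$ vary only with $t$, every individual faces the same transition matrix each month, which is exactly the homogeneity assumption used below when the data are aggregated.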

which is a multivariate heteroscedastic linear regression equation. As we indicated in Section 11.1.1, the generalized least squares estimator of $\Pi$ is asymptotically efficient.

Let $Y_t$ be the $N \times M$ matrix whose $i$th row is $\mathbf{y}^i(t)'$, and let $\bar{Y}_t$ be the $N \times (M-1)$ matrix consisting of the first $M-1$ columns of $Y_t$. Define $\bar{U}_t$ similarly. Then we can write (11.1.67) as

$$\bar{Y}_t = (Y_{t-1} \otimes \mathbf{x}_t')\Pi + \bar{U}_t. \qquad (11.1.68)$$
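The step from (11.1.67) to (11.1.68) rests on the fact that the $i$th row of the Kronecker product $Y_{t-1} \otimes \mathbf{x}_t'$ is exactly $\mathbf{y}^i(t-1)' \otimes \mathbf{x}_t'$, the regressor vector of individual $i$. A quick numerical check with arbitrary stand-in matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 4, 3, 2
Y_prev = rng.random((N, M))          # stand-in for Y_{t-1}
x_t = rng.random(K)                  # stand-in for x_t

# Kronecker product of the N x M matrix with the 1 x K row vector x_t'
X_t = np.kron(Y_prev, x_t[None, :])  # N x (M*K)

# Row i equals y^i(t-1)' (Kronecker) x_t', i.e. the regressor row in (11.1.67)
for i in range(N):
    assert np.allclose(X_t[i], np.kron(Y_prev[i], x_t))
```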

To define the FGLS estimator of $\Pi$, stack (11.1.68) over $t = 1, 2, \ldots, T$ and write the result as $Y = X\Pi + U$, and write the columns of $Y$, $\Pi$, and $U$ explicitly as $Y = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_G]$, $\Pi = [\boldsymbol{\pi}_1, \boldsymbol{\pi}_2, \ldots, \boldsymbol{\pi}_G]$, and $U = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_G]$, where $G = M - 1$. Also define $\mathbf{y} = (\mathbf{y}_1', \mathbf{y}_2', \ldots, \mathbf{y}_G')'$, $\boldsymbol{\pi} = (\boldsymbol{\pi}_1', \boldsymbol{\pi}_2', \ldots, \boldsymbol{\pi}_G')'$, and $\mathbf{u} = (\mathbf{u}_1', \mathbf{u}_2', \ldots, \mathbf{u}_G')'$. Then (11.1.68) can be written as

$$\mathbf{y} = \bar{X}\boldsymbol{\pi} + \mathbf{u}, \qquad \bar{X} = I_G \otimes X. \qquad (11.1.70)$$

The FGLS estimator of $\boldsymbol{\pi}$ is $(\bar{X}'\hat{\Omega}^{-1}\bar{X})^{-1}\bar{X}'\hat{\Omega}^{-1}\mathbf{y}$, where $\hat{\Omega}$ is a consistent estimator of $\Omega = E\mathbf{u}\mathbf{u}'$. Here, $\Omega$ has the following form:

$$\Omega = \begin{bmatrix} D_{11} & D_{12} & \cdots & D_{1G} \\ D_{21} & D_{22} & \cdots & D_{2G} \\ \vdots & \vdots & & \vdots \\ D_{G1} & D_{G2} & \cdots & D_{GG} \end{bmatrix}, \qquad (11.1.71)$$

where each $D_{jk}$ is a diagonal matrix of size $NT$. If each $D_{jk}$ were a constant times the identity matrix, (11.1.70) would be Zellner’s seemingly unrelated regression model (see Section 6.4), and the LS estimator would therefore be asymptotically efficient. In fact, however, the diagonal elements of $D_{jk}$ are not constant.

Because $(Y_{t-1}'Y_{t-1})^{-1}Y_{t-1}'\bar{Y}_t$ is the first $M-1$ columns of the unconstrained MLE $\hat{P}(t)$ of the Markov matrix, Toikka’s estimator of $\Pi$, denoted $\hat{\Pi}$, can be interpreted as the LS estimator in the regression of $\hat{P}(t)$ on $\mathbf{x}_t$. Although this idea may seem intuitively appealing, Toikka’s estimator is asymptotically neither more nor less efficient than the FGLS estimator. Alternatively, Toikka’s estimator can be interpreted as premultiplying (11.1.68) by the block-diagonal matrix

$$\begin{bmatrix} (Y_0'Y_0)^{-1}Y_0' & & & \\ & (Y_1'Y_1)^{-1}Y_1' & & \\ & & \ddots & \\ & & & (Y_{T-1}'Y_{T-1})^{-1}Y_{T-1}' \end{bmatrix}$$

and then applying least squares. If, instead, generalized least squares were applied in the last stage, the resulting estimator of $\Pi$ would be identical to the GLS estimator of $\Pi$ derived from (11.1.68).
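Toikka’s two-step idea — compute the unconstrained MLE $\hat{P}(t)$ for each period, then regress it on $\mathbf{x}_t$ by least squares — can be sketched as follows. This is an illustrative implementation (not Toikka’s own code), and it assumes every state is occupied in every period so that $Y_{t-1}'Y_{t-1}$ is invertible:

```python
import numpy as np

def toikka_estimator(states, X):
    """Two-step LS sketch: states is (T+1, N) with labels in {0,...,M-1},
    X is (T, K) with rows x_t'.  Returns the K x (M*M) coefficient matrix
    from the LS regression of the elements of P_hat(t) on x_t."""
    T = states.shape[0] - 1
    M = states.max() + 1
    P_hats = []
    for t in range(T):
        Y0 = np.eye(M)[states[t]]        # N x M indicator matrix Y_{t-1}
        Y1 = np.eye(M)[states[t + 1]]    # N x M indicator matrix Y_t
        # (Y0'Y0)^{-1} Y0'Y1: Y0'Y0 is diagonal with the period-(t-1) state
        # counts, so this is the matrix of observed transition frequencies
        P_hats.append(np.linalg.solve(Y0.T @ Y0, Y0.T @ Y1))
    P_vec = np.array([P.ravel() for P in P_hats])      # T x (M*M)
    Pi_hat, *_ = np.linalg.lstsq(X, P_vec, rcond=None)
    return Pi_hat
```

Since each row of $\hat{P}(t)$ sums to one by construction, the fitted transition matrices inherit the adding-up property whenever $\mathbf{x}_t$ contains a constant term.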

11.1.5 Estimation Using Aggregate Data

Up to now we have assumed that a complete history of each individual, $y_j^i(t)$ for every $i$, $t$, and $j$, is observed. In this subsection we shall assume that only the aggregate data $n_j(t) = \sum_{i=1}^N y_j^i(t)$ are available. We shall first discuss LS and GLS estimators, and then we shall discuss MLE briefly.

Suppose the Markov matrix $P^i(t)$ is constant across $i$, so that $P^i(t) = P(t)$. Summing both sides of (11.1.7) over $i$ yields

$$\sum_{i=1}^N \mathbf{y}^i(t) = P(t)' \sum_{i=1}^N \mathbf{y}^i(t-1) + \sum_{i=1}^N \mathbf{u}^i(t), \qquad (11.1.73)$$

where $E[\mathbf{y}^i(t)\,|\,\mathbf{y}^i(t-1)] = P(t)'\mathbf{y}^i(t-1)$. Depending on whether $P(t)$ depends on unknown parameters linearly or nonlinearly, (11.1.73) defines a multivariate heteroscedastic linear or nonlinear regression model. The parameters can be estimated by either LS or GLS (NLLS or NLGLS in the nonlinear case), and it should be straightforward to prove their consistency and asymptotic normality as $NT$ goes to $\infty$.

The simplest case occurs when $P^i(t)$ is constant across both $i$ and $t$, so that $P^i(t) = P$. If, moreover, $P$ is unconstrained with $M(M-1)$ free parameters, (11.1.73) becomes a multivariate linear regression model. We can apply either an LS or a GLS method, the latter being asymptotically more efficient, as was explained in Section 11.1.4. See the article by Telser (1963) for an application of the LS estimator to an analysis of the market shares of three major cigarette brands in the period 1925–1943.
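A minimal version of this constant-$P$ aggregate regression (in the spirit of Telser’s application, but with simulated rather than cigarette-brand data) regresses the share vector at $t$ on the share vector at $t-1$; the transition matrix `P` below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
M, T, N = 3, 40, 10000

# Hypothetical constant Markov matrix used to simulate individual histories
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
state = rng.integers(0, M, size=N)
counts = []
for t in range(T + 1):
    counts.append(np.bincount(state, minlength=M))
    u = rng.random(N)
    # draw next states; np.minimum guards against rounding in the cumsum
    state = np.minimum((u[:, None] > np.cumsum(P[state], axis=1)).sum(axis=1), M - 1)
counts = np.array(counts) / N                      # aggregate shares n(t)'/N

# Multivariate LS: n(t)' = n(t-1)' P + error, stacked over t
P_ls, *_ = np.linalg.lstsq(counts[:-1], counts[1:], rcond=None)
```

Note that once the shares settle near the stationary distribution the regressor rows become nearly collinear, so aggregate series of moderate length identify $P$ only weakly; the GLS weighting discussed above improves efficiency but cannot cure this identification problem.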

If $P(t)$ varies with $t$, the ensuing model is generally nonlinear in parameters, except in Toikka’s model. As we can see from (11.1.68), estimation on the basis of aggregate data is possible by LS or GLS in Toikka’s model, using the equation obtained by summing the rows of each $Y_t$. A discussion of models in which the elements of $P(t)$ are nonlinear functions of exogenous variables and parameters can be found in an article by MacRae (1977). MacRae also discussed maximum likelihood estimation and estimation based on incomplete samples.

In the remainder of this subsection, we shall again consider a two-state Markov model (see Section 11.1.3). We shall present some of the results we have mentioned so far in more precise terms and shall derive the likelihood function.

Using the same notation as that given in Section 11.1.3, we define $r_t = \sum_{i=1}^N y_{it}$. The conditional mean and variance of $r_t$ given $r_{t-1}$ are given by

$$E(r_t|r_{t-1}) = \sum_{i=1}^N F_{it} = P r_{t-1} + P^*(N - r_{t-1}) \qquad (11.1.75)$$

and

$$V(r_t|r_{t-1}) = \sum_{i=1}^N F_{it}(1 - F_{it}) = P(1 - P) r_{t-1} + P^*(1 - P^*)(N - r_{t-1}), \qquad (11.1.76)$$

where $F_{it}$ is the conditional probability that individual $i$ is in state 1 at time $t$, so that $F_{it} = P$ if $y_{i,t-1} = 1$ and $F_{it} = P^*$ if $y_{i,t-1} = 0$.

The NLWLS estimator of $\gamma$ is defined as the value that minimizes

$$\sum_{t=1}^T \frac{[r_t - \hat{E}(r_t|r_{t-1})]^2}{\hat{V}(r_t|r_{t-1})}, \qquad (11.1.77)$$

where $\hat{E}(r_t|r_{t-1})$ and $\hat{V}(r_t|r_{t-1})$ are obtained by estimating $P$ and $P^*$ by $F(\hat{\alpha}'\mathbf{x}_t + \hat{\beta}'\mathbf{x}_t)$ and $F(\hat{\beta}'\mathbf{x}_t)$, respectively, where $\hat{\alpha}$ and $\hat{\beta}$ are consistent estimates obtained, for example, by minimizing $\sum_{t=1}^T (r_t - \sum_{i=1}^N F_{it})^2$. Alternatively, we can minimize (11.1.78), which asymptotically gives the same estimator.3 Let $\hat{\gamma}$ be the estimator obtained by minimizing either (11.1.77) or (11.1.78); it is consistent and asymptotically normal.
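The two-step NLWLS can be illustrated in the simplest case where $P$ and $P^*$ are constants (no exogenous variables), so that $E(r_t|r_{t-1})$ is linear in $(r_{t-1},\, N - r_{t-1})$ and each stage reduces to a (weighted) linear regression. All parameter values below are hypothetical; with regressors one would replace the constants by $F(\cdot'\mathbf{x}_t)$ and use a nonlinear optimizer:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 1000, 5000
P_true, P_star_true = 0.8, 0.3      # hypothetical stay/entry probabilities

# Simulate the two-state chain and record only the aggregate counts r_t
y = rng.random(N) < 0.5
r = [y.sum()]
for t in range(T):
    stay = rng.random(N) < P_true          # transitions 1 -> 1
    enter = rng.random(N) < P_star_true    # transitions 0 -> 1
    y = np.where(y, stay, enter)
    r.append(y.sum())
r = np.array(r, dtype=float)

# First stage: unweighted LS of r_t on (r_{t-1}, N - r_{t-1}), cf. (11.1.75)
Z = np.column_stack([r[:-1], N - r[:-1]])
b0, *_ = np.linalg.lstsq(Z, r[1:], rcond=None)

# Second stage: reweight by the estimated conditional variance (11.1.76)
P0, Ps0 = np.clip(b0, 1e-6, 1 - 1e-6)
w = 1.0 / (P0 * (1 - P0) * r[:-1] + Ps0 * (1 - Ps0) * (N - r[:-1]))
sw = np.sqrt(w)
b1, *_ = np.linalg.lstsq(Z * sw[:, None], r[1:] * sw, rcond=None)
P_hat, P_star_hat = b1
```

With a long enough aggregate series the weighted second stage recovers the transition probabilities, mirroring the two-step logic of (11.1.77).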

The asymptotic variance-covariance matrix of $\hat{\gamma}$ can be shown to be larger (in the matrix sense) than that of the NLWLS estimator based on individual observations given in (11.1.38), as follows. The inverse of the latter can also be written as

$$\sum_{t=1}^T \sum_{i=1}^N [F_{it}(1 - F_{it})]^{-1}\, \frac{\partial F_{it}}{\partial \gamma} \frac{\partial F_{it}}{\partial \gamma'}. \qquad (11.1.80)$$

Put $z_{it} = [F_{it}(1 - F_{it})]^{-1/2}\, \partial F_{it}/\partial \gamma$ and $a_{it} = [F_{it}(1 - F_{it})]^{1/2}$. Then the desired inequality follows from

$$\sum_{i=1}^N z_{it} z_{it}' \ge \left( \sum_{i=1}^N a_{it}^2 \right)^{-1} \left( \sum_{i=1}^N a_{it} z_{it} \right) \left( \sum_{i=1}^N a_{it} z_{it} \right)', \qquad (11.1.81)$$

a matrix version of the Cauchy-Schwarz inequality.
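The inequality invoked here is a matrix form of the Cauchy-Schwarz inequality. The following sketch, with arbitrary random vectors standing in for $z_{it}$ and $a_{it}$, checks numerically that the difference between the two sides is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 50, 3
Z = rng.standard_normal((N, d))     # rows play the role of z_it'
a = rng.random(N) + 0.1             # a_it > 0

lhs = Z.T @ Z                       # sum_i z_i z_i'
v = Z.T @ a                         # sum_i a_i z_i
rhs = np.outer(v, v) / (a @ a)      # (sum_i a_i^2)^{-1} (sum a_i z_i)(sum a_i z_i)'

# positive semidefiniteness of lhs - rhs is exactly the matrix inequality
eigvals = np.linalg.eigvalsh(lhs - rhs)
assert eigvals.min() > -1e-10
```

The scalar Cauchy-Schwarz inequality applied to $c'z_i$ and $a_i$ for every direction $c$ gives $c'(\text{lhs})c \ge c'(\text{rhs})c$, which is why the smallest eigenvalue above is nonnegative up to rounding.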

Finally, we can define the MLE, which maximizes the joint probability of $r_t$ and $r_{t-1}$, $t = 1, 2, \ldots, T$, given in (11.1.82). The maximization of (11.1.82) is probably too difficult to make this estimator of any practical value. Thus we must use the NLLS or NLWLS estimator if only aggregate observations are available, even though the MLE is asymptotically more efficient, as we can conclude from the study of Barankin and Gurland (1951).