# Markov Chain Models

## 11.1.1 Basic Theory

Define a sequence of binary random variables

$$y_j^i(t) = \begin{cases} 1 & \text{if the } i\text{th person is in state } j \text{ at time } t \\ 0 & \text{otherwise,} \end{cases} \tag{11.1.1}$$

$$i = 1, 2, \ldots, N; \quad t = 1, 2, \ldots, T; \quad j = 1, 2, \ldots, M.$$

Markov chain models specify the probability distribution of $y_j^i(t)$ as a function of $y_k^i(s)$, $k = 1, 2, \ldots, M$ and $s = t-1, t-2, \ldots$, as well as (possibly) of exogenous variables.

Markov chain models can be regarded as generalizations of qualitative response models. As noted in Section 9.7, Markov chain models reduce to QR models if $y_j^i(t)$ are independent over $t$. In fact, we have already discussed one type of Markov model in Section 9.7.2.

Models in which the distribution of $y_j^i(t)$ depends on $y_k^i(t-1)$ but not on $y_k^i(t-2), y_k^i(t-3), \ldots$ are called first-order Markov models. We shall primarily discuss such models, although higher-order Markov models will also be discussed briefly.

First-order Markov models are completely characterized if we specify the transition probabilities defined by

$$P_{jk}^i(t) = \text{Prob}[\,i\text{th person is in state } k \text{ at time } t \mid \text{he was in state } j \text{ at time } t-1\,] \tag{11.1.2}$$

and the distribution of $y_j^i(0)$, the initial conditions.

The following symbols will be needed for our discussion:

$$
\begin{aligned}
y^i(t) &= M\text{-vector whose } j\text{th element is } y_j^i(t), \\
n_j(t) &= \sum_{i=1}^{N} y_j^i(t), \\
n_{jk}(t) &= \sum_{i=1}^{N} y_j^i(t-1)\,y_k^i(t), \\
n_{jk} &= \sum_{t=1}^{T} \sum_{i=1}^{N} y_j^i(t-1)\,y_k^i(t), \\
P^i(t) &= \{P_{jk}^i(t)\}, \text{ an } M \times M \text{ matrix}, \\
p_j^i(t) &= \text{Prob}[\,i\text{th person is in state } j \text{ at time } t\,], \\
p^i(t) &= M\text{-vector whose } j\text{th element is } p_j^i(t).
\end{aligned} \tag{11.1.3}
$$

The matrix $P^i(t)$ is called a Markov matrix. It has the following properties: (1) every element of $P^i(t)$ is nonnegative; (2) the sum of each row is unity (in other words, if $\mathbf{1}$ is an $M$-vector of ones, then $P^i(t)\mathbf{1} = \mathbf{1}$).
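The two defining properties of a Markov matrix are easy to check numerically, and its rows can be used directly to simulate a first-order chain. The following Python sketch uses an illustrative 3-state matrix of our own choosing (not one from the text):

```python
import numpy as np

# Illustrative 3-state Markov matrix: rows index the current state j,
# columns the next state k. The numbers are made up for demonstration.
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Property (1): every element of P is nonnegative.
assert (P >= 0).all()

# Property (2): each row sums to unity, i.e., P @ 1 = 1.
ones = np.ones(P.shape[0])
assert np.allclose(P @ ones, ones)

def simulate_chain(P, p0, T, rng):
    """Draw a path y(0), ..., y(T) of state labels 0..M-1 from a
    first-order chain with initial distribution p0."""
    M = P.shape[0]
    states = [rng.choice(M, p=p0)]                     # y(0) drawn from p(0)
    for _ in range(T):
        states.append(rng.choice(M, p=P[states[-1]]))  # row of current state
    return np.array(states)

rng = np.random.default_rng(0)
path = simulate_chain(P, np.full(3, 1 / 3), T=20, rng=rng)
```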

If $y_j^i(0)$ is a binary random variable taking the value 1 with probability $p_j^i(0)$, the likelihood function of the first-order Markov model can be written as

$$L = \prod_t \prod_i \prod_k \prod_j [\,P_{jk}^i(t)\,]^{y_k^i(t)\,y_j^i(t-1)} \prod_i \prod_j [\,p_j^i(0)\,]^{y_j^i(0)}, \tag{11.1.4}$$

where $t$ ranges from 1 to $T$, $i$ ranges from 1 to $N$, and $k$ and $j$ range from 1 to $M$ unless otherwise noted. If the initial values $y_j^i(0)$ are assumed to be known constants, the last product term on the right-hand side of (11.1.4) should be dropped.

Clearly, all the parameters $P_{jk}^i(t)$ and $p_j^i(0)$ cannot be estimated consistently. Therefore they are generally specified as functions of a parameter vector $\theta$, where the number of elements in $\theta$ is either fixed or goes to $\infty$ at a sufficiently slow rate (compared to the sample size). Later we shall discuss various ways to parameterize the transition probabilities $P_{jk}^i(t)$.

A Markov model in which $P_{jk}^i(t) = P_{jk}^i$ for all $t$ is called stationary. If $P_{jk}^i(t) = P_{jk}(t)$ for all $i$, the model is called homogeneous. (Its antonym is heterogeneous.) A parameterization similar to the one used in QR models is to specify $P_{jk}^i(t) = F_{jk}[\mathbf{x}^i(t)'\boldsymbol{\beta}]$ for some functions $F_{jk}$ such that $\sum_{k=1}^{M} F_{jk} = 1$. Examples will be given in Sections 11.1.3 and 11.1.4. The case where $y_j^i(t)$ are independent over $t$ (the case of pure QR models) is a special case of the first-order Markov model obtained by setting $P_{jk}^i(t) = p_k^i(t)$ for all $j$ and $k$ for each $i$ and $t$.
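As a concrete illustration of such a parameterization for $M = 2$, one may let each row's "stay" probability be a logistic function of the exogenous variables, so that the rows sum to one by construction. The logistic form, the covariates, and the coefficient vectors below are our own assumptions for illustration, not a specification from the text:

```python
import numpy as np

def logistic(z):
    """Logistic CDF, one possible choice of F."""
    return 1.0 / (1.0 + np.exp(-z))

def transition_matrix(x, beta_stay0, beta_stay1):
    """Two-state (M = 2) transition matrix for one person at one time.
    Each row's probability of staying in its state is logistic in x'beta;
    both coefficient vectors are hypothetical."""
    x = np.asarray(x, dtype=float)
    p00 = logistic(x @ np.asarray(beta_stay0))  # F_11: stay in state 1
    p11 = logistic(x @ np.asarray(beta_stay1))  # F_22: stay in state 2
    # Off-diagonal entries are the complements, so each row sums to one.
    return np.array([[p00, 1.0 - p00],
                     [1.0 - p11, p11]])

P = transition_matrix(x=[1.0, 0.5],
                      beta_stay0=[0.2, 0.4],
                      beta_stay1=[-0.1, 0.8])
```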

For QR models we defined the nonlinear regression models: (9.2.26) for the binary case and (9.3.16) for the multinomial case. A similar representation for the first-order Markov model can be written as

$$E[\,y^i(t) \mid y^i(t-1), y^i(t-2), \ldots\,] = P^i(t)'\,y^i(t-1) \tag{11.1.5}$$

or

$$y^i(t) = P^i(t)'\,y^i(t-1) + u^i(t). \tag{11.1.6}$$

Because these $M$ equations are linearly dependent (their sum is 1), we eliminate the $M$th equation and write the remaining $M-1$ equations as

$$\tilde{y}^i(t) = \tilde{P}^i(t)'\,y^i(t-1) + \tilde{u}^i(t), \tag{11.1.7}$$

where $\tilde{y}^i(t)$ and $\tilde{u}^i(t)$ consist of the first $M-1$ elements of $y^i(t)$ and $u^i(t)$ and $\tilde{P}^i(t)$ consists of the first $M-1$ columns of $P^i(t)$.

Conditional on $y^i(t-1), y^i(t-2), \ldots$, we have $E\,\tilde{u}^i(t) = 0$ and $V\,\tilde{u}^i(t) = D(\boldsymbol{\mu}) - \boldsymbol{\mu}\boldsymbol{\mu}'$, where $\boldsymbol{\mu} = \tilde{P}^i(t)'\,y^i(t-1)$ and $D(\boldsymbol{\mu})$ is the diagonal matrix with the elements of $\boldsymbol{\mu}$ in the diagonal. Strictly speaking, the analog of (9.3.16) is (11.1.7) rather than (11.1.6) because in (9.3.16) a redundant equation has been eliminated.
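These conditional moments can be computed directly. A minimal sketch, using a hypothetical 3-state matrix and a person who was in the second state at time $t-1$:

```python
import numpy as np

# Hypothetical 3-state Markov matrix (rows = current state j).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Indicator vector y(t-1): the person was in the second state.
y_prev = np.array([0.0, 1.0, 0.0])

# Drop the Mth column of P to eliminate the redundant equation.
P_tilde = P[:, :-1]

# Conditional mean mu of the first M-1 indicators: P_tilde' y(t-1).
mu = P_tilde.T @ y_prev

# Conditional covariance of u_tilde(t): D(mu) - mu mu'.
V = np.diag(mu) - np.outer(mu, mu)
```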

As in QR models, the NLGLS estimator of the parameters that characterize $P^i(t)$, derived from (11.1.7), yields asymptotically efficient estimates. The presence of $y^i(t-1)$ on the right-hand side of the equation does not cause any problem asymptotically. This is analogous to the fact that the properties of the least squares estimator in the classical regression model (Model 1) hold asymptotically for the autoregressive models discussed in Chapter 5. We shall discuss the NLWLS estimator in greater detail in Section 11.1.3. There we shall consider a two-state Markov model in which $P^i(t)$ depends on exogenous variables in a specific way.

As in QR models, minimum chi-square estimation is possible for Markov models in certain cases. We shall discuss these cases in Section 11.1.3.

Taking the expectation of both sides of (11.1.5) yields

$$p^i(t) = P^i(t)'\,p^i(t-1). \tag{11.1.8}$$

It is instructive to rewrite the likelihood function (11.1.4) as a function of $P_{jk}^i(t)$ and $p^i(t)$ as follows. Because $\sum_j y_j^i(t-1) = 1$, we have

$$\prod_j [\,p_k^i(t)\,]^{y_k^i(t)\,y_j^i(t-1)} = [\,p_k^i(t)\,]^{y_k^i(t)}, \tag{11.1.9}$$

so we can write (11.1.4) alternatively as

$$L = L_1 L_2, \tag{11.1.10}$$

where

$$L_1 = \prod_t \prod_i \prod_k \prod_j \left[ \frac{P_{jk}^i(t)}{p_k^i(t)} \right]^{y_k^i(t)\,y_j^i(t-1)}$$

and

$$L_2 = \prod_t \prod_i \prod_k [\,p_k^i(t)\,]^{y_k^i(t)} \prod_i \prod_j [\,p_j^i(0)\,]^{y_j^i(0)}.$$

If we specify $P_{jk}^i(t)$ and $p_j^i(0)$ to be functions of a parameter vector $\theta$, then by (11.1.8) $L_2$ is also a function of $\theta$. The partial likelihood function $L_2(\theta)$ has the same form as the likelihood function of a QR model, and maximizing it will yield consistent but generally asymptotically inefficient estimates of $\theta$.

As we noted earlier, if $y_j^i(t)$ are independent over $t$, the rows of the matrix $P^i(t)$ are identical. Then, using (11.1.8), we readily see $P_{jk}^i(t) = p_k^i(t)$. Therefore $L = L_2$, implying that the likelihood function of a Markov model reduces to the likelihood function of a QR model.

Because $p_j^i(t)$ is generally a complicated function of the transition probabilities, maximizing $L_2$ cannot be recommended as a practical estimation method. However, there is an exception (aside from the independence case mentioned earlier): the case when $p_j^i(0)$ are equilibrium probabilities.

The notion of equilibrium probability is an important concept for stationary Markov models. Consider a typical individual and therefore drop the superscript $i$ from Eq. (11.1.8). Under the stationarity assumption we have

$$p(t) = P'\,p(t-1). \tag{11.1.11}$$

By repeated substitution we obtain from (11.1.11)

$$p(t) = (P')^t\,p(0). \tag{11.1.12}$$

If $\lim_{t\to\infty} (P')^t$ exists, then

$$p(\infty) = (P')^\infty\,p(0). \tag{11.1.13}$$

We call the elements of $p(\infty)$ equilibrium probabilities. They exist if every element of $P$ is positive.

It is easy to prove (Bellman, 1970, p. 269) that if every element of $P$ is positive, the largest (in absolute value or modulus) characteristic root of $P$ is unity and is unique. Therefore, by Theorem 1 of Appendix 1, there exist a matrix $H$ and a Jordan canonical form $D$ such that $P' = HDH^{-1}$. Therefore we obtain

$$(P')^\infty = H D^\infty H^{-1} = H J H^{-1}, \tag{11.1.14}$$

where $J$ is the $M \times M$ matrix consisting of 1 in the northwestern corner and 0 elsewhere. Equilibrium probabilities, if they exist, must satisfy

$$p(\infty) = P'\,p(\infty), \tag{11.1.15}$$

which implies that the first column of $H$ is $p(\infty)$ and hence the first row of $H^{-1}$ is the transpose of the $M$-vector of ones, denoted $\mathbf{1}$. Therefore, from (11.1.14),

$$(P')^\infty = p(\infty)\,\mathbf{1}'. \tag{11.1.16}$$

Inserting (11.1.16) into (11.1.13) yields the identity $p(\infty) = p(\infty)\mathbf{1}'p(0) = p(\infty)$ for any value of $p(0)$. If $p(\infty)$ exists, it can be determined by solving (11.1.15) subject to the constraint $\mathbf{1}'p(\infty) = 1$. Because the rank of $I - P'$ is $M-1$ under the assumption, the $p(\infty)$ thus determined is unique.
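This characterization of $p(\infty)$ translates into a small linear solve: since $I - P'$ has rank $M-1$, one of its rows can be replaced by the normalization $\mathbf{1}'p = 1$. A sketch with an illustrative positive matrix of our own choosing:

```python
import numpy as np

# Illustrative Markov matrix with every element positive, so that
# p(infinity) exists and is unique.
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
M = P.shape[0]

# Solve (I - P')p = 0 subject to 1'p = 1: replace the last (redundant)
# equation of the rank-(M-1) system with the normalization row.
A = np.eye(M) - P.T
A[-1, :] = 1.0
b = np.zeros(M)
b[-1] = 1.0
p_inf = np.linalg.solve(A, b)

# The same limit appears as the columns of (P')^t for large t,
# since (P')^infinity = p(infinity) 1'.
P_power = np.linalg.matrix_power(P.T, 200)
```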

If $p_j^i(0) = p_j(\infty)$, $L_2$ reduces to

$$L_2^* = \prod_i \prod_j [\,p_j(\infty)\,]^{\sum_{t=0}^{T} y_j^i(t)}, \tag{11.1.17}$$

which is the likelihood function of a standard multinomial QR model. Even if $p_j^i(0) \ne p_j(\infty)$, maximizing $L_2^*$ yields a consistent estimate as $T$ goes to infinity. We shall show this in a simple case in Section 11.1.3.

Now consider the simplest case of homogeneous and stationary Markov models characterized by

$$P_{jk}^i(t) = P_{jk} \quad \text{for all } i \text{ and } t. \tag{11.1.18}$$

The likelihood function (11.1.4) conditional on $y_j^i(0)$ reduces to

$$L = \prod_k \prod_j P_{jk}^{\,n_{jk}}. \tag{11.1.19}$$

It is to be maximized subject to the $M$ constraints $\sum_{k=1}^{M} P_{jk} = 1$, $j = 1, 2, \ldots, M$.

Consider the Lagrangian

$$S = \sum_j \sum_k n_{jk} \log P_{jk} - \sum_j \lambda_j \left( \sum_{k=1}^{M} P_{jk} - 1 \right). \tag{11.1.20}$$

Setting the derivative of $S$ with respect to $P_{jk}$ equal to 0 yields

$$n_{jk} = \lambda_j P_{jk}. \tag{11.1.21}$$

Summing both sides of (11.1.21) over $k$ and using the constraints, we obtain the MLE

$$\hat{P}_{jk} = \frac{n_{jk}}{\sum_{k=1}^{M} n_{jk}}. \tag{11.1.22}$$

See Anderson and Goodman (1957) for the asymptotic properties of the MLE (11.1.22).
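The closed-form MLE (11.1.22) is simply a row-normalization of the transition-count matrix. The following sketch simulates data from a hypothetical homogeneous stationary chain, accumulates the counts $n_{jk}$ of (11.1.3), and recovers the transition probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" transition matrix of a homogeneous stationary chain.
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.5, 0.2],
                   [0.2, 0.3, 0.5]])
M, N, T = 3, 500, 20

# Simulate N independent persons over T periods and accumulate the
# pooled transition counts n_jk.
n = np.zeros((M, M))
for _ in range(N):
    s = rng.integers(M)                      # known initial state y(0)
    for _ in range(T):
        s_next = rng.choice(M, p=P_true[s])
        n[s, s_next] += 1
        s = s_next

# MLE (11.1.22): divide each row of counts by its row total.
P_hat = n / n.sum(axis=1, keepdims=True)
```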

Anderson and Goodman also discussed tests of various hypotheses in the homogeneous stationary Markov model. Suppose we want to test the null hypothesis that $P_{jk}$ is equal to a certain (nonzero) specified value $P_{jk}^0$ for $k = 1, 2, \ldots, M$ and for a particular $j$. Then, using a derivation similar to (9.3.24), we can show

$$S_j \equiv \sum_{k=1}^{M} \left[ \sum_{t=1}^{T} n_j(t-1) \right] \frac{(\hat{P}_{jk} - P_{jk}^0)^2}{P_{jk}^0} \;\rightarrow\; \chi^2_{M-1}, \tag{11.1.23}$$

where $\hat{P}_{jk}$ is the MLE. Furthermore, if $P_{jk}^0$ is given for $j = 1, 2, \ldots, M$ as well as $k$, we can use the test statistic $\sum_{j=1}^{M} S_j$, which is asymptotically distributed as chi-square with $M(M-1)$ degrees of freedom. Next, suppose we want to test (11.1.18) itself against a homogeneous but nonstationary model characterized by $P_{jk}^i(t) = P_{jk}(t)$. This can be tested by the likelihood ratio test statistic with the following distribution:

$$2 \log \prod_t \prod_j \prod_k \left[ \frac{\hat{P}_{jk}(t)}{\hat{P}_{jk}} \right]^{n_{jk}(t)} \;\sim\; \chi^2_{M(M-1)(T-1)}, \tag{11.1.24}$$

where $\hat{P}_{jk}(t) = n_{jk}(t) / \sum_{k=1}^{M} n_{jk}(t)$.
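The statistic (11.1.24) compares the per-period estimates $\hat{P}_{jk}(t)$ with the pooled estimate $\hat{P}_{jk}$. A sketch under a hypothetical two-state chain that really is stationary, so the null holds and the statistic behaves like a $\chi^2$ draw with $M(M-1)(T-1)$ degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stationary homogeneous two-state chain.
P_true = np.array([[0.6, 0.4],
                   [0.3, 0.7]])
M, N, T = 2, 400, 6

# n_jk(t): transition counts at each date t = 1, ..., T.
n_t = np.zeros((T, M, M))
for _ in range(N):
    s = rng.integers(M)
    for t in range(T):
        s_next = rng.choice(M, p=P_true[s])
        n_t[t, s, s_next] += 1
        s = s_next

n = n_t.sum(axis=0)                               # pooled counts n_jk
P_hat = n / n.sum(axis=1, keepdims=True)          # stationary MLE (11.1.22)
P_hat_t = n_t / n_t.sum(axis=2, keepdims=True)    # per-period MLE P_hat_jk(t)

# LR statistic: 2 * sum_t sum_j sum_k n_jk(t) log[P_hat_jk(t) / P_hat_jk],
# with terms where n_jk(t) = 0 contributing zero.
with np.errstate(divide="ignore", invalid="ignore"):
    terms = np.where(n_t > 0, n_t * np.log(P_hat_t / P_hat), 0.0)
lr = 2.0 * terms.sum()
df = M * (M - 1) * (T - 1)                        # degrees of freedom
```

Since the unrestricted (per-period) MLE always fits at least as well as the pooled MLE, the statistic is nonnegative by construction.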

In the same article Anderson and Goodman also discussed a test of the first-order assumption against a second-order Markov chain.