# One-Factor Models

The individual likelihood function (9.7.12) involves a T-tuple normal integral, and therefore its estimation is computationally infeasible for large T (say, greater than 5). For this reason we shall consider in this subsection one-factor models that lead to a simplification of the likelihood function.

We assume

$$v_{it} = a_t u_i + \epsilon_{it}, \qquad (9.7.13)$$

where {a_t}, t = 1, 2, . . . , T, are unknown parameters, u_i and {ε_it} are normally distributed independent of each other, and {ε_it} are serially independent. We suppress subscript i as before and express (9.7.13) in obvious vector notation as

$$v = au + \epsilon, \qquad (9.7.14)$$

where v, a, and ε are T-vectors and u is a scalar. Then the joint probability of y can be written as

$$P(y) = E_u F[\psi * (2y - 1);\; D * (2y - 1)(2y - 1)'], \qquad (9.7.15)$$

where ψ now includes au and D = Eεε′. Because D is a diagonal matrix, F in (9.7.15) can be factored as the product of T normal distribution functions. The estimation of this model, therefore, is no more difficult than the estimation of model (9.7.4).
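The computational point can be made concrete. Because D is diagonal, conditional on u the T-variate normal probability in (9.7.15) factors into T univariate normal distribution functions, and the outer expectation over u is a one-dimensional integral that can be handled by Gauss-Hermite quadrature. A minimal sketch, assuming u is normalized to unit variance (the loadings a_t absorbing its scale); the function and variable names are my own:

```python
import numpy as np
from scipy.stats import norm

def one_factor_prob(y, psi, a, d, n_quad=32):
    """P(y) for one binary T-sequence under the one-factor model.

    y   : (T,) array of 0/1 outcomes
    psi : (T,) index values (the part of psi not involving u)
    a   : (T,) factor loadings a_t
    d   : (T,) diagonal of D = E[ee'], all entries > 0

    Conditional on u, the T-variate probability factors into T
    univariate normal c.d.f.s; the expectation over u ~ N(0, 1)
    is computed by Gauss-Hermite quadrature.
    """
    # hermegauss uses weight exp(-x^2/2); renormalize to the N(0,1) density
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    weights = weights / np.sqrt(2 * np.pi)
    s = 2 * np.asarray(y) - 1                 # map {0,1} -> {-1,+1}
    # shape (n_quad, T): signed, standardized indices at each quadrature node
    z = s * (psi + np.outer(nodes, a)) / np.sqrt(d)
    return weights @ norm.cdf(z).prod(axis=1)
```

Summing `one_factor_prob` over all 2^T outcome sequences returns 1, which is a quick check that the factorization is implemented correctly.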

For the case T = 3, model (9.7.14) contains a stationary first-order autoregressive model (see Section 5.2.1) as a special case. To see this, put a = (ρ, 1, ρ)′, Vu = σ²(1 − ρ²)⁻¹, and take the diagonal elements of D as σ², 0, and σ². Thus, if T = 3, the hypothesis of AR(1) can easily be tested within the more general model (9.7.14). Heckman (1981c) accepted the AR(1) hypothesis using the same data on female labor participation as used by Heckman and Willis (1977). If T > 4, model (9.7.13) can be stationary if and only if a_t is constant for all t. A verification of this is left as an exercise.
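The T = 3 construction can be verified numerically: under the parameterization a = (ρ, 1, ρ)′, Vu = σ²(1 − ρ²)⁻¹, and D = diag(σ², 0, σ²) (my reading of the construction above), the implied covariance aa′Vu + D coincides with the stationary AR(1) covariance matrix. A small sketch with illustrative parameter values:

```python
import numpy as np

def one_factor_cov(a, v_u, d):
    """Covariance of v = a*u + e when Var(u) = v_u and D = diag(d)."""
    a = np.asarray(a, dtype=float)
    return v_u * np.outer(a, a) + np.diag(d)

def ar1_cov(rho, sigma2, T=3):
    """Stationary AR(1) covariance: Cov(v_t, v_s) = sigma2 * rho^|t-s| / (1 - rho^2)."""
    t = np.arange(T)
    return sigma2 * rho ** np.abs(t[:, None] - t[None, :]) / (1 - rho**2)

rho, sigma2 = 0.6, 2.0
cov_factor = one_factor_cov([rho, 1.0, rho], sigma2 / (1 - rho**2),
                            [sigma2, 0.0, sigma2])
```

Checking `cov_factor` element by element against `ar1_cov(rho, sigma2)` confirms the two parameterizations agree for T = 3.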

Consider a further simplification of (9.7.13) obtained by assuming a_t = 1, {u_i} are i.i.d. over i, and {ε_it} are i.i.d. both over i and t. This model differs from model (9.7.2) only in the presence of y_{i,t−1} among the right-hand-side variables and is analogous to the Balestra-Nerlove model (Section 6.6.3) in the continuous variable case.

As in the Balestra-Nerlove model, {u_i} may be regarded as unknown parameters to estimate. If both N and T go to ∞, then β, γ, and {u_i} can be consistently estimated. An interesting question is whether we can estimate β and γ consistently when only N goes to ∞. Unlike in the Balestra-Nerlove model, the answer to this question is generally negative for the model considered in this subsection. In a probit model, for example, the values of β and γ that maximize

$$L = \prod_{i=1}^{N} \prod_{t=1}^{T} \Phi(\psi_{it} + u_i)^{y_{it}} \left[1 - \Phi(\psi_{it} + u_i)\right]^{1 - y_{it}}, \qquad (9.7.16)$$

while treating {u_i} as unknown constants, are not consistent. Heckman (1981b), in a small-scale Monte Carlo study for a probit model with n = 100 and T = 8, compared this estimator (with y_{i0} treated as given constants), called the transformation estimator, with the random-effect probit MLE, in which we maximize the expectation of (9.7.16) taken with respect to {u_i}, regarded as random variables, and specify the probability of y_{i0} under specification 2 given at the end of Section 9.7.2.

Heckman concluded that (1) if γ = 0, the transformation estimator performed fairly well relative to the random-effect probit MLE, and (2) if γ ≠ 0, the random-effect probit MLE was better than the transformation estimator, the latter exhibiting a downward bias in γ as in the Balestra-Nerlove model (see Section 6.6.3).
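What maximizing (9.7.16) over {u_i} involves can be sketched by profiling: because the likelihood separates over i, for each trial β each u_i can be maximized out by a one-dimensional search, and β is then chosen to maximize the concentrated likelihood. The sketch below does this for a static probit (no lagged y, unlike the dynamic model in the text), with all sample sizes, names, and values purely illustrative; comparing beta_hat with beta_true on simulated data gives a feel for the fixed-T inconsistency discussed above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
N, T, beta_true = 300, 4, 1.0
x = rng.normal(size=(N, T))
u = rng.normal(size=N)                       # individual effects
y = (beta_true * x + u[:, None] + rng.normal(size=(N, T)) > 0).astype(float)

def profiled_ll_i(beta, i):
    """Log-likelihood of individual i with u_i maximized out (treated as a constant)."""
    def neg_ll(ui):
        p = norm.cdf(beta * x[i] + ui).clip(1e-10, 1 - 1e-10)
        return -(y[i] * np.log(p) + (1 - y[i]) * np.log(1 - p)).sum()
    return -minimize_scalar(neg_ll, bounds=(-10, 10), method="bounded").fun

def profile_ll(beta):
    """(9.7.16)-style log-likelihood, concentrated with respect to {u_i}."""
    return sum(profiled_ll_i(beta, i) for i in range(N))

grid = np.linspace(0.5, 2.5, 21)
values = [profile_ll(b) for b in grid]
beta_hat = grid[int(np.argmax(values))]      # the "transformation" estimate of beta
```

Because T is fixed at 4 here while N grows, beta_hat need not converge to beta_true however large N is taken; that is the incidental-parameters problem underlying Heckman's Monte Carlo comparison.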

# Exercises

1. (Section 9.2.1)

In a Monte Carlo study, Goldfeld and Quandt (1972, Chapter 4) generated {y_i} according to the model P(y_i = 1) = Φ(0.2 + 0.5x_{1i} + 2x_{2i}) and, using the generated {y_i} and given {x_{1i}, x_{2i}}, estimated the β's in the linear probability model P(y_i = 1) = β_0 + β_1 x_{1i} + β_2 x_{2i}. Their estimates were β̂_0 = 0.58, β̂_1 = 0.1742, and β̂_2 = 0.7451. How do you convert these estimates into estimates of the coefficients in the probit model?

2. (Section 9.2.2)

Consider a logit model P(y_i = 1) = Λ(β_0 + β_1 x_i), where x_i is a binary variable taking values 0 and 1. This model can also be written as a linear probability model P(y_i = 1) = γ_0 + γ_1 x_i.

a. Determine γ_0 and γ_1 as functions of β_0 and β_1.

b. Show that the MLE of γ_0 and γ_1 are equal to the least squares estimates in the regression of y_i on x_i with an intercept.

3. (Section 9.2.3)

Show that global concavity is not invariant to a one-to-one transformation of the parameter space.

4. (Section 9.2.8)

In the model of Exercise 2, we are given the following data:

| x_i | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| y_i | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |

Calculate the MLE and the DA estimates (with Σ₀ = ) of β_0 and β_1.

5. (Section 9.2.8)

The following data come from a hypothetical durable goods purchase study:

| Case (t) | Constant | x_t | n_t | r_t | P̂_t = r_t/n_t | Φ⁻¹(P̂_t) | φ[Φ⁻¹(P̂_t)] |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 5 | 25 | 12 | 0.4800 | −0.0500 | 0.3984 |
| 2 | 1 | 7 | 26 | 16 | 0.6154 | 0.2930 | 0.3822 |
| 3 | 1 | 10 | 31 | 22 | 0.7097 | 0.5521 | 0.3426 |
| 4 | 1 | 15 | 27 | 21 | 0.7778 | 0.7645 | 0.2979 |

a. Compute the coefficient estimates β̂_0 and β̂_1 using the following models and estimators:

(1) Linear Probability Model—Least Squares

(2) Linear Probability Model—Weighted Least Squares

(3) Logit Minimum χ²

(4) Probit Minimum χ²

(5) Discriminant Analysis Estimator

b. For all estimators except (5), find the asymptotic variance-covariance matrix (evaluated at the respective coefficient estimates).

c. For all estimators, obtain estimates of the probabilities P_t corresponding to each case t.

d. Rank the estimators according to each of the following criteria:

$$\sum_{t=1}^{4} (\hat P_t - P_t)^2, \qquad \sum_{t=1}^{4} \frac{n_t (\hat P_t - P_t)^2}{P_t (1 - P_t)}, \qquad \log L = \sum_{t=1}^{4} \left[ r_t \log P_t + (n_t - r_t) \log (1 - P_t) \right],$$

where P_t denotes the estimated probability obtained in part c.
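Estimator (3), the minimum logit χ² (Berkson) estimator, amounts to weighted least squares of the observed log-odds on the regressors with weights n_t P̂_t(1 − P̂_t). A hedged sketch using the four grouped observations from the table above (variable names are my own):

```python
import numpy as np

# Grouped data from the durable goods table: (x_t, n_t, r_t)
x = np.array([5.0, 7.0, 10.0, 15.0])
n = np.array([25.0, 26.0, 31.0, 27.0])
r = np.array([12.0, 16.0, 22.0, 21.0])

p_hat = r / n                                # observed frequencies P-hat_t
logit = np.log(p_hat / (1 - p_hat))          # observed log-odds
w = n * p_hat * (1 - p_hat)                  # minimum logit chi-square weights

# Weighted least squares of logit on (1, x)
X = np.column_stack([np.ones_like(x), x])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ logit)
b0, b1 = beta
```

Since the observed frequencies increase with x in this data set, the fitted slope b1 comes out positive, and the fitted probabilities 1/(1 + exp(−b0 − b1 x_t)) can be compared with P̂_t as in part c.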

6. (Section 9.2.9)

It may be argued that in (9.2.62) the asymptotic variance of r̂ should be the unconditional variance of r̂, which is $n^{-2}\sum_{i=1}^{n} F_i(1 - F_i) + V\left(n^{-1}\sum_{i=1}^{n} F_i\right)$, where V is taken with respect to the random variables {x_i}. What is the fallacy of this argument?

7. (Section 9.3.3)

In the multinomial logit model (9.3.34), assume j = 0,1, and 2. For this model define the NLGLS iteration.

8. (Section 9.3.5)

Suppose {y_i}, i = 1, 2, . . . , n, are independent random variables taking three values 0, 1, and 2 according to the probability distribution defined by (9.3.51) and (9.3.52), where we assume μ_0 = 0, μ_1 = x′β_1, and μ_2 = x′β_2. Indicate how we can consistently estimate β_1, β_2, and ρ using only a binary logit program.

9. (Section 9.3.5)

Write down (9.3.59) and (9.3.60) in the special case where S = 2, B_1 = (1, 2), and B_2 = (3, 4), and show for which values of the parameters the model is reduced to a four-response independent logit model.

10. (Section 9.3.5)

You are to analyze the decision of high school graduates as to whether or not they go to college and, if they go to college, which college they go to. For simplicity assume that each student considers only two possible colleges. Suppose that for each person i, i = 1, 2, . . . , n, we observe z_i (family income and levels of parents' education) and x_{ij} (the quality index and the cost of the jth school), j = 1 and 2. Also suppose that we observe for every person in the sample whether or not he or she went to college but observe a college choice only for those who went to college. Under these assumptions, define your own model and show how to estimate the parameters of the model (cf. Radner and Miller, 1970).

11. (Section 9.3.6)

Write down (9.3.64), (9.3.65), and (9.3.66) in the special case of Figure 9.1 (three-level), that is, C_1 = (1, 2), C_2 = (3, 4), B_1 = (1, 2), B_2 = (3, 4), B_3 = (5, 6), and B_4 = (7, 8).

12. (Section 9.3.10)

Suppose that y_i takes values 0, 1, and 2 with the probability distribution P(y_i = 0) = Λ(x_i′β_0) and P(y_i = 1 | y_i ≠ 0) = Λ(x_i′β_1). Assuming that we have n_t independent observations on y with the same value x_t of the independent variables, t = 1, 2, . . . , T, indicate how to calculate the MIN χ² estimates of β_0 and β_1.

13. (Section 9.4.3)

Consider two jointly distributed discrete random variables y and x such that y takes two values, 0 and 1, and x takes three values, 0, 1, and 2. The most general model (called the saturated model) is the model in which there is no constraint among the five probabilities that characterize the joint distribution. Consider a specific model (called the null hypothesis model) in which

$$P(y = 1 \mid x) = \left[1 + \exp(-\alpha - \beta x)\right]^{-1}$$

and the marginal distribution of x is unconstrained. Given n independent observations on (y, x), show how to test the null hypothesis against the saturated model. Write down explicitly the test statistic and the critical region you propose.
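One standard choice is the likelihood-ratio test: maximize the conditional likelihood under the logistic null (two parameters), compare with the saturated conditional likelihood (three free probabilities, one per value of x), and reject when 2(log L_sat − log L_null) exceeds the χ² critical value with 3 − 2 = 1 degree of freedom. A sketch of this test (helper names are my own, and this is one reasonable answer rather than necessarily the book's intended one):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def lr_test(y, x, alpha_level=0.05):
    """LR test of the logistic null against the saturated model.

    Null: P(y=1|x) = 1/(1 + exp(-a - b*x)); saturated: P(y=1|x=k) is free
    for each k in {0, 1, 2}.  Degrees of freedom = 3 - 2 = 1.
    """
    y, x = np.asarray(y, dtype=float), np.asarray(x)

    def neg_ll(theta):
        p = 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * x)))
        p = p.clip(1e-12, 1 - 1e-12)
        return -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()

    ll_null = -minimize(neg_ll, x0=np.zeros(2), method="BFGS").fun

    ll_sat = 0.0
    for k in (0, 1, 2):                      # saturated: cell-by-cell frequencies
        m = x == k
        n_k, r_k = m.sum(), y[m].sum()
        if 0 < r_k < n_k:                    # cells fitted at 0 or 1 contribute 0
            p_k = r_k / n_k
            ll_sat += r_k * np.log(p_k) + (n_k - r_k) * np.log(1 - p_k)

    lr = 2.0 * (ll_sat - ll_null)
    return lr, lr > chi2.ppf(1 - alpha_level, df=1)
```

The critical region is {LR > χ²₁(α)}; when the observed log-odds happen to be exactly linear in x, the null fits as well as the saturated model and LR is essentially zero.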

14. (Section 9.4.3)

Suppose the joint distribution of y_{1t} and y_{2t}, t = 1, 2, . . . , T, is given by the following table:


19. (Section 9.5.3)

In the same model described in Exercise 17, show that the asymptotic variance of the MME of β_0 is the same as that of the WMLE.

20. (Section 9.5.3)

Show that (9.5.36) follows from (9.5.19).

21. (Section 9.6.3)

We are given the following data:

| i | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| y_i | 1 | 0 | 0 | 1 | 1 |
| x_{1i} | −1 | −1 | 0 | 0 | 1 |
| x_{2i} | 0 | 1 | −1 | 1 | 0 |

a. Obtain the set of β values that maximize

$$S(\beta) = \sum_{i=1}^{5} \left[ y_i \chi(\beta x_{1i} + x_{2i} \ge 0) + (1 - y_i)\chi(\beta x_{1i} + x_{2i} < 0) \right],$$

where χ(E) = 1 if event E occurs and is zero otherwise.

b. Obtain the set of β values that maximize

$$\psi(\beta, F) = \sum_{i=1}^{5} \left\{ y_i \log F(\beta x_{1i} + x_{2i}) + (1 - y_i) \log \left[1 - F(\beta x_{1i} + x_{2i})\right] \right\},$$

where F is also chosen to maximize ψ among all the possible distribution functions. (Note that I have adopted my own normalization, which may differ from Manski's or Cosslett's, so that the parameter space of β is the whole real line.)
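The criterion S(β) in part a is a step function of β that changes value only where some index βx_{1i} + x_{2i} crosses zero, so it can be explored on a grid. A small sketch with the data of this exercise, purely to illustrate the maximum score criterion (the grid and names are my own):

```python
import numpy as np

# Data from the exercise
y  = np.array([1, 0, 0, 1, 1])
x1 = np.array([-1.0, -1.0, 0.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, -1.0, 1.0, 0.0])

def score(beta):
    """Number of correct sign predictions S(beta) from part a."""
    idx = beta * x1 + x2
    return int((y * (idx >= 0) + (1 - y) * (idx < 0)).sum())

grid = np.linspace(-2.0, 2.0, 401)        # step 0.01
scores = np.array([score(b) for b in grid])
```

Tabulating `scores` against `grid` reveals the flat regions of S(β) and shows that its maximum here is 4 of the 5 observations, attained on a set of β values rather than at a unique point, which is characteristic of the maximum score estimator.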

22. (Section 9.6.3)

Cosslett (1983, p. 780) considered two sequences of y_i ordered according to the magnitude of x_i′β:

Sequence A: 1 0 1 0 1 1 1 1

Sequence B: 0 1 1 1 1 1 1 0

He showed that Sequence A yields a higher value of log L, whereas Sequence B yields a higher score. Construct two sequences with nine observations each and an equal number of ones such that one sequence yields a higher value of log L and the other sequence yields a higher score. Your sequences should not contain any of Cosslett's sequences as a subset.

23. (Section 9.7.2)

Assume that β = 0 and {v_it} are i.i.d. across i in model (9.7.11), and derive the stationary probability distribution of {y_it} for a particular i.

24. (Section 9.7.3)

Show that if T > 4, model (9.7.13) can be stationary if and only if a_t is constant for all t. Show that if T = 4 and a_t > 0, then model (9.7.13) can be stationary if and only if a_t is constant for all t.