# Heckman’s Two-Step Estimator

Heckman (1976a) proposed a two-step estimator in a two-equation generalization of the Tobit model, which we shall call the Type 3 Tobit model. But his estimator can also be used in the standard Tobit model, as well as in more complex Tobit models, with only a minor adjustment. We shall discuss the estimator in the context of the standard Tobit model because all the basic features of the method can be revealed in this model. However, we should keep in mind that since the method requires the computation of the probit MLE, which itself requires an iterative method, the computational advantage of the method over the Tobit MLE (which is more efficient) is not as great in the standard Tobit model as it is in more complex Tobit models.

To explain this estimator, it is useful to rewrite (10.4.6) as

$$y_i = x_i'\beta + \sigma\lambda(x_i'\alpha) + \varepsilon_i, \quad \text{for } i \text{ such that } y_i > 0, \tag{10.4.11}$$

where we have written $\alpha = \beta/\sigma$ as before and $\varepsilon_i = y_i - E(y_i \mid y_i > 0)$ so that $E\varepsilon_i = 0$. The variance of $\varepsilon_i$ is given by

$$V\varepsilon_i = \sigma^2 - \sigma^2\, x_i'\alpha\, \lambda(x_i'\alpha) - \sigma^2 \lambda(x_i'\alpha)^2. \tag{10.4.12}$$
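As a quick numerical check of (10.4.11) and (10.4.12), the following sketch compares the formulas with Monte Carlo moments of $y_i$ given $y_i > 0$. It assumes NumPy is available, uses $x_i'\beta = \sigma x_i'\alpha$, and picks hypothetical values for $z = x_i'\alpha$ and $\sigma$.

```python
import math
import numpy as np

def mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z) for a scalar z."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi / Phi

def conditional_moments(z, sigma):
    """E(y_i | y_i > 0) from (10.4.11) and V(eps_i) from (10.4.12), with z = x_i' alpha."""
    lam = mills(z)
    mean = sigma * z + sigma * lam            # x_i' beta + sigma * lambda(x_i' alpha)
    var = sigma**2 - sigma**2 * z * lam - sigma**2 * lam**2
    return mean, var

# Hypothetical values for z = x_i' alpha and sigma.
z, sigma = 0.3, 2.0
rng = np.random.default_rng(1)
y_star = sigma * z + sigma * rng.standard_normal(2_000_000)  # y_i* = x_i' beta + u_i
y_pos = y_star[y_star > 0]                                    # keep only y_i > 0

mean_formula, var_formula = conditional_moments(z, sigma)
print(mean_formula, y_pos.mean())
print(var_formula, y_pos.var())
```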

Thus (10.4.11) is a heteroscedastic nonlinear regression model with $n_1$ observations. The estimation method Heckman proposed consists of two steps:

Step 1. Estimate $\alpha$ by the probit MLE (denoted $\hat\alpha$) defined earlier.

Step 2. Regress $y_i$ on $x_i$ and $\lambda(x_i'\hat\alpha)$ by least squares, using only the positive observations on $y_i$.
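The two steps can be sketched in Python as follows. This is a minimal illustration assuming NumPy; the probit MLE is computed here by Fisher scoring (one of several possible iterative methods), and the helper names, simulated design, and parameter values are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z), via math.erf."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def probit_mle(X, w, iters=100, tol=1e-10):
    """Step 1: probit MLE of alpha = beta/sigma by Fisher scoring; w[i] = 1 if y[i] > 0."""
    alpha = np.zeros(X.shape[1])
    for _ in range(iters):
        z = X @ alpha
        Phi = np.clip(norm_cdf(z), 1e-12, 1.0 - 1e-12)
        phi = norm_pdf(z)
        score = X.T @ ((w - Phi) * phi / (Phi * (1.0 - Phi)))
        info = X.T @ ((phi**2 / (Phi * (1.0 - Phi)))[:, None] * X)
        step = np.linalg.solve(info, score)
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

def heckman_two_step(X, y):
    """Step 2: OLS of y_i on (x_i, lambda(x_i' alpha_hat)) over observations with y_i > 0."""
    pos = y > 0
    alpha_hat = probit_mle(X, pos.astype(float))
    z = X[pos] @ alpha_hat
    lam_hat = norm_pdf(z) / norm_cdf(z)            # inverse Mills ratio at alpha_hat
    Z_hat = np.column_stack([X[pos], lam_hat])     # Z_hat = (X, lambda_hat)
    gamma_hat, *_ = np.linalg.lstsq(Z_hat, y[pos], rcond=None)
    return gamma_hat, alpha_hat                    # gamma_hat estimates (beta', sigma)'

# Simulated standard Tobit data (hypothetical design and parameter values).
rng = np.random.default_rng(0)
n = 50_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([0.5, 1.5]), 1.0
y = np.maximum(X @ beta + sigma * rng.normal(size=n), 0.0)

gamma_hat, alpha_hat = heckman_two_step(X, y)
print("alpha_hat:", alpha_hat)   # compare with beta / sigma
print("gamma_hat:", gamma_hat)   # compare with (beta', sigma)'
```

Only the positive observations enter the second-step regression; the zeros are used only through the probit in the first step.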

To facilitate further the discussion of Heckman's estimator, we can rewrite (10.4.11) again as

$$y_i = x_i'\beta + \sigma\lambda(x_i'\hat\alpha) + \varepsilon_i + \eta_i \quad \text{for } i \text{ such that } y_i > 0, \tag{10.4.13}$$

where $\eta_i = \sigma[\lambda(x_i'\alpha) - \lambda(x_i'\hat\alpha)]$. We can write (10.4.13) in vector notation as

$$y = X\beta + \sigma\hat\lambda + \varepsilon + \eta, \tag{10.4.14}$$

where the vectors $y$, $\hat\lambda$, $\varepsilon$, and $\eta$ have $n_1$ elements and the matrix $X$ has $n_1$ rows, corresponding to the positive observations of $y_i$. We can further rewrite (10.4.14) as

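$$y = \hat Z\gamma + \varepsilon + \eta, \tag{10.4.15}$$

where we have defined $\hat Z = (X, \hat\lambda)$ and $\gamma = (\beta', \sigma)'$. Then Heckman's two-step estimator of $\gamma$ is defined as

$$\hat\gamma = (\hat Z'\hat Z)^{-1}\hat Z'y. \tag{10.4.16}$$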

The consistency of $\hat\gamma$ follows easily from (10.4.15) and (10.4.16). We shall derive its asymptotic distribution for the sake of completeness, although the result is a special case of Heckman's result (Heckman, 1979). From (10.4.15) and (10.4.16) we have

$$\sqrt{n_1}(\hat\gamma - \gamma) = (n_1^{-1}\hat Z'\hat Z)^{-1}(n_1^{-1/2}\hat Z'\varepsilon + n_1^{-1/2}\hat Z'\eta). \tag{10.4.17}$$

Because the probit MLE $\hat\alpha$ is consistent, we have

$$\operatorname{plim} n_1^{-1}\hat Z'\hat Z = \lim n_1^{-1}Z'Z, \tag{10.4.18}$$

where $Z = (X, \lambda)$. Under the assumptions stated after (10.2.4), it can be shown that

$$n_1^{-1/2}\hat Z'\varepsilon \to N(0, \sigma^2 \lim n_1^{-1}Z'\Sigma Z), \tag{10.4.19}$$

where $\sigma^2\Sigma \equiv E\varepsilon\varepsilon'$ is the $n_1 \times n_1$ diagonal matrix whose diagonal elements are $V\varepsilon_i$ given in (10.4.12). We have by Taylor expansion of $\lambda(x_i'\hat\alpha)$ around $\lambda(x_i'\alpha)$

$$\eta = -\sigma\frac{\partial\lambda}{\partial\alpha'}(\hat\alpha - \alpha) + O(n^{-1}). \tag{10.4.20}$$
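The factor $(I - \Sigma)$ that appears in (10.4.22) can be traced to the derivative in (10.4.20). A short derivation, writing $z_i = x_i'\alpha$ and $\lambda_i = \lambda(z_i)$ and using $\phi'(z) = -z\phi(z)$:

$$
\begin{aligned}
\lambda'(z) &= \frac{\phi'(z)\Phi(z) - \phi(z)^2}{\Phi(z)^2} = -z\lambda(z) - \lambda(z)^2, \\
\frac{\partial\lambda(x_i'\alpha)}{\partial\alpha'} &= -(z_i\lambda_i + \lambda_i^2)\,x_i' = -(1 - \Sigma_{ii})\,x_i',
\end{aligned}
$$

where $\Sigma_{ii} = 1 - z_i\lambda_i - \lambda_i^2$ is the $i$th diagonal element of $\Sigma$ implied by (10.4.12). Substituting into (10.4.20) gives $\eta \approx \sigma(I - \Sigma)X(\hat\alpha - \alpha)$, which is the source of the $(I - \Sigma)X$ factors in the second term of (10.4.22).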

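Using (10.4.20) and (10.4.2), we can prove

$$n_1^{-1/2}\hat Z'\eta \to N\left[0,\ \sigma^2 \lim n_1^{-1}Z'(I - \Sigma)X(\underline{X}'D_1\underline{X})^{-1}X'(I - \Sigma)Z\right], \tag{10.4.21}$$

where $D_1$ was defined after (10.4.2). Next, note that $\varepsilon$ and $\eta$ are uncorrelated because $\eta$ is asymptotically a linear function of $w$ on account of (10.4.2) and (10.4.20), and $\varepsilon$ and $w$ are uncorrelated. Therefore, from (10.4.17), (10.4.18), (10.4.19), and (10.4.21), we finally conclude that $\hat\gamma$ is asymptotically normal with mean $\gamma$ and asymptotic variance-covariance matrix given by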

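$$V\hat\gamma = \sigma^2(Z'Z)^{-1}Z'\left[\Sigma + (I - \Sigma)X(\underline{X}'D_1\underline{X})^{-1}X'(I - \Sigma)\right]Z(Z'Z)^{-1}. \tag{10.4.22}$$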

Expression (10.4.22) may be consistently estimated either by replacing the unknown parameters by their consistent estimates or by $(\hat Z'\hat Z)^{-1}\hat Z'\Lambda\hat Z(\hat Z'\hat Z)^{-1}$, where $\Lambda$ is the diagonal matrix whose $i$th diagonal element is $[y_i - x_i'\hat\beta - \hat\sigma\lambda(x_i'\hat\alpha)]^2$, following the idea of White (1980).
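The first option, plugging estimates into (10.4.22), can be sketched as below. This is a minimal NumPy sketch under a stated assumption: it takes $\underline{X}'D_1\underline{X}$ to be the probit information matrix over all $n$ observations, with diagonal weights $\phi_i^2/[\Phi_i(1 - \Phi_i)]$; check this against the definition of $D_1$ after (10.4.2). The function name `heckman_avar` and the simulated data are hypothetical. It also returns the $\alpha$-known matrix $\sigma^2(Z'Z)^{-1}Z'\Sigma Z(Z'Z)^{-1}$ for comparison, and avoids forming any $n_1 \times n_1$ matrix.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def heckman_avar(X_all, y, alpha_hat, gamma_hat):
    """Plug-in estimate of (10.4.22), plus the alpha-known matrix for comparison."""
    pos = y > 0
    X = X_all[pos]                                # n1 x K: positive observations only
    z = X @ alpha_hat
    lam = norm_pdf(z) / norm_cdf(z)
    Z = np.column_stack([X, lam])                 # Z = (X, lambda)
    sigma = gamma_hat[-1]
    s = 1.0 - z * lam - lam**2                    # diagonal of Sigma implied by (10.4.12)

    # ASSUMPTION: X_'D1X_ is the probit information matrix over all n observations.
    z_all = X_all @ alpha_hat
    Phi = np.clip(norm_cdf(z_all), 1e-12, 1.0 - 1e-12)
    d1 = norm_pdf(z_all) ** 2 / (Phi * (1.0 - Phi))
    info = X_all.T @ (d1[:, None] * X_all)

    ZSZ = Z.T @ (s[:, None] * Z)                  # Z' Sigma Z
    ZA = Z.T @ ((1.0 - s)[:, None] * X)           # Z'(I - Sigma) X
    ZZ_inv = np.linalg.inv(Z.T @ Z)
    known = sigma**2 * ZZ_inv @ ZSZ @ ZZ_inv      # alpha known
    full = sigma**2 * ZZ_inv @ (ZSZ + ZA @ np.linalg.solve(info, ZA.T)) @ ZZ_inv
    return full, known

# Usage on simulated Tobit data, evaluated at hypothetical true parameter values.
rng = np.random.default_rng(2)
n = 5000
X_all = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([0.5, 1.5]), 1.0
y = np.maximum(X_all @ beta + sigma * rng.normal(size=n), 0.0)
full, known = heckman_avar(X_all, y, beta / sigma, np.append(beta, sigma))
print(np.sqrt(np.diag(full)))
```

In practice `alpha_hat` and `gamma_hat` would be the two-step estimates rather than true values.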

Note that the second matrix within the square brackets in (10.4.22) arises because $\alpha$ had to be estimated. If $\alpha$ were known, we could apply least squares directly to (10.4.11), and the exact variance-covariance matrix would be $\sigma^2(Z'Z)^{-1}Z'\Sigma Z(Z'Z)^{-1}$.

Heckman's two-step estimator uses the conditional mean of $y_i$ given in (10.4.6). A similar procedure can also be applied to the unconditional mean of $y_i$ given by (10.4.9).$^4$ That is to say, we can regress all the observations of $y_i$, including zeros, on $\Phi x_i$ and $\phi$ after replacing the $\alpha$ that appears in the arguments of $\Phi$ and $\phi$ by the probit MLE $\hat\alpha$. In the same way as we derived (10.4.11) and (10.4.13) from (10.4.6), we can derive the following two equations from (10.4.9):

$$y_i = \Phi(x_i'\alpha)[x_i'\beta + \sigma\lambda(x_i'\alpha)] + \delta_i \tag{10.4.23}$$

and

$$y_i = \Phi(x_i'\hat\alpha)[x_i'\beta + \sigma\lambda(x_i'\hat\alpha)] + \delta_i + \xi_i, \tag{10.4.24}$$

where $\delta_i = y_i - Ey_i$ and $\xi_i = [\Phi(x_i'\alpha) - \Phi(x_i'\hat\alpha)]x_i'\beta + \sigma[\phi(x_i'\alpha) - \phi(x_i'\hat\alpha)]$. A vector equation comparable to (10.4.15) is

$$\underline{y} = \hat D\hat{\underline{Z}}\gamma + \underline{\delta} + \underline{\xi}, \tag{10.4.25}$$

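where $\hat D$ is the $n \times n$ diagonal matrix whose $i$th element is $\Phi(x_i'\hat\alpha)$, and $\hat{\underline{Z}} = (\underline{X}, \hat{\underline{\lambda}})$ is defined as in (10.4.15) but over all $n$ observations. Note that the vectors and matrices appear with underbars because they consist of $n$ elements or rows. The two-step estimator of $\gamma$ based on all the observations, denoted $\tilde\gamma$, is defined as

$$\tilde\gamma = (\hat{\underline{Z}}'\hat D^2\hat{\underline{Z}})^{-1}\hat{\underline{Z}}'\hat D\underline{y}. \tag{10.4.26}$$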
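A minimal sketch of $\tilde\gamma$, assuming NumPy: the second step is ordinary least squares of all $y_i$, zeros included, on the columns of $\hat D\hat{\underline{Z}}$, namely $\Phi(x_i'\hat\alpha)x_i'$ and $\Phi(x_i'\hat\alpha)\lambda(x_i'\hat\alpha) = \phi(x_i'\hat\alpha)$, which reproduces $(\hat{\underline{Z}}'\hat D^2\hat{\underline{Z}})^{-1}\hat{\underline{Z}}'\hat D\underline{y}$. The Fisher-scoring probit, helper names, and simulated data are hypothetical choices for illustration.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def probit_mle(X, w, iters=100, tol=1e-10):
    """Probit MLE of alpha by Fisher scoring; w[i] = 1 if y[i] > 0."""
    alpha = np.zeros(X.shape[1])
    for _ in range(iters):
        z = X @ alpha
        Phi = np.clip(norm_cdf(z), 1e-12, 1.0 - 1e-12)
        phi = norm_pdf(z)
        score = X.T @ ((w - Phi) * phi / (Phi * (1.0 - Phi)))
        info = X.T @ ((phi**2 / (Phi * (1.0 - Phi)))[:, None] * X)
        step = np.linalg.solve(info, score)
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

def two_step_all(X, y):
    """gamma_tilde = (Z'D^2 Z)^{-1} Z'D y using all n observations, zeros included."""
    alpha_hat = probit_mle(X, (y > 0).astype(float))
    z = X @ alpha_hat
    # Columns of D_hat Z_hat: Phi(x_i'a) x_i and Phi(x_i'a) lambda(x_i'a) = phi(x_i'a).
    DZ = np.column_stack([norm_cdf(z)[:, None] * X, norm_pdf(z)])
    gamma_tilde, *_ = np.linalg.lstsq(DZ, y, rcond=None)
    return gamma_tilde, alpha_hat

# Simulated standard Tobit data (hypothetical design and parameter values).
rng = np.random.default_rng(3)
n = 50_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([0.5, 1.5]), 1.0
y = np.maximum(X @ beta + sigma * rng.normal(size=n), 0.0)

gamma_tilde, alpha_hat = two_step_all(X, y)
print("gamma_tilde:", gamma_tilde)   # compare with (beta', sigma)'
```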

The estimator can easily be shown to be consistent. To derive its asymptotic distribution, we obtain from (10.4.25) and (10.4.26)

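$$\sqrt{n}(\tilde\gamma - \gamma) = (n^{-1}\hat{\underline{Z}}'\hat D^2\hat{\underline{Z}})^{-1}(n^{-1/2}\hat{\underline{Z}}'\hat D\underline{\delta} + n^{-1/2}\hat{\underline{Z}}'\hat D\underline{\xi}). \tag{10.4.27}$$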

Here, unlike the previous case, an interesting fact emerges: by expanding $\Phi(x_i'\hat\alpha)$ and $\phi(x_i'\hat\alpha)$ in Taylor series around $x_i'\alpha$, we can show $\xi_i = O(n^{-1})$. Therefore

$$\operatorname{plim} n^{-1/2}\hat{\underline{Z}}'\hat D\underline{\xi} = 0. \tag{10.4.28}$$
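Why $\xi_i$ is of smaller order can be seen directly. Write $g_i(t) = \Phi(t)\,x_i'\beta + \sigma\phi(t)$, so that $\xi_i = g_i(x_i'\alpha) - g_i(x_i'\hat\alpha)$, and use $\phi'(t) = -t\phi(t)$ together with $x_i'\beta = \sigma x_i'\alpha$:

$$
\begin{aligned}
g_i'(t) &= \phi(t)\,x_i'\beta - \sigma t\,\phi(t) = \sigma\phi(t)\,(x_i'\alpha - t), \qquad g_i'(x_i'\alpha) = 0, \\
\xi_i &= -\tfrac{1}{2}\,g_i''(\bar t_i)\,[x_i'(\hat\alpha - \alpha)]^2 = O(n^{-1}),
\end{aligned}
$$

for some $\bar t_i$ between $x_i'\alpha$ and $x_i'\hat\alpha$, because $\hat\alpha - \alpha = O(n^{-1/2})$. In (10.4.13), by contrast, the derivative of $\sigma\lambda(t)$ does not vanish, so $\eta_i$ is $O(n^{-1/2})$ and contributes the second term of (10.4.22).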

Corresponding to (10.4.18), we have

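$$\operatorname{plim} n^{-1}\hat{\underline{Z}}'\hat D^2\hat{\underline{Z}} = \lim n^{-1}\underline{Z}'D^2\underline{Z}, \tag{10.4.29}$$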

where $D$ is obtained from $\hat D$ by replacing $\hat\alpha$ with $\alpha$. Corresponding to (10.4.19), we have

$$n^{-1/2}\hat{\underline{Z}}'\hat D\underline{\delta} \to N(0, \sigma^2 \lim n^{-1}\underline{Z}'D^2\Omega\underline{Z}), \tag{10.4.30}$$

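where $\sigma^2\Omega \equiv E\underline{\delta}\underline{\delta}'$ is the $n \times n$ diagonal matrix whose $i$th element is $\sigma^2\Phi(x_i'\alpha)\{(x_i'\alpha)^2 + x_i'\alpha\lambda(x_i'\alpha) + 1 - \Phi(x_i'\alpha)[x_i'\alpha + \lambda(x_i'\alpha)]^2\}$. Therefore, from (10.4.27) through (10.4.30), we conclude that $\tilde\gamma$ is asymptotically normal with mean $\gamma$ and asymptotic variance-covariance matrix given by$^5$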

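$$V\tilde\gamma = \sigma^2(\underline{Z}'D^2\underline{Z})^{-1}\underline{Z}'D^2\Omega\underline{Z}(\underline{Z}'D^2\underline{Z})^{-1}. \tag{10.4.31}$$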
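The diagonal element of $\sigma^2\Omega$ used in (10.4.30) and (10.4.31) follows from the conditional moments behind (10.4.11) and (10.4.12). With $z_i = x_i'\alpha$, $\lambda_i = \lambda(z_i)$, $\Phi_i = \Phi(z_i) = P(y_i > 0)$, and $E(y_i \mid y_i > 0) = \sigma(z_i + \lambda_i)$:

$$
\begin{aligned}
Ey_i^2 &= \Phi_i\left[\sigma^2(1 - z_i\lambda_i - \lambda_i^2) + \sigma^2(z_i + \lambda_i)^2\right] = \sigma^2\Phi_i\,(z_i^2 + z_i\lambda_i + 1), \\
(Ey_i)^2 &= \sigma^2\Phi_i^2\,(z_i + \lambda_i)^2, \\
Vy_i &= \sigma^2\Phi_i\left\{z_i^2 + z_i\lambda_i + 1 - \Phi_i(z_i + \lambda_i)^2\right\}.
\end{aligned}
$$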

Which of the two estimators $\hat\gamma$ and $\tilde\gamma$ is preferred? Unfortunately, the difference of the two matrices given by (10.4.22) and (10.4.31) is generally neither positive definite nor negative definite. Thus an answer to the preceding question depends on parameter values.

Both (10.4.15) and (10.4.25) represent heteroscedastic regression models. Therefore we can obtain asymptotically more efficient estimators by using weighted least squares (WLS) in the second step of the procedure for obtaining $\hat\gamma$ and $\tilde\gamma$. In doing so, we must use a consistent estimate of the asymptotic variance-covariance matrix of $\varepsilon + \eta$ for the case of (10.4.15) and of $\underline{\delta} + \underline{\xi}$ for the case of (10.4.25). Because these matrices depend on $\gamma$, an initial consistent estimate of $\gamma$ (say, $\hat\gamma$ or $\tilde\gamma$) is needed to obtain the WLS estimators. We call these WLS estimators $\hat\gamma_W$ and $\tilde\gamma_W$, respectively. It can be shown that they are consistent and asymptotically normal with asymptotic variance-covariance matrices given by

$$V\hat\gamma_W = \sigma^2\left\{Z'\left[\Sigma + (I - \Sigma)X(\underline{X}'D_1\underline{X})^{-1}X'(I - \Sigma)\right]^{-1}Z\right\}^{-1} \tag{10.4.32}$$

and

$$V\tilde\gamma_W = \sigma^2(\underline{Z}'D^2\Omega^{-1}\underline{Z})^{-1}. \tag{10.4.33}$$

Again, we cannot make a definite comparison between the two matrices.
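A minimal sketch of the WLS estimator $\tilde\gamma_W$, assuming NumPy. It takes $\tilde\gamma$ as the initial estimate, sets $\tilde\alpha = \tilde\beta/\tilde\sigma$, and weights each observation by the inverse of the $i$th diagonal element of $\Omega$ evaluated at $\tilde\alpha$; dropping $\underline{\xi}$ from the weights relies on (10.4.28), and the constant $\sigma^2$ cancels in WLS. The helper names, the Fisher-scoring probit, and the simulated data are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def probit_mle(X, w, iters=100, tol=1e-10):
    """Probit MLE of alpha by Fisher scoring; w[i] = 1 if y[i] > 0."""
    alpha = np.zeros(X.shape[1])
    for _ in range(iters):
        z = X @ alpha
        Phi = np.clip(norm_cdf(z), 1e-12, 1.0 - 1e-12)
        phi = norm_pdf(z)
        score = X.T @ ((w - Phi) * phi / (Phi * (1.0 - Phi)))
        info = X.T @ ((phi**2 / (Phi * (1.0 - Phi)))[:, None] * X)
        step = np.linalg.solve(info, score)
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

def omega_diag(z):
    """Diagonal of Omega (sigma^2 factored out), with z = x_i' alpha."""
    Phi = np.clip(norm_cdf(z), 1e-300, None)
    lam = norm_pdf(z) / Phi
    return Phi * (z**2 + z * lam + 1.0 - Phi * (z + lam) ** 2)

def two_step_all_wls(X, y):
    alpha_hat = probit_mle(X, (y > 0).astype(float))
    z = X @ alpha_hat
    DZ = np.column_stack([norm_cdf(z)[:, None] * X, norm_pdf(z)])   # D_hat Z_hat
    gamma_tilde, *_ = np.linalg.lstsq(DZ, y, rcond=None)           # initial estimate
    alpha_tilde = gamma_tilde[:-1] / gamma_tilde[-1]                # alpha = beta / sigma
    omega = np.maximum(omega_diag(X @ alpha_tilde), 1e-12)
    sw = 1.0 / np.sqrt(omega)                                       # sqrt of WLS weights
    gamma_w, *_ = np.linalg.lstsq(DZ * sw[:, None], y * sw, rcond=None)
    return gamma_w, gamma_tilde

# Simulated standard Tobit data (hypothetical design and parameter values).
rng = np.random.default_rng(4)
n = 50_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([0.5, 1.5]), 1.0
y = np.maximum(X @ beta + sigma * rng.normal(size=n), 0.0)

gamma_w, gamma_tilde = two_step_all_wls(X, y)
print("gamma_tilde:", gamma_tilde)
print("gamma_w:    ", gamma_w)
```

Here the regressors still use the probit $\hat\alpha$, while the weights use $\tilde\alpha$; either choice of initial consistent estimate is in the spirit of the text.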