# Implementing the Gibbs Sampling Algorithm

Bayesian analysis of this model entails deriving the joint posterior distribution of the model's parameters and unobserved variables. Recall that the value function differences $Z = \{Z_{ijt}:\ j = 1, 2, 3;\ i = 1, \ldots, N;\ t = 1, \ldots, 40\}$ are never observed, and that wages $W = \{w_{ijt}:\ j = 1, 2, 3;\ i = 1, \ldots, N;\ t = 1, \ldots, 40\}$ are only partially observed. Let $W_1$ and $W_2$ denote the sets of observed and unobserved wages, respectively, and let $Y = \{Y_{ijt}\}$ denote the log-wage equation regressors. Then the joint posterior density is $p(W_2, Z, \beta_1, \beta_2, \alpha, \pi, \Sigma_\varepsilon^{-1}, \Sigma_\eta^{-1} \mid W_1, Y, \Lambda)$. By Bayes' law, this density is proportional to

$$
p(W, Z \mid Y, \Lambda, \Psi, \beta_1, \beta_2, \alpha, \pi, \Sigma_\varepsilon^{-1}, \Sigma_\eta^{-1}) \cdot p(\beta_1, \beta_2, \alpha, \pi, \Sigma_\varepsilon^{-1}, \Sigma_\eta^{-1}). \tag{22.14}
$$

The first term in (22.14) is the so-called "complete data" likelihood function. It is the likelihood function that could be formed in the hypothetical case that we had data on N individuals observed over 40 periods each, and we observed all of the value function differences Z and the complete set of wages W for all alternatives. This is:

$$
p(W, Z \mid Y, \Lambda, \Psi, \beta_1, \beta_2, \alpha, \pi, \Sigma_\varepsilon^{-1}, \Sigma_\eta^{-1}) \propto \prod_{i=1}^{N} \prod_{t=1}^{40} |\Sigma_\varepsilon^{-1}|^{1/2} \exp\left\{-\tfrac{1}{2}\,\varepsilon_{it}'\,\Sigma_\varepsilon^{-1}\,\varepsilon_{it}\right\} \cdot |\Sigma_\eta^{-1}|^{1/2} \exp\left\{-\tfrac{1}{2}\,\eta_{it}'\,\Sigma_\eta^{-1}\,\eta_{it}\right\}
$$

$$
\cdot\; \chi\left(Z_{ijt} > 0,\; Z_{ikt} < Z_{ijt}\ (k \neq j) \text{ if } d_{it} = j \text{ and } j \in \{1, 2, 3\};\; \{Z_{ijt}\}_{j=1}^{3} < 0 \text{ otherwise}\right) \tag{22.15}
$$

where $\varepsilon_{it} = (\ln w_{i1t} - Y_{i1t}'\beta_1,\ \ln w_{i2t} - Y_{i2t}'\beta_2)'$ and the $\eta_{ijt}$ are defined by (22.12).

The second term in (22.14) is the joint prior distribution. We assume flat priors on all parameters except the two precision matrices, for which we assume the standard noninformative priors (see Zellner, 1971, section 8.1):

$$
p(\Sigma_\varepsilon^{-1}) \propto |\Sigma_\varepsilon^{-1}|^{-3/2}, \qquad p(\Sigma_\eta^{-1}) \propto |\Sigma_\eta^{-1}|^{-2}. \tag{22.16}
$$

The Gibbs sampler draws from a density that is proportional to the product of (22.15) and the two densities in (22.16).

The Gibbs sampling algorithm is used to form numerical approximations of the parameters' marginal posterior distributions. It is not feasible to construct these marginal posteriors analytically, since doing so requires high-dimensional integrations over unobserved wages and value function differences. Implementing the Gibbs sampling algorithm requires us to factor the joint posterior defined by (22.14)-(22.16) into a set of conditional posterior densities, in such a way that each can be drawn from easily. Then, we cycle through these conditionals, drawing a block of parameters from each in turn. As the number of cycles grows large, the parameter draws so obtained converge in distribution to their respective marginal posteriors, given certain mild regularity conditions (see Tierney (1994) for a discussion of these conditions). An important condition is that the posterior distribution be finitely integrable, which we verify for this model in Appendix B. Given the posterior distribution of the parameters, conditional on the data, the investigator can draw exact finite-sample inferences.

Our Gibbs sampling-data augmentation algorithm consists of six steps or "blocks." These steps, which we now briefly describe, are cycled through repeatedly until convergence is achieved.

Step 1. Draw the value function differences $\{Z_{ijt},\ i = 1, \ldots, N;\ j = 1, \ldots, 3;\ t = 1, \ldots, 40\}$.

Step 2. Draw the unobserved wages $\{w_{ijt} \text{ when } d_{it} \neq j,\ (j = 1, 2)\}$.

Step 3. Draw the log-wage equation coefficients $\beta_j$.

Step 4. Draw the log-wage equation error-covariance matrix $\Sigma_\varepsilon$.

Step 5. Draw the parameters of the future component $\pi$ and school payoff parameters $\alpha$.

Step 6. Draw the nonpecuniary payoff covariance matrix $\Sigma_\eta$.
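The cycling structure of the six-block algorithm can be sketched generically. The sketch below is illustrative, not the authors' code: each entry of `samplers` stands in for one of the conditional draws in Steps 1-6, taking the current state (all other blocks held fixed) and returning a new draw for its block.

```python
def gibbs_cycle(samplers, state, n_cycles):
    """Run a Gibbs sampler by cycling through conditional draws.

    samplers: dict mapping a block name to a function of the current state
    that returns a new draw for that block, conditional on everything else.
    state: dict holding the current value of every block.
    Returns the list of recorded states, one per full cycle.
    """
    draws = []
    for _ in range(n_cycles):
        for name, sampler in samplers.items():
            state[name] = sampler(state)  # draw this block | all others
        draws.append(dict(state))         # record one full Gibbs cycle
    return draws
```

In practice one discards an initial burn-in portion of `draws` and uses the remainder to approximate the marginal posteriors.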

Step 1

We chose to draw the $\{Z_{ijt},\ i = 1, \ldots, N;\ j = 1, \ldots, 3;\ t = 1, \ldots, 40\}$ one by one. Taking everything else in the model as given, it is evident from (22.14)-(22.16) that the conditional distribution of a single $Z_{ijt}$ is truncated Gaussian. Dealing with the truncation is straightforward. There are three ways in which the Gaussian distribution might be truncated.

Case 1: $Z_{ijt}$ is the value function difference for the chosen alternative. Thus, we draw $Z_{ijt} > \max\{0,\ (Z_{ikt})_{k \in \{1,2,3\} \setminus \{j\}}\}$.

Case 2: $Z_{ijt}$ is not associated with the chosen alternative, and "home" was not chosen. Thus, we draw $Z_{ijt} < Z_{i d_{it} t}$.

Case 3: "Home" was chosen. In this case, we draw $Z_{ijt} < 0$.

We draw from the appropriate univariate, truncated Gaussian distributions using standard inverse CDF methods.
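The inverse-CDF method can be sketched in standard-library Python. This is a minimal illustration, not the authors' implementation; `norm_ppf` is a simple bisection stand-in for a proper inverse normal CDF routine:

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF by bisection (adequate for a sketch)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_truncated_normal(mu, sigma, lower=-math.inf, upper=math.inf, rng=random):
    """Inverse-CDF draw from N(mu, sigma^2) truncated to (lower, upper):
    map a uniform draw into the CDF interval covered by the truncation region."""
    a = norm_cdf((lower - mu) / sigma) if lower > -math.inf else 0.0
    b = norm_cdf((upper - mu) / sigma) if upper < math.inf else 1.0
    u = a + (b - a) * rng.random()
    return mu + sigma * norm_ppf(u)
```

For Case 1 one would call `draw_truncated_normal(mu, sigma, lower=...)` with the lower bound $\max\{0, \max_{k \neq j} Z_{ikt}\}$; Cases 2 and 3 pass `upper=` instead.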

Step 2

We chose to draw the unobserved wages $\{w_{ijt} \text{ when } d_{it} \neq j,\ (j = 1, 2)\}$ one by one. Suppose $w_{i1t}$ is unobserved. Its density, conditional on every other wage, future component difference, and parameter being known, is evidently, from (22.14)-(22.16), given by:

$$
p(w_{i1t} \mid \cdot) \propto \frac{1}{w_{i1t}} \exp\left\{ -\frac{(\ln w_{i1t} - Y_{i1t}'\beta_1 - \lambda_{it})^2}{2\sigma_*^2} \right\} \cdot \exp\left\{ -\tfrac{1}{2}\,\eta_{it}'\,\Sigma_\eta^{-1}\,\eta_{it} \right\} \tag{22.17}
$$

This distribution is nonstandard as wages enter in both logs and levels. Nevertheless, it is straightforward to sample from this distribution using rejection methods (see Geweke (1995) for a discussion of efficient rejection sampling). In brief, we first drew a candidate wage $w^c$ from the distribution implied by the first exponential of (22.17), so that $\ln w^c \sim N(Y_{i1t}'\beta_1 + \lambda_{it},\ \sigma_*^2)$, where $\lambda_{it} = \Sigma_\varepsilon(1,2)\,\varepsilon_{i2t}/\Sigma_\varepsilon(2,2)$ and $\sigma_*^2 = \Sigma_\varepsilon(1,1)\bigl(1 - (\Sigma_\varepsilon(1,2))^2/(\Sigma_\varepsilon(1,1)\Sigma_\varepsilon(2,2))\bigr)$. This draw is easily accomplished, and $w^c$ is found by exponentiating. The probability with which this draw is accepted is found by dividing the second exponential in (22.17) by its conditional maximum over $w_{i1t}$ and evaluating the resulting expression at $w_{i1t} = w^c$. If the draw is accepted then the unobserved $w_{i1t}$ is set to $w^c$. Otherwise, the process is repeated until a draw is accepted.
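The accept-reject logic can be sketched generically. In the sketch below, which is illustrative rather than the authors' code, `propose` plays the role of the lognormal candidate draw from the first exponential, `log_accept_weight` the log of the second exponential, and `log_weight_max` its conditional maximum (so the acceptance probability lies in $[0, 1]$):

```python
import math
import random

def rejection_draw(propose, log_accept_weight, log_weight_max,
                   rng=random, max_tries=100000):
    """Draw from a density proportional to proposal(w) * exp(log_accept_weight(w)).

    log_weight_max must bound log_accept_weight from above, so that the
    acceptance probability exp(log_accept_weight(w) - log_weight_max) is <= 1.
    Proposals are repeated until one is accepted.
    """
    for _ in range(max_tries):
        w = propose()
        # accept w with probability exp(log_accept_weight(w) - log_weight_max)
        if math.log(rng.random() + 1e-300) < log_accept_weight(w) - log_weight_max:
            return w
    raise RuntimeError("rejection sampler failed to accept a draw")
```

For Step 2, `propose` would exponentiate a normal draw with the mean and variance given above, and `log_accept_weight` would evaluate $-\tfrac12 \eta_{it}' \Sigma_\eta^{-1} \eta_{it}$ as a function of the candidate wage.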

Step 3

Given all wages, value function differences, and other parameters, the density of $\beta = (\beta_1', \beta_2')'$ is:

$$
p(\beta_1, \beta_2 \mid \cdot) \propto \exp\left\{ -\tfrac{1}{2}\,(\ln W - Y\beta)'\,\Sigma^{-1}\,(\ln W - Y\beta) \right\},
$$

so that $(\beta_1, \beta_2)$ is distributed according to a multivariate normal. In particular, it is easy to show that

$$
\beta \sim N\left[ (Y'\Sigma^{-1}Y)^{-1}Y'\Sigma^{-1}\ln W,\; (Y'\Sigma^{-1}Y)^{-1} \right], \tag{22.18}
$$

where $\Sigma = \Sigma_\varepsilon \otimes I_{NT}$, $\ln W = (\ln W_1', \ln W_2')'$, and $Y_1$ is the regressor matrix for the first log-wage equation naturally ordered through all individuals and periods, and similarly for $Y_2$, $W_1$ and $W_2$. It is straightforward to draw $\beta$ from this multivariate normal density.
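The multivariate normal draw in Step 3 can be sketched with NumPy. This is a generic GLS-posterior draw under the stated form, not the authors' code; `Y`, `Sigma_inv`, and `ln_W` stand for the stacked regressor matrix, error precision, and stacked log wages:

```python
import numpy as np

def draw_beta(Y, Sigma_inv, ln_W, rng):
    """One draw of beta ~ N[(Y'S^-1 Y)^-1 Y'S^-1 ln_W, (Y'S^-1 Y)^-1]."""
    A = Y.T @ Sigma_inv @ Y                 # posterior precision
    A_inv = np.linalg.inv(A)                # posterior covariance
    mean = A_inv @ (Y.T @ Sigma_inv @ ln_W) # GLS posterior mean
    L = np.linalg.cholesky(A_inv)           # draw via Cholesky factor
    return mean + L @ rng.standard_normal(mean.shape[0])
```

In a real implementation one would exploit the Kronecker structure of $\Sigma = \Sigma_\varepsilon \otimes I_{NT}$ rather than form it explicitly.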

Step 4

With everything else known, $\Sigma_\varepsilon^{-1}$ has a Wishart distribution. Specifically, it is immediate from the joint posterior that $p(\Sigma_\varepsilon^{-1} \mid \cdot) \propto |\Sigma_\varepsilon^{-1}|^{(NT-3)/2}\exp\{-\tfrac{1}{2}\mathrm{tr}(S(\beta)\Sigma_\varepsilon^{-1})\}$, so that

$$
\Sigma_\varepsilon^{-1} \sim W(S(\beta), NT),
$$

where

$$
S(\beta) = (\ln W_1 - Y_1\beta_1,\ \ln W_2 - Y_2\beta_2)'(\ln W_1 - Y_1\beta_1,\ \ln W_2 - Y_2\beta_2). \tag{22.19}
$$

It is easy to draw from the Wishart and then invert the $2 \times 2$ matrix to obtain $\Sigma_\varepsilon$.
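A sketch of this step using the Bartlett decomposition, which is a standard way to draw from a Wishart. Note the parameterization: with the kernel $\exp\{-\tfrac12\mathrm{tr}(S(\beta)X)\}$ above, the Wishart scale matrix in the usual textbook parameterization is $S(\beta)^{-1}$. The function names are illustrative:

```python
import numpy as np

def draw_wishart(df, V, rng):
    """Bartlett draw of X ~ Wishart(df, V): X = (L A)(L A)' with V = L L',
    where A is lower triangular with sqrt(chi-square) diagonal entries."""
    p = V.shape[0]
    L = np.linalg.cholesky(V)
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        for j in range(i):
            A[i, j] = rng.standard_normal()
    LA = L @ A
    return LA @ LA.T

def draw_sigma_eps(S_beta, NT, rng):
    """Step 4 sketch: draw the precision with scale S(beta)^{-1} and NT
    degrees of freedom, then invert the 2 x 2 result to get Sigma_eps."""
    prec = draw_wishart(NT, np.linalg.inv(S_beta), rng)
    return np.linalg.inv(prec)
```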

Step 5

It is convenient to draw both the future component parameters $\pi$ and the parameters $\alpha$ of the school payoff jointly. Since the future component for school contains an intercept, it and the constant in the school payoff cannot be separately identified. Hence, we omit $\alpha_0$ as well as the first row from each $\Lambda_{it}$. Define the vector $\pi^* = [\pi', \alpha']'$, where $\alpha = (\alpha_1, \alpha_2, \alpha_3)'$, and define $\Psi^*_{ijt} = [\Psi_{ijt}',\ 0_3']'$ $(j = 1, 2)$ and $\Psi^*_{i3t} = [\Psi_{i3t}',\ \Lambda_{it}']'$. Note that $\pi^*$ and the $\Psi^*_{ijt}$ are 56-vectors. Then define $\Psi_k = [\Psi^*_{1k1}\ \Psi^*_{1k2} \cdots \Psi^*_{Nk,T-1}\ \Psi^*_{NkT}]'$, and set $\Psi = [\Psi_1'\ \Psi_2'\ \Psi_3']'$, so that $\Psi$ is a $(3 \cdot NT \times 56)$ stacked-regressor matrix. Similarly, define the corresponding $3 \cdot NT$-vector $\Gamma$ by $\Gamma = (\{Z_{i1t} - w_{i1t}\}_{i,t}',\ \{Z_{i2t} - w_{i2t}\}_{i,t}',\ \{Z_{i3t}\}_{i,t}')'$. It is immediate from (22.15), in which $\pi^*$ enters only through the second exponential expressions, that, conditional on everything else known, $\pi^*$ has a multivariate normal density given by:

$$
\pi^* \sim N\left[ (\Psi'\Omega^{-1}\Psi)^{-1}\Psi'\Omega^{-1}\Gamma,\; (\Psi'\Omega^{-1}\Psi)^{-1} \right] \tag{22.20}
$$

where $\Omega = \Sigma_\eta \otimes I_{NT}$. We draw from this using a standard multivariate normal random number generator.
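Because $\Omega = \Sigma_\eta \otimes I_{NT}$, the $(3 \cdot NT \times 3 \cdot NT)$ matrix $\Omega$ never needs to be formed: $\Psi'\Omega^{-1}\Psi$ and $\Psi'\Omega^{-1}\Gamma$ can be accumulated one $(i, t)$ observation at a time. The following sketch assumes that layout (a $3 \times K$ regressor block and a 3-vector of $\Gamma$ entries per observation); it is illustrative, not the authors' code:

```python
import numpy as np

def draw_pi_star(Psi_blocks, Gamma_blocks, Sigma_eta_inv, rng):
    """One draw of pi* ~ N[(Psi' O^-1 Psi)^-1 Psi' O^-1 Gamma, (Psi' O^-1 Psi)^-1],
    accumulating the Kronecker-structured products observation by observation.

    Psi_blocks[m]: 3 x K regressor block for observation (i, t);
    Gamma_blocks[m]: the corresponding 3-vector of Gamma entries.
    """
    K = Psi_blocks[0].shape[1]
    A = np.zeros((K, K))   # accumulates Psi' Omega^-1 Psi
    b = np.zeros(K)        # accumulates Psi' Omega^-1 Gamma
    for Psi_it, Gam_it in zip(Psi_blocks, Gamma_blocks):
        A += Psi_it.T @ Sigma_eta_inv @ Psi_it
        b += Psi_it.T @ Sigma_eta_inv @ Gam_it
    A_inv = np.linalg.inv(A)
    mean = A_inv @ b
    return mean + np.linalg.cholesky(A_inv) @ rng.standard_normal(K)
```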

Step 6

With everything else known, the distribution of $\Sigma_\eta^{-1}$ is Wishart: $\Sigma_\eta^{-1} \sim W(SS_\eta, NT)$, where $SS_\eta = \sum_{i,t}(\eta_{i1t}, \eta_{i2t}, \eta_{i3t})'(\eta_{i1t}, \eta_{i2t}, \eta_{i3t})$, with the $\eta_{ijt}$ defined by (22.12). It is easy to draw from this distribution and then invert the $3 \times 3$ matrix to obtain $\Sigma_\eta$.
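A sketch of this final block using SciPy's `scipy.stats.wishart`, whose `scale` parameter enters the kernel as $\exp\{-\tfrac12\mathrm{tr}(V^{-1}X)\}$, so the scale here is $SS_\eta^{-1}$. The residual layout (an $NT \times 3$ matrix of the $\eta_{ijt}$) is an assumption for illustration:

```python
import numpy as np
from scipy.stats import wishart

def draw_sigma_eta(eta, NT, rng):
    """Step 6 sketch: eta is an NT x 3 matrix of residuals eta_ijt.

    Form SS_eta = eta' eta, draw the precision from a Wishart with NT
    degrees of freedom and scale SS_eta^{-1}, then invert the 3 x 3 result.
    """
    SS = eta.T @ eta
    prec = wishart(df=NT, scale=np.linalg.inv(SS)).rvs(random_state=rng)
    return np.linalg.inv(prec)
```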