Rilstone (1991) compares the relative efficiency of semiparametric and parametric estimators of $\beta$ under different types of heteroskedasticity, whereas Surekha and Griffiths (1984) compare the relative efficiency of some Bayesian and sampling theory estimators using a similar Monte Carlo setup. Donald (1995) examines heteroskedasticity in sample selection models and provides access to that literature. In truncated and censored models, heteroskedasticity affects the consistency of estimators, not just their efficiency. Heteroskedasticity in the context of seemingly unrelated regressions has been studied by Mandy and Martins-Filho (1993). Further details appear in chapter 5 by Fiebig. A useful reference that brings together much of the statistical literature on heteroskedasticity is Carroll and Ruppert (1988).
2 Bayesian Inference
With Bayesian inference, post-sample information about unknown parameters is summarized via posterior probability density functions (pdfs) on the parameters of interest. Representing parameter uncertainty in this way is (arguably) more natural than the point estimates and standard errors produced by sampling theory, and provides a flexible way of including additional prior information. In the heteroskedastic model that we have been discussing, namely $y_i = x_i'\beta + e_i$ with $\operatorname{var}(e_i) = \sigma^2 h_i(\alpha)$, the parameters of interest are $\beta$, $\alpha$, and $\sigma$, with particular interest usually centering on $\beta$. The starting point for Bayesian inference is the specification of prior pdfs for $\beta$, $\sigma$, and $\alpha$. Since noninformative prior pdfs carry with them the advantage of objective reporting of results, we adopt the conventional ones for $\beta$ and $\sigma$ (see Zellner, 1971):
$$f(\beta, \sigma) = f(\beta)\, f(\sigma) \propto \text{constant} \times \frac{1}{\sigma}. \qquad (4.35)$$
The choice of prior for $\alpha$ is likely to depend on the function $h(\cdot)$. Possible choices are a uniform prior or a prior based on the information matrix. See Zellner (1971, p. 47) for details on the latter. Leaving the precise nature of the prior for $\alpha$ unspecified, the joint prior pdf for all unknown parameters can be written as
$$f(\beta, \sigma, \alpha) = f(\beta, \sigma)\, f(\alpha) \propto \frac{f(\alpha)}{\sigma}. \qquad (4.36)$$
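Assuming normally distributed observations, the likelihood function can be written as
$$f(y \mid \beta, \sigma, \alpha) \propto \frac{1}{\sigma^{N}}\, |\Lambda|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^{2}} (y - X\beta)' \Lambda^{-1} (y - X\beta) \right\}. \qquad (4.37)$$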
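The joint posterior pdf for $(\beta, \sigma, \alpha)$ is
$$\begin{aligned}
f(\beta, \sigma, \alpha \mid y) &\propto f(y \mid \beta, \sigma, \alpha)\, f(\beta, \sigma, \alpha) \\
&\propto \frac{f(\alpha)}{\sigma^{N+1}}\, |\Lambda|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^{2}} (y - X\beta)' \Lambda^{-1} (y - X\beta) \right\}. \qquad (4.38)
\end{aligned}$$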
Once this joint posterior pdf for all parameters has been obtained, the major task is to derive marginal posterior pdfs for each single parameter. The information in a marginal posterior pdf can then be represented in a diagram or summarized via the moments of the pdf. Where possible, marginal posterior pdfs are obtained by integrating out the remaining parameters. Where analytical integration is not possible, numerical methods are used to estimate the marginal posterior pdfs. There are a variety of ways in which one could proceed with respect to equation (4.38). The steps for one way that is likely to work well are:
1. Integrate $\sigma$ out to obtain the joint posterior pdf $f(\beta, \alpha \mid y)$.
2. Integrate $\beta$ out of the result in step 1 to obtain the posterior pdf $f(\alpha \mid y)$.
3. Use a Metropolis algorithm to draw observations from the density $f(\alpha \mid y)$.
4. Construct the conditional posterior pdf $f(\beta \mid \alpha, y)$ from the joint posterior pdf that was obtained in step 1; note the conditional mean $E[\beta \mid \alpha, y]$ and conditional variance $\operatorname{var}[\beta \mid \alpha, y]$.
5. From step 4, note the conditional posterior pdf and corresponding moments for each element, say $\beta_k$, in the vector $\beta$.
6. Find estimates of the marginal posterior pdf for $\beta_k$ ($k = 1, 2, \ldots, K$), and its moments, by averaging the conditional quantities given in step 5 over the conditioning values of $\alpha$ drawn in step 3.
We will consider each of these steps in turn.
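The joint posterior pdf for $(\beta, \alpha)$ is given by
$$\begin{aligned}
f(\beta, \alpha \mid y) &= \int f(\beta, \alpha, \sigma \mid y)\, d\sigma \\
&\propto f(\alpha)\, |\Lambda|^{-1/2} \left[ N\hat{\sigma}^{2}(\alpha) + (\beta - \hat{\beta}(\alpha))' X'\Lambda^{-1}X (\beta - \hat{\beta}(\alpha)) \right]^{-N/2} \qquad (4.39)
\end{aligned}$$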
where $\hat{\sigma}^{2}(\alpha)$ and $\hat{\beta}(\alpha)$ are defined in equations (4.23) and (4.24). The pdf in (4.39) is not utilized directly; it provides an intermediate step for obtaining $f(\alpha \mid y)$ and $f(\beta \mid \alpha, y)$.
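The integration over $\sigma$ that leads from (4.38) to (4.39) can be spelled out as follows. This is a sketch of the standard calculation rather than a quotation from the chapter; the symbol $Q$ is introduced here only as shorthand, and the decomposition of $Q$ assumes that $\hat{\sigma}^{2}(\alpha)$ is the weighted residual sum of squares divided by $N$, which is what the form of (4.39) implies.

```latex
% Shorthand (an assumption of this sketch):
%   Q = (y - X\beta)' \Lambda^{-1} (y - X\beta)
\begin{aligned}
\int_{0}^{\infty} \sigma^{-(N+1)} \exp\left\{ -\frac{Q}{2\sigma^{2}} \right\} d\sigma
  &= \tfrac{1}{2}\, \Gamma\!\left(\tfrac{N}{2}\right) \left(\tfrac{Q}{2}\right)^{-N/2}
  \quad \text{(substituting } t = Q / 2\sigma^{2}\text{)} \\
  &\propto Q^{-N/2}, \\[4pt]
Q &= N\hat{\sigma}^{2}(\alpha)
   + (\beta - \hat{\beta}(\alpha))' X'\Lambda^{-1}X (\beta - \hat{\beta}(\alpha)).
\end{aligned}
```

The second line is the usual decomposition of a weighted sum of squares around the generalized least squares estimator; the cross term vanishes because the weighted residuals are orthogonal to $X$.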
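The marginal posterior pdf for the parameters in the variance function is given by
$$\begin{aligned}
f(\alpha \mid y) &= \int f(\beta, \alpha \mid y)\, d\beta \\
&\propto f(\alpha)\, |\Lambda|^{-1/2}\, [\hat{\sigma}(\alpha)]^{-(N-K)}\, |X'\Lambda^{-1}X|^{-1/2}. \qquad (4.40)
\end{aligned}$$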
The pdf in equation (4.40) is not of a recognizable form, even when imaginative choices for the prior $f(\alpha)$ are made. Thus, it is not possible to perform further analytical integration to isolate marginal posterior pdfs for single elements such as $\alpha_s$. Instead, a numerical procedure, the Metropolis algorithm, can be used to indirectly draw observations from the pdf $f(\alpha \mid y)$. Once such draws are obtained, they can be used to form histograms as estimates of the posterior pdfs for single elements in $\alpha$. As we shall see, the draws are also useful for obtaining the posterior pdfs for the $\beta_k$.
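Because only ratios of $f(\alpha \mid y)$ are needed later, it is usually enough to evaluate the logarithm of the kernel in (4.40). The following is a minimal sketch, not the chapter's own code. It assumes the multiplicative form $h_i(\alpha) = \exp(z_i'\alpha)$ for the variance function, a flat prior $f(\alpha) \propto 1$, and $\hat{\sigma}^{2}(\alpha)$ equal to the weighted residual sum of squares divided by $N$; the function name `log_post_alpha` and the matrix `Z` are hypothetical.

```python
import numpy as np

def log_post_alpha(alpha, y, X, Z):
    """Log kernel of f(alpha | y) in (4.40), up to an additive constant.

    Assumptions of this sketch: h_i(alpha) = exp(z_i' alpha) and a flat
    prior on alpha, so the log prior contributes nothing.
    """
    N, K = X.shape
    h = np.exp(Z @ alpha)                    # diagonal of Lambda
    Xw = X / h[:, None]                      # Lambda^{-1} X
    A = X.T @ Xw                             # X' Lambda^{-1} X
    beta_hat = np.linalg.solve(A, Xw.T @ y)  # GLS estimator for this alpha
    resid = y - X @ beta_hat
    sigma2_hat = np.sum(resid**2 / h) / N    # assumed ML form of sigma^2(alpha)
    _, logdet_A = np.linalg.slogdet(A)
    return (-0.5 * np.sum(np.log(h))         # -1/2 log|Lambda|
            - 0.5 * (N - K) * np.log(sigma2_hat)
            - 0.5 * logdet_A)
```

Working on the log scale avoids overflow when $N$ is large. Under these assumptions the kernel does not change when all $h_i$ are multiplied by a common constant, so `Z` should not contain a constant column; that scale is already carried by $\sigma^2$.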
The random walk Metropolis algorithm which we describe below in the context of the heteroskedastic model is one of many algorithms which come under the general heading of Markov Chain Monte Carlo (MCMC). A recent explosion of research in MCMC has made Bayesian inference more practical for models that were previously plagued by intractable integrals. For access to this literature, see Geweke (1999).
The first step towards using a convenient random walk Metropolis algorithm is to define a suitable "candidate generating function." Assuming that the prior $f(\alpha)$ is relatively noninformative, and not in conflict with the sample information, the maximum likelihood estimate $\hat{\alpha}$ provides a suitable starting value $\alpha^{(0)}$ for the algorithm, and the maximum likelihood covariance matrix $V_{\hat{\alpha}}$ provides the basis for a suitable covariance matrix for the random walk generator function. The steps for drawing the $(m + 1)$th observation $\alpha^{(m+1)}$ are as follows:
1. Draw $\alpha^{*} = \alpha^{(m)} + \varepsilon$, where $\varepsilon \sim N(0, cV_{\hat{\alpha}})$ and $c$ is a scalar set so that $\alpha^{*}$ is accepted approximately 50 percent of the time.
2. Compute
$$r = \frac{f(\alpha^{*} \mid y)}{f(\alpha^{(m)} \mid y)}.$$
Note that this ratio can be computed without knowledge of the normalizing constant for $f(\alpha \mid y)$.
3. Draw a value $u$ for a uniform random variable on the interval (0, 1).
4. If $u < r$, set $\alpha^{(m+1)} = \alpha^{*}$.
If $u \ge r$, set $\alpha^{(m+1)} = \alpha^{(m)}$.
5. Return to step 1, with $m$ set to $m + 1$.
By following these steps, one explores the posterior pdf for $\alpha$, generating larger numbers of observations in regions of high posterior probability and smaller numbers of observations in regions of low posterior probability. Markov Chain Monte Carlo theory suggests that, after sufficient observations have been drawn, the remaining observations are drawn from the pdf $f(\alpha \mid y)$. Thus, by drawing a large number of values, and discarding early ones, we obtain draws from the required pdf.
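The five steps above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the author's code: the function name `rw_metropolis` and its arguments are hypothetical, the target is supplied as a log density so that the ratio $r$ is compared on the log scale, and the number of discarded early draws (`burn_in`) is left to the user.

```python
import numpy as np

def rw_metropolis(log_target, start, cov, c=1.0, n_draws=5000,
                  burn_in=1000, seed=0):
    """Random walk Metropolis draws from a density known up to a constant.

    log_target : function returning log f(alpha | y) up to a constant
    start      : starting value, e.g. the ML estimate of alpha
    cov        : base covariance matrix, e.g. the ML covariance matrix
    c          : scalar tuning the proposal spread (step 1)
    """
    rng = np.random.default_rng(seed)
    current = np.atleast_1d(np.asarray(start, dtype=float))
    chol = np.linalg.cholesky(c * np.atleast_2d(cov))
    log_f_current = log_target(current)
    draws = np.empty((n_draws, current.size))
    accepted = 0
    for m in range(burn_in + n_draws):
        # Step 1: candidate = current value plus N(0, c * cov) noise.
        proposal = current + chol @ rng.standard_normal(current.size)
        # Step 2: log of the ratio r; the normalizing constant cancels.
        log_r = log_target(proposal) - log_f_current
        # Steps 3 and 4: accept if u < r, otherwise keep the current value.
        if np.log(rng.uniform()) < log_r:
            current, log_f_current = proposal, log_f_current + log_r
            accepted += 1
        if m >= burn_in:                  # discard the early draws
            draws[m - burn_in] = current
    return draws, accepted / (burn_in + n_draws)
```

The returned acceptance rate can be used to adjust `c` by trial and error until roughly half of the candidates are accepted, as suggested in step 1.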
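The conditional posterior pdf $f(\beta \mid \alpha, y)$ is obtained from the joint pdf $f(\beta, \alpha \mid y)$ by simply treating $\alpha$ as a constant in equation (4.39). However, for later use we also need to include any part of the normalizing constant that depends on $\alpha$. Recognizing that, when viewed only as a function of $\beta$, equation (4.39) is in the form of a multivariate Student-t pdf (Judge et al., 1988, p. 312), we have
$$f(\beta \mid \alpha, y) = k\, |X'\Lambda^{-1}X|^{1/2}\, [\hat{\sigma}(\alpha)]^{N-K} \left[ N\hat{\sigma}^{2}(\alpha) + (\beta - \hat{\beta}(\alpha))' X'\Lambda^{-1}X (\beta - \hat{\beta}(\alpha)) \right]^{-N/2} \qquad (4.41)$$
where $k$ is a normalizing constant that does not depend on $\alpha$. This pdf has
$$\text{mean} = E(\beta \mid \alpha, y) = \hat{\beta}(\alpha) = (X'\Lambda^{-1}X)^{-1} X'\Lambda^{-1} y \qquad (4.42)$$
$$\text{covariance matrix} = \frac{N}{N-K-2}\, \hat{\sigma}^{2}(\alpha)\, (X'\Lambda^{-1}X)^{-1} \qquad (4.43)$$
degrees of freedom = $N - K$.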
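Let $a_{kk}(\alpha)$ be the $k$th diagonal element of $(X'\Lambda^{-1}X)^{-1}$, and $\hat{\beta}_k(\alpha)$ be the $k$th element of $\hat{\beta}(\alpha)$. The conditional marginal posterior pdf for $\beta_k$ given $\alpha$ is the univariate-t pdf
$$f(\beta_k \mid \alpha, y) = k^{*}\, [\hat{\sigma}(\alpha)]^{N-K}\, [a_{kk}(\alpha)]^{(N-K)/2} \left[ N\hat{\sigma}^{2}(\alpha)\, a_{kk}(\alpha) + (\beta_k - \hat{\beta}_k(\alpha))^{2} \right]^{-(N-K+1)/2} \qquad (4.44)$$
where $k^{*}$ is a normalizing constant that does not depend on $\alpha$. This pdf has
$$\text{mean} = E(\beta_k \mid \alpha, y) = \hat{\beta}_k(\alpha) \qquad (4.45)$$
$$\text{variance} = \frac{N}{N-K-2}\, \hat{\sigma}^{2}(\alpha)\, a_{kk}(\alpha) \qquad (4.46)$$
degrees of freedom = $N - K$.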
Equations (4.42) and (4.45) provide Bayesian quadratic-loss point estimates for $\beta$ given $\alpha$. Note that they are identical to the generalized least squares estimator for known $\alpha$. It is the unknown $\alpha$ case where sampling theory and Bayesian inference results for point estimation of $\beta$ diverge. The sampling theory point estimate in this case is $\hat{\beta}(\hat{\alpha})$. The Bayesian point estimate is the mean of the marginal posterior pdf $f(\beta \mid y)$. It can be viewed as a weighted average of the $\hat{\beta}(\alpha)$ over all $\alpha$ with $f(\alpha \mid y)$ used as the weighting pdf. The mechanics of this procedure are described in the next step.
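For a single value of $\alpha$, the conditional mean and covariance matrix of $\beta$ can be computed as in the sketch below. It is illustrative only: the function name `conditional_moments` and the matrix `Z` are hypothetical, the variance function is assumed to be $h_i(\alpha) = \exp(z_i'\alpha)$, and the covariance formula is the standard one for a multivariate Student-t pdf with $N - K$ degrees of freedom, so it requires $N - K > 2$.

```python
import numpy as np

def conditional_moments(alpha, y, X, Z):
    """Mean and covariance of beta given alpha (Student-t posterior).

    Assumption of this sketch: h_i(alpha) = exp(z_i' alpha).
    """
    N, K = X.shape
    h = np.exp(Z @ alpha)                    # diagonal of Lambda
    Xw = X / h[:, None]                      # Lambda^{-1} X
    A_inv = np.linalg.inv(X.T @ Xw)          # (X' Lambda^{-1} X)^{-1}
    beta_hat = A_inv @ (Xw.T @ y)            # GLS estimator = conditional mean
    resid = y - X @ beta_hat
    sigma2_hat = np.sum(resid**2 / h) / N    # assumed ML form of sigma^2(alpha)
    cov = N / (N - K - 2) * sigma2_hat * A_inv
    return beta_hat, cov
```

Evaluating this function at the maximum likelihood estimate of $\alpha$ gives the sampling theory point estimate; evaluating it at each Metropolis draw gives the conditional quantities that are averaged in the final step.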
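An estimate of the marginal posterior pdf $f(\beta_k \mid y)$ is given by
$$\begin{aligned}
\hat{f}(\beta_k \mid y) &= \frac{1}{M} \sum_{m=1}^{M} f(\beta_k \mid \alpha^{(m)}, y) \\
&= \frac{k^{*}}{M} \sum_{m=1}^{M} [\hat{\sigma}(\alpha^{(m)})]^{N-K}\, [a_{kk}(\alpha^{(m)})]^{(N-K)/2} \\
&\qquad \times \left[ N\hat{\sigma}^{2}(\alpha^{(m)})\, a_{kk}(\alpha^{(m)}) + (\beta_k - \hat{\beta}_k(\alpha^{(m)}))^{2} \right]^{-(N-K+1)/2} \qquad (4.47)
\end{aligned}$$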
where $\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(M)}$ are the draws from $f(\alpha \mid y)$ that were obtained in step 3. To graph $\hat{f}(\beta_k \mid y)$, a grid of values of $\beta_k$ is chosen and the average in equation (4.47) is calculated for each value of $\beta_k$ in the grid. The mean and variance of the marginal posterior pdf $f(\beta_k \mid y)$ can be estimated in a similar way. The mean is given by the average of the conditional means
$$\bar{\beta}_k = \hat{E}(\beta_k \mid y) = \frac{1}{M} \sum_{m=1}^{M} \hat{\beta}_k(\alpha^{(m)}). \qquad (4.48)$$
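The variance is given by the average of the conditional variances plus the variance of the conditional means. That is,
$$\widehat{\operatorname{var}}(\beta_k \mid y) = \frac{1}{M} \sum_{m=1}^{M} \frac{N}{N-K-2}\, \hat{\sigma}^{2}(\alpha^{(m)})\, a_{kk}(\alpha^{(m)}) + \frac{1}{M-1} \sum_{m=1}^{M} \left( \hat{\beta}_k(\alpha^{(m)}) - \bar{\beta}_k \right)^{2}. \qquad (4.49)$$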
Presenting information about parameters in terms of posterior pdfs rather than point estimates provides a natural way of representing uncertainty. In the process just described, the marginal posterior pdfs also provide a proper reflection of finite sample uncertainty. Maximum likelihood estimates (or posterior pdfs conditional on $\alpha$) ignore the additional uncertainty created by not knowing $\alpha$.
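The averaging over Metropolis draws described in step 6 can be sketched as follows. The code is a hedged illustration, not the chapter's own: the function names are hypothetical, the inputs are the conditional means, variances, and Student-t scale parameters of $\beta_k$ already computed for each draw $\alpha^{(m)}$, and the variance of the conditional means uses an $M - 1$ divisor, which is one reasonable choice among several.

```python
import math
import numpy as np

def marginal_moments(cond_means, cond_vars):
    """Marginal mean and variance of beta_k from per-draw conditional moments."""
    cond_means = np.asarray(cond_means, dtype=float)
    cond_vars = np.asarray(cond_vars, dtype=float)
    mean = cond_means.mean()                  # average of conditional means
    # average conditional variance + variance of the conditional means
    var = cond_vars.mean() + cond_means.var(ddof=1)
    return mean, var

def marginal_pdf(grid, cond_means, cond_scales, df):
    """Average of univariate Student-t pdfs over the draws, on a grid.

    cond_scales[m] is the t scale for draw m, i.e.
    sqrt(N * sigma2_hat * a_kk / (N - K)), and df = N - K.
    """
    grid = np.asarray(grid, dtype=float)[:, None]
    loc = np.asarray(cond_means, dtype=float)[None, :]
    scale = np.asarray(cond_scales, dtype=float)[None, :]
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    z = (grid - loc) / scale
    log_pdf = log_const - np.log(scale) - 0.5 * (df + 1) * np.log1p(z**2 / df)
    return np.exp(log_pdf).mean(axis=1)       # one averaged value per grid point
```

Averaging the conditional pdfs in this way typically gives a smoother estimate of the marginal pdf than a histogram of simulated $\beta_k$ values, because the Student-t shape of each conditional pdf is used exactly.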
There are, of course, other ways of approaching Bayesian inference in heteroskedastic models. The approach will depend on specification of the model and prior pdf, and on the solution to the problem of intractable integrals. Gibbs sampling is another MCMC technique that is often useful, and importance sampling could be used to obtain draws from $f(\alpha \mid y)$. However, the approach we have described is useful for a wide range of problems, with specific cases defined by specification of $h(\alpha)$ and $f(\alpha)$. Other studies which utilize Bayesian inference in heteroskedastic error models include Griffiths, Drynan, and Prakash (1979) and Boscardin and Gelman (1996).