# Bayesian inference

In order to define the sampling model,5 we make the following assumptions about $v_i$ and $z_i$ for $i = 1, \ldots, N$:

1. $p(v_i \mid h^{-1}) = f_N(v_i \mid 0, h^{-1})$ and the $v_i$'s are independent;

2. $v_i$ and $z_l$ are independent of one another for all $i$ and $l$;

3. $p(z_i \mid \lambda^{-1}) = f_G(z_i \mid 1, \lambda^{-1})$ and the $z_i$'s are independent.

The first assumption is commonly made in cross-sectional analysis, but the last two require some justification. Assumption 2 says that measurement error and inefficiency are independent of one another. Assumption 3 is a common choice for the nonnegative random variable, $z_i$, although others (e.g. the half-normal) are possible. Ritter and Simar (1997) show that the use of very flexible one-sided distributions for $z_i$ such as the unrestricted gamma may result in a problem of weak identification. Intuitively, if $z_i$ is left too flexible, then the intercept minus $z_i$ can come to look too much like $v_i$ and it may become virtually impossible to distinguish between these two components with small data sets. The gamma with shape parameter 1 is the exponential distribution, which is sufficiently different from the normal to avoid this weak identification problem.6 In addition, van den Broeck et al. (1994) found the exponential model the least sensitive to changes in prior assumptions in a study of the most commonly used models. Note that $\lambda$ is the mean of the inefficiency distribution, and let $\theta = (\beta', h, \lambda)'$ denote the parameters of the model.

The likelihood function is defined as:

$$L(y; \theta) = \prod_{i=1}^{N} p(y_i \mid x_i, \theta),$$

which requires the derivation of $p(y_i \mid x_i, \theta) = \int p(y_i \mid x_i, z_i, \theta)\, p(z_i \mid \theta)\, dz_i$. This is done in Jondrow, Lovell, Materov, and Schmidt (1982) for the exponential model and in van den Broeck et al. (1994) for a wider class of inefficiency distributions. However, we do not repeat the derivation here, since we do not need to know the explicit form of the likelihood function. To understand why isolating the likelihood function is not required, it is necessary to explain the computational methods that we recommend for Bayesian inference in stochastic frontier models.

Bayesian inference can be carried out using a posterior simulator which generates draws from the posterior, $p(\theta \mid y, x)$. In this case, Gibbs sampling with data augmentation is a natural choice for a posterior simulator. This algorithm relies on the fact that sequential draws, $\theta^{(s)}$ and $z^{(s)}$, from the conditional posteriors $p(\theta \mid y, x, z^{(s-1)})$ and $p(z \mid y, x, \theta^{(s)})$, respectively, will converge to draws from $p(\theta, z \mid y, x)$, from which inference on the marginal posterior of $\theta$ or of functions of $z$ (such as efficiencies) can immediately be derived. In other words, we do not need an analytical formula for $p(\theta \mid y, x)$ (and, hence, the likelihood function); it suffices to work out the full conditional distributions $p(\theta \mid y, x, z)$ and $p(z \mid y, x, \theta)$. Intuitively, the former is very easy to work with since, conditional on $z$, the stochastic frontier model reduces to the standard linear regression model.7 If $p(\theta \mid y, x, z)$ as a whole is not analytically tractable, we can split $\theta$ into, say, $\beta$ and $(h, \lambda)$ and draw sequentially from the full conditionals $p(\beta \mid h, \lambda, y, x, z)$, $p(h, \lambda \mid \beta, y, x, z)$, and $p(z \mid y, x, \beta, h, \lambda)$. However, before we can derive the Gibbs sampler, we must complete the Bayesian model by specifying a prior for the parameters.

The researcher can, of course, use any prior in an attempt to reflect his or her prior beliefs. However, a proper prior for $h$ and $\lambda^{-1}$ is advisable: Fernandez et al. (1997) show that Bayesian inference is not feasible (in the sense that the posterior distribution is not well defined) under the usual improper priors for $h$ and $\lambda^{-1}$. Here, we will assume a prior of the product form: $p(\theta) = p(\beta)p(h)p(\lambda^{-1})$. In stochastic frontier models, prior information exists in the form of economic regularity conditions. It is extremely important to ensure that the production frontier satisfies these, since it is highly questionable to interpret deviations from a non-regular frontier as representing inefficiency. In an extreme case, if the researcher is using a highly flexible (or nonparametric) functional form for $f(\cdot)$, it might be possible for the frontier to fit the data nearly perfectly. It is only the imposition of economic regularity conditions that prevents this overfitting. The exact form of the economic regularity conditions depends on the specification of the frontier. For instance, in the Cobb-Douglas case, $\beta_i \geq 0$, $i = 1, \ldots, k$, ensures global regularity of the production frontier. For the translog specification things are more complicated and we may wish only to impose local regularity. This requires checking certain conditions at each data point (see Koop et al., 1999). In either case, we can choose a prior for $\beta$ which imposes economic regularity. As emphasized by

Fernandez et al. (1997), a proper or bounded prior is sufficient for $\beta$. Thus, it is acceptable to use a uniform (flat) prior:

$$p(\beta) \propto I(E), \qquad (24.5)$$

where $I(E)$ is the indicator function for the economic regularity conditions. Alternatively, a normal prior for $\beta$ is proper and computationally convenient. In this chapter, we will use $p(\beta)$ as a general notation, but assume it is either truncated uniform or truncated normal. Both choices combine easily with a normal distribution to produce a truncated normal posterior distribution.

For the other parameters, we assume gamma priors:

$$p(h) = f_G(h \mid a_h, b_h) \qquad (24.6)$$

and

$$p(\lambda^{-1}) = f_G(\lambda^{-1} \mid a_\lambda, b_\lambda). \qquad (24.7)$$

Note that, by setting $a_h = 0$ and $b_h = 0$, we obtain $p(h) \propto h^{-1}$, the usual noninformative prior for the error precision in the normal linear regression model. Here, the use of this improper prior is precluded (see Theorem 1 (ii) of Fernandez et al., 1997), but small values of these hyperparameters will allow for Bayesian inference (see Proposition 2 of Fernandez et al., 1997) while the prior is still dominated by the likelihood function. The hyperparameters $a_\lambda$ and $b_\lambda$ can often be elicited through consideration of the efficiency distribution. That is, researchers may often have prior information about the shape or location of the efficiency distribution. As discussed in van den Broeck et al. (1994), setting $a_\lambda = 1$ and $b_\lambda = -\ln(\tau^*)$ yields a relatively noninformative prior which implies that the prior median of the efficiency distribution is $\tau^*$. These are the values for $a_\lambda$ and $b_\lambda$ used in the following discussion.

The Gibbs sampler can be developed in a straightforward manner by noting that, if $z$ were known, then we could write the model as $y + z = x\beta + v$, and standard results for the normal linear regression model can be used. In particular, we can obtain

$$p(\beta \mid y, x, z, h, \lambda^{-1}) \propto f_N^{k+1}(\beta \mid \hat{\beta}, h^{-1}(x'x)^{-1})\, p(\beta), \qquad (24.8)$$

where

$$\hat{\beta} = (x'x)^{-1} x'(y + z).$$
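Conditional on $z$, this step is just Bayesian linear regression on the augmented data $y + z$. A minimal sketch (the data, the current draws of $z$ and $h$, and the flat untruncated prior on $\beta$ are all hypothetical) computes $\hat{\beta}$ and takes one draw from the normal part of (24.8):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])            # hypothetical regressors
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.2, size=N) \
    - rng.exponential(scale=0.3, size=N)                         # hypothetical data

z = rng.exponential(scale=0.3, size=N)   # stand-in for the current Gibbs draw of z
h = 25.0                                 # stand-in for the current draw of the precision

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ (y + z))   # OLS on the augmented data y + z
# one draw from f_N(beta | beta_hat, (1/h)(X'X)^{-1})
L = np.linalg.cholesky(np.linalg.inv(XtX) / h)
beta_draw = beta_hat + L @ rng.normal(size=k)
print(beta_hat, beta_draw)
```

With a truncated uniform or truncated normal $p(\beta)$, the draw would additionally be rejected (or redrawn) whenever it violates the regularity conditions.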

Furthermore,

$$p(h \mid y, x, z, \beta, \lambda^{-1}) = f_G\left(h \,\Big|\, a_h + \frac{N}{2},\; b_h + \frac{1}{2}(y + z - x\beta)'(y + z - x\beta)\right). \qquad (24.9)$$

Also, given $z$, the full conditional posterior for $\lambda^{-1}$ can easily be derived:

$$p(\lambda^{-1} \mid y, x, z, \beta, h) = f_G(\lambda^{-1} \mid N + 1, z'\iota_N - \ln(\tau^*)). \qquad (24.10)$$
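A quick sanity check on (24.10) (a sketch, assuming the rate parameterization of $f_G$ and hypothetical values for $z$, $\lambda$, and $\tau^*$): when $z$ really is exponential with mean $\lambda$, draws from the full conditional should concentrate near the true $\lambda^{-1}$ for large $N$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, tau_star = 500, 0.875            # hypothetical sample size and prior median
lam_true = 0.25                     # hypothetical true mean inefficiency lambda
z = rng.exponential(scale=lam_true, size=N)   # stand-in for the current z draw
b_lam = -np.log(tau_star)

# draws from (24.10): Gamma(shape = N + 1, rate = z'iota_N + b_lam)
lam_inv = rng.gamma(shape=N + 1, scale=1.0 / (z.sum() + b_lam), size=20_000)
print(lam_inv.mean())               # close to 1 / lam_true = 4 for large N
```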

Equations (24.8), (24.9), and (24.10) are the full conditional posteriors necessary for setting up the Gibbs sampler conditional on $z$. To complete the posterior simulator, it is necessary to derive the posterior distribution of $z$ conditional on $\theta$. Noting that we can write $z = x\beta - y + v$, where $v$ has pdf $f_N(v \mid 0, h^{-1}I_N)$ and $z_i$ is a priori assumed to be iid $f_G(z_i \mid 1, \lambda^{-1})$,8 we obtain:

$$p(z \mid y, x, \beta, h, \lambda^{-1}) \propto f_N(z \mid x\beta - y - h^{-1}\lambda^{-1}\iota_N,\; h^{-1}I_N) \prod_{i=1}^{N} I(z_i > 0). \qquad (24.11)$$

A Gibbs sampler with data augmentation on $(\beta, h, \lambda^{-1}, z)$ can be set up by sequentially drawing from (24.8), (24.9), (24.10), and (24.11), where $(\beta, h)$ and $\lambda^{-1}$ are independent given $z$, so that (24.10) can be combined with either (24.8) or (24.9) and there are only three steps in the Gibbs sampler. Note that all that is required is random number generation from well-known distributions, and drawing from the high-dimensional vector $z$ is greatly simplified since (24.11) can be written as the product of $N$ univariate truncated normals.
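Putting the four conditionals together, the whole sampler fits in a few dozen lines. The sketch below (Python with NumPy/SciPy) cycles through (24.8)–(24.11), with the $h$ step taken as the usual regression update $f_G(h \mid a_h + N/2,\; b_h + \tfrac{1}{2}(y + z - x\beta)'(y + z - x\beta))$. The simulated data, the hyperparameter values, and the flat (untruncated) prior on $\beta$ are all illustrative assumptions, not the chapter's exact setup:

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_sfm(y, X, n_draws=1500, burn=500, a_h=0.01, b_h=0.01,
              tau_star=0.875, seed=0):
    """Gibbs sampler with data augmentation for the exponential
    stochastic frontier model, cycling through (24.8)-(24.11)."""
    rng = np.random.default_rng(seed)
    N, k = X.shape
    b_lam = -np.log(tau_star)                 # a_lam = 1, b_lam = -ln(tau*)
    XtX_inv = np.linalg.inv(X.T @ X)
    z = np.full(N, 0.1)                       # crude starting values
    h = 1.0
    keep_beta, keep_eff = [], []
    for s in range(n_draws):
        # (24.8): beta | rest ~ N(beta_hat, (1/h)(X'X)^{-1})  (flat prior on beta)
        beta_hat = XtX_inv @ (X.T @ (y + z))
        beta = beta_hat + np.linalg.cholesky(XtX_inv / h) @ rng.normal(size=k)
        # (24.9)-style update: h | rest ~ Gamma(a_h + N/2, rate = b_h + SSR/2)
        resid = y + z - X @ beta
        h = rng.gamma(a_h + N / 2, 1.0 / (b_h + 0.5 * resid @ resid))
        # (24.10): lambda^{-1} | rest ~ Gamma(N + 1, rate = sum(z) + b_lam)
        lam_inv = rng.gamma(N + 1, 1.0 / (z.sum() + b_lam))
        # (24.11): z | rest is N(x beta - y - lambda^{-1}/h, 1/h) truncated to z_i > 0,
        # a product of N univariate truncated normals
        mu = X @ beta - y - lam_inv / h
        sd = 1.0 / np.sqrt(h)
        z = truncnorm.rvs(-mu / sd, np.inf, loc=mu, scale=sd, random_state=rng)
        if s >= burn:
            keep_beta.append(beta)
            keep_eff.append(np.exp(-z))       # firm efficiencies tau_i = exp(-z_i)
    return np.array(keep_beta), np.array(keep_eff)

# illustrative run on data simulated from the model itself
rng = np.random.default_rng(42)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 0.6]) + rng.normal(scale=0.2, size=N) \
    - rng.exponential(scale=0.2, size=N)
draws_beta, draws_eff = gibbs_sfm(y, X)
print(draws_beta.mean(axis=0))   # posterior means; should land near [1.0, 0.6]
```

Imposing economic regularity would add a rejection step on the $\beta$ draw; everything else is unchanged.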

Given posterior simulator output, posterior properties of any of the parameters or of the individual $\tau_i$'s can be obtained.9 The latter can be calculated using simulated draws from (24.11) and transforming according to $\tau_i = \exp(-z_i)$. It is worth stressing that the Bayesian approach provides a finite sample distribution of the efficiency of each firm. This allows us to obtain both point and interval estimates, or even, e.g., $P(\tau_i > \tau_j \mid y, x)$. The latter is potentially crucial, since important policy consequences often hinge on one firm being labeled as more efficient in a statistically significant sense. Both DEA and classical econometric approaches typically only report point estimates. The DEA approach is nonparametric and, hence, confidence intervals for the efficiency measures obtained are very hard to derive.10 Distributional theory for the classical econometric approach is discussed in Jondrow et al. (1982) and Horrace and Schmidt (1996). These papers point out that, although point estimates and confidence intervals for $\tau_i$ can be calculated, the theoretical justification is not that strong. For example, the maximum likelihood estimator for $\tau_i$ is inconsistent and the methods for constructing confidence intervals assume unknown parameters are equal to their point estimates. For this reason, it is common in classical econometric work to present some characteristics of the efficiency distribution as a whole (e.g. estimates of $\lambda$) rather than discuss firm-specific efficiency. However, firm-specific efficiencies are often of fundamental policy importance and, hence, we would argue that an important advantage of the Bayesian approach is its development of finite sample distributions for the $\tau_i$'s.
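Once posterior draws of the inefficiencies are stored, these firm-level summaries are one-liners. In the sketch below the draws for two firms are hypothetical stand-ins (in practice they would be columns of the retained $z$ output of a posterior simulator); point estimates, interval estimates, and $P(\tau_i > \tau_j \mid y, x)$ all reduce to simple Monte Carlo summaries:

```python
import numpy as np

rng = np.random.default_rng(3)
# hypothetical posterior draws of z for two firms
z_i = rng.gamma(2.0, 0.05, size=5000)    # firm i: low inefficiency
z_j = rng.gamma(2.0, 0.15, size=5000)    # firm j: higher inefficiency
tau_i, tau_j = np.exp(-z_i), np.exp(-z_j)

lo, hi = np.quantile(tau_i, [0.05, 0.95])      # 90% interval estimate for tau_i
p_better = (tau_i > tau_j).mean()              # estimate of P(tau_i > tau_j | y, x)
print(np.median(tau_i), (lo, hi), p_better)
```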
