# Semiparametric IV estimation and conditional moments restrictions

Simultaneous equation models with selectivity can also be estimated by semiparametric methods. Semiparametric IV methods for the estimation of sample selection models are considered in Powell (1987) and Lee (1994b). Powell (1987) is interested in the asymptotic properties of a general semiparametric IV estimator. Lee (1994b) follows the literature on classical linear simultaneous equation models by focusing on both the identification and estimation of a structural equation with sample selection. It considers possible generalizations of two-stage least squares methods and their possible optimum IV property. Consider a single linear structural equation

$$y_1^* = y_2^*\alpha + xJ\delta + u_1, \tag{18.14}$$

where $y_1^*$ is a latent endogenous variable, $y_2^*$ is a vector of latent endogenous variables not including $y_1^*$, $x$ is a vector of exogenous variables in the system, and $xJ$, where $J$ is a selection matrix, represents the subset of exogenous variables included in this structural equation. The reduced form equation of $y_2^*$ is $y_2^* = x\Pi_2 + v_2$. The sample observations $y_1$ and $y_2$ of $y_1^*$ and $y_2^*$ are subject to selection. The selection equation is $I^* = x\gamma - \varepsilon$, and $y_1$ and $y_2$ are observed if and only if $I^* > 0$. As in the index framework, the joint distribution of $(u_1, v_2, \varepsilon)$ conditional on $x$ is assumed to be a function of the index $x\gamma$.

In this system with sample selection, the identification of structural parameters requires stronger conditions than the usual rank condition in the classical linear simultaneous equation model (without selectivity) and the parametric linear simultaneous equation sample selection model considered in Lee et al. (1980) and Amemiya (1983). Let $y_1^* = x\pi_1 + v_1$ be the implied reduced form equation for $y_1^*$. Conditional on $x$ and $I = 1$, $E(y_1 \mid x, I = 1) = x\pi_1 + E(v_1 \mid x\gamma, x\gamma > \varepsilon)$ and $E(y_2 \mid x, I = 1) = x\Pi_2 + E(v_2 \mid x\gamma, x\gamma > \varepsilon)$. As in the classical linear simultaneous equation model, the identification of structural parameters is directly related to the reduced form parameters. However, contrary to the classical system, the reduced form parameter vectors $\pi_1$ and $\Pi_2$ are not identifiable because the same $x$ appears in the selection equation. It turns out that some linear combinations of the reduced form parameters can be identified.

As the selection equation is a single index model, a conventional normalization suggested by Ichimura (1993) is to set the coefficient of a continuous and exogenous variable to be unity, i.e., $x\gamma = x_{(1)} + x_{(2)}\zeta$, where $x_{(1)}$ is a relevant continuous and exogenous variable in $x = (x_{(1)}, x_{(2)})$. With the partition of $x$ into $x_{(1)}$ and $x_{(2)}$, the above equations imply that the parameters $\pi^*_{(2)}$ and $\Pi^*_{(2)}$ in $E(y_1 \mid x, I = 1) = x_{(2)}\pi^*_{(2)} + E(v_1^* \mid x\gamma, I = 1)$ and $E(y_2 \mid x, I = 1) = x_{(2)}\Pi^*_{(2)} + E(v_2^* \mid x\gamma, I = 1)$ are identifiable. The structural parameters are related to those identified reduced-form parameters as

$$\pi^*_{(2)} = \Pi^*_{(2)}\alpha - \delta_1\zeta + \delta_2, \tag{18.15}$$

where $\delta_1$ and $\delta_2$ are the coefficients of $x_{(1)}$ and $x_{(2)}$ in $xJ\delta$.

The identification of the structural parameters follows from this relation. With exclusion restrictions as in (18.14), one can see that the order identification condition for the semiparametric model corresponds to the overidentification condition of the classical linear simultaneous equation model. The stronger condition for the identification of the semiparametric model is due to the addition of a selection bias term with an unknown form in the bias-corrected structural equation. Exogenous variables excluded from the structural equation (18.14) before bias correction reappear in the selection bias term through the index $x\gamma$. Such exogenous variables identify the bias term. But the bias-correction term introduces excluded exogenous variables back into the bias-corrected structural equation. It follows that the effective number of the included exogenous variables in this equation is the number of originally included exogenous variables plus one. Therefore, the order condition for identification requires stronger exclusion restrictions than the classical model or a parametric model. For a parametric model under normal disturbances, the bias-correction term has a known nonlinear form which can help identification.
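The relation (18.15) between the identified reduced-form parameters and the structural parameters can be checked in a few lines. The following is only a sketch under notational assumptions that the text does not spell out: the included exogenous variables are split as $xJ\delta = x_{(1)}\delta_1 + x_{(2)}\delta_2$ (with $\delta_1 = 0$ if $x_{(1)}$ is excluded), and the reduced-form coefficients are partitioned conformably into the rows $\pi_{(1)}, \Pi_{(1)}$ belonging to $x_{(1)}$ and the rows $\pi_{(2)}, \Pi_{(2)}$ belonging to $x_{(2)}$. Substituting $x_{(1)} = x\gamma - x_{(2)}\zeta$ into the reduced forms gives

$$
\begin{aligned}
x\pi_1 &= x\gamma\,\pi_{(1)} + x_{(2)}\bigl(\pi_{(2)} - \zeta\pi_{(1)}\bigr), &\quad \pi^*_{(2)} &= \pi_{(2)} - \zeta\pi_{(1)},\\
x\Pi_2 &= x\gamma\,\Pi_{(1)} + x_{(2)}\bigl(\Pi_{(2)} - \zeta\Pi_{(1)}\bigr), &\quad \Pi^*_{(2)} &= \Pi_{(2)} - \zeta\Pi_{(1)},
\end{aligned}
$$

where the terms in $x\gamma$ are absorbed into the unknown functions of the index. The structural equation (18.14) implies $\pi_{(1)} = \Pi_{(1)}\alpha + \delta_1$ and $\pi_{(2)} = \Pi_{(2)}\alpha + \delta_2$, so that

$$
\pi^*_{(2)} = \Pi_{(2)}\alpha + \delta_2 - \zeta\bigl(\Pi_{(1)}\alpha + \delta_1\bigr) = \Pi^*_{(2)}\alpha - \delta_1\zeta + \delta_2 .
$$

The unknown $\zeta$ enters this relation alongside $\alpha$ and $\delta$, which is one way to see why exclusion restrictions beyond the classical order condition are needed.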

The structural parameters $\alpha$ and $\delta$ can be estimated via (18.15) by Amemiya's minimum distance methods. But semiparametric least squares are relatively simple and illustrative. For the semiparametric estimation of the structural equation (18.14), let $w = (y_2, xJ)$ and $\beta = (\alpha', \delta')'$. For any possible value $(\beta, \gamma)$, $E(y_1 - w\beta \mid x\gamma, I = 1)$ evaluated at a point $x_i\gamma$ of $x\gamma$ can be estimated by a nonparametric kernel estimator $E_n(y_1 \mid x_i\hat\gamma) - E_n(w \mid x_i\hat\gamma)\beta$, where $\hat\gamma$ is a consistent estimate of $\gamma$ from the selection equation. The bias-corrected structural equation becomes

$$y_{1i} - E_n(y_1 \mid x_i\hat\gamma) = \bigl(w_i - E_n(w \mid x_i\hat\gamma)\bigr)\beta + \hat u_{ni}. \tag{18.16}$$

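If $p$ is a vector of instrumental variables for $w$, a semiparametric IV estimator of $\beta$ can be

$$\hat\beta_p = \Bigl[\sum_{i=1}^n t_n(x_i\hat\gamma)\,p_i'\bigl(w_i - E_n(w \mid x_i\hat\gamma)\bigr)\Bigr]^{-1} \sum_{i=1}^n t_n(x_i\hat\gamma)\,p_i'\bigl(y_{1i} - E_n(y_1 \mid x_i\hat\gamma)\bigr),$$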

where $t_n(x_i\hat\gamma)$ is a weighting or trimming function. Powell (1987) used the denominator in the nonparametric kernel regression functions $E_n(w \mid x_i\hat\gamma)$ and $E_n(y_1 \mid x_i\hat\gamma)$ as the weight so as to cancel the denominator in those kernel regression functions. This weighting plays the role of trimming and has nothing to do with the variance of the disturbances $\hat u_{ni}$ in equation (18.16). Lee (1994b) suggested a semiparametric two-stage least squares estimator and a more efficient semiparametric generalized two-stage least squares estimator. The latter takes into account both the heteroskedasticity of disturbances and the variance and covariance due to $\hat\gamma$.

The disturbance $\hat u_{ni}$ consists of three components:

$$\hat u_{ni} = u_{1i} - E_n(u_1 \mid x_i\hat\gamma) = \bigl(u_{1i} - E(u_1 \mid x_i\gamma)\bigr) - \bigl(E_n(u_1 \mid x_i\hat\gamma) - E_n(u_1 \mid x_i\gamma)\bigr) - \bigl(E_n(u_1 \mid x_i\gamma) - E(u_1 \mid x_i\gamma)\bigr).$$

The first component represents the disturbance in the structural equation after the correction of selection bias. The second component represents the disturbance introduced in $E_n(u_1 \mid x_i\hat\gamma)$ by replacing $\gamma$ by the estimate $\hat\gamma$. These two components are asymptotically uncorrelated. The last component represents the error introduced by the nonparametric estimate of the conditional expectation of $u_{1i}$. The last component does not influence the asymptotic distribution of a semiparametric two-stage estimator due to an asymptotic orthogonality property of the index structure.

As the variance of $u_{1i} - E(u_1 \mid x_i\gamma)$ is a function of $x_i\gamma$, it can be estimated by a nonparametric kernel estimator $\hat\sigma^2_{ni}$. Let $\Sigma$ be the variance matrix of the vector consisting of $\hat u_{ni}$, which is determined by the first two components of $\hat u_{ni}$. It captures the heteroskedastic variances of the first component and the covariance of the second component across sample observations due to $\hat\gamma$. A feasible semiparametric generalized two-stage least squares estimator can be either

$$\tilde\beta = \bigl[W'X_2(X_2'X_2)^{-1}X_2'\Sigma^{-1}W\bigr]^{-1} W'X_2(X_2'X_2)^{-1}X_2'\Sigma^{-1}y$$

or

$$\bar\beta = \bigl[W'\Sigma^{-1}X_2(X_2'\Sigma^{-1}X_2)^{-1}X_2'\Sigma^{-1}W\bigr]^{-1} W'\Sigma^{-1}X_2(X_2'\Sigma^{-1}X_2)^{-1}X_2'\Sigma^{-1}y,$$

where the elements of $X_2$, $W$ and $y$ are, respectively, $t_n(x_i\hat\gamma)\bigl(x_{(2)i} - E_n(x_{(2)} \mid x_i\hat\gamma)\bigr)$, $t_n(x_i\hat\gamma)\bigl(w_i - E_n(w \mid x_i\hat\gamma)\bigr)$ and $t_n(x_i\hat\gamma)\bigl(y_{1i} - E_n(y_1 \mid x_i\hat\gamma)\bigr)$. These two estimators are asymptotically equivalent and are asymptotically efficient semiparametric IV estimators (conditional on the choice of first-stage estimator $\hat\gamma$ and the trimming function). These semiparametric methods are two-stage estimation methods in that the selection equation is separately estimated and its coefficient estimate $\hat\gamma$ is used for the estimation of the outcome equations.
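The two-stage idea described above can be sketched numerically. The Python code below is a minimal illustration, not the estimator of Lee (1994b) itself: it implements only the unweighted semiparametric two-stage least squares case (the variance matrix is replaced by the identity), uses a Gaussian Nadaraya-Watson regression on the index, and trims observations with small kernel denominators. The function names `kernel_fit`, `semiparametric_2sls` and `simulate`, the simulated design, the bandwidth rule and the trimming quantile are all assumptions made for illustration. In particular, the true index is passed in place of a first-stage estimate of the selection index.

```python
import numpy as np

def kernel_fit(index, values, bandwidth):
    """Nadaraya-Watson fit of each column of `values` on a scalar index.

    Returns the fitted values and the kernel sums (the regression denominators).
    """
    scaled = (index[:, None] - index[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled ** 2)              # Gaussian kernel
    denom = weights.sum(axis=1)
    return weights @ values / denom[:, None], denom

def semiparametric_2sls(y1, w, x2, index, bandwidth, trim_quantile=0.02):
    """Unweighted semiparametric 2SLS on index-residualised data."""
    stacked = np.column_stack([y1, w, x2])
    fitted, denom = kernel_fit(index, stacked, bandwidth)
    # Trimming: drop points where the kernel denominator is very small.
    keep = denom > np.quantile(denom, trim_quantile)
    resid = (stacked - fitted)[keep]
    k_w = w.shape[1]
    y_r, w_r, x_r = resid[:, 0], resid[:, 1:1 + k_w], resid[:, 1 + k_w:]
    # Project residualised regressors on residualised instruments, then IV.
    w_proj = x_r @ np.linalg.lstsq(x_r, w_r, rcond=None)[0]
    return np.linalg.solve(w_proj.T @ w_r, w_proj.T @ y_r)

def simulate(n=4000, alpha=1.0, delta=0.5, seed=0):
    """Hypothetical design: one endogenous regressor, one included exogenous variable."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 4))                       # columns: x1, x2, x3, x4
    e = rng.normal(size=(n, 3))
    eps = e[:, 0]                                     # selection error
    v2 = 0.5 * e[:, 0] + e[:, 1]                      # reduced-form error
    u1 = 0.5 * e[:, 0] + 0.5 * e[:, 1] + e[:, 2]      # structural error
    index = x[:, 0] + 0.5 * x[:, 1] + 0.5 * x[:, 2]   # coefficient of x1 normalised to one
    y2 = 0.5 * x[:, 1] + x[:, 2] + x[:, 3] + v2
    y1 = alpha * y2 + delta * x[:, 1] + u1
    sel = index > eps                                 # outcomes observed only if I* > 0
    w = np.column_stack([y2, x[:, 1]])
    return y1[sel], w[sel], x[sel, 1:], index[sel]

y1, w, x2, index = simulate()
bandwidth = 1.06 * index.std() * len(index) ** (-0.2)  # rule-of-thumb choice
beta_hat = semiparametric_2sls(y1, w, x2, index, bandwidth)
print(beta_hat)                                        # estimates of (alpha, delta)
```

Because the kernel regression is linear in the dependent variable, the residualised equation holds exactly in the sample, so the remaining task is the IV step. A generalized version would replace the plain projection by one weighted with an estimate of the variance matrix.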

Instead of two-stage methods, it is possible to estimate jointly the selection and structural outcome equations so as to improve efficiency (Lee, 1998). As a generalization, consider the estimation of a nonlinear simultaneous equation sample selection model: $g(y^*, x, \beta) = u$, where the vector $y^*$ can be observed only if $x\gamma > \varepsilon$. Under the index assumption that the joint distribution of $u$ and $\varepsilon$ conditional on $x$ may depend only on $x\gamma$, the bias-corrected structural system is $g(y, x, \beta) = E(g(y, x, \beta) \mid I = 1, x\gamma) + \eta$, where $\eta = u - E(u \mid I = 1, x\gamma)$ with its variance being a function of $x\gamma$. This system implies the moment equation $E(g(y, x, \beta) \mid I = 1, x) = E(g(y, x, \beta) \mid I = 1, x\gamma)$ at the true parameter vector. For a truncated sample selection model where only sample observations of $y$ and the event $I = 1$ are available, this moment equation forms the system for estimation. For a (censored) sample selection model, the events $I = 1$ and $I = 0$ are both observed, which introduces an additional moment equation $E(I \mid x) = E(I \mid x\gamma)$ at the true parameter vector. These moment equations can be used together for estimation.

Suppose that these moment equations are combined and are written in a general format as $E(f(z, \beta) \mid x) = E(f(z, \beta) \mid x\gamma)$, where $z$ includes all endogenous and exogenous variables in the model. The parameter vector $\beta$ in the system can be estimated by semiparametric nonlinear two-stage least squares methods. $E(f(z, \beta) \mid x\gamma)$ can be estimated by a nonparametric regression function $E_n(f(z, \beta) \mid x\gamma)$. The relevant variance function can be estimated by $V_n(x\gamma) = E_n(f(z, \beta)f'(z, \beta) \mid x\gamma) - E_n(f(z, \beta) \mid x\gamma)E_n(f'(z, \beta) \mid x\gamma)$. Let $u_n(z, \theta) = f(z, \beta) - E_n(f(z, \beta) \mid x\gamma)$, where $\theta = (\beta', \gamma')'$, and let $t_n$ be a proper trimming function. Let $w$ be an IV vector. The semiparametric nonlinear weighted two-stage method with the IV $w$ is

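$$\min_{\beta}\ \sum_{i=1}^n t_{ni}\,u_n'(z_i, \hat\theta)V_n^{-1}(x_i\hat\gamma)w_i \Bigl[\sum_{i=1}^n t_{ni}\,w_i'V_n^{-1}(x_i\hat\gamma)w_i\Bigr]^{-1} \sum_{i=1}^n t_{ni}\,w_i'V_n^{-1}(x_i\hat\gamma)u_n(z_i, \hat\theta),$$

where $\hat\gamma$ is a consistent estimate of $\gamma$ and $\hat\theta$ denotes $\theta$ with $\gamma$ replaced by $\hat\gamma$. Lee (1998) shows that an optimal IV is any consistent estimate of $G_0(x, \theta)$, where

$$G_0(x, \theta) = \Bigl[E\Bigl(\frac{\partial f(z, \beta)}{\partial\theta'} \Bigm| x\Bigr) - E\Bigl(\frac{\partial f(z, \beta)}{\partial\theta'} \Bigm| x\gamma\Bigr)\Bigr] - \nabla' E(f(z, \beta) \mid x\gamma)\Bigl[\frac{\partial(\gamma' x')}{\partial\theta'} - E\Bigl(\frac{\partial(\gamma' x')}{\partial\theta'} \Bigm| x\gamma\Bigr)\Bigr],$$

and $\nabla E(\cdot \mid x\gamma)$ denotes the gradient of $E(\cdot \mid x\gamma)$ with respect to the vector $x\gamma$. In Lee (1998), semiparametric minimum-distance methods have also been introduced. Semiparametric minimum-distance methods compare directly $E_n(f(z, \beta) \mid x)$ with $E_n(f(z, \beta) \mid x\gamma)$. A semiparametric minimum-distance method with weighting is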

$$\min\ \sum_{i=1}^n t_{ni}\bigl[E_n(f(z, \beta) \mid x_i) - E_n(f(z, \beta) \mid x_i\gamma)\bigr]'V_n^{-1}(x_i\gamma)\bigl[E_n(f(z, \beta) \mid x_i) - E_n(f(z, \beta) \mid x_i\gamma)\bigr].$$

The semiparametric weighted minimum-distance estimator is asymptotically equivalent to the optimal IV estimator. The semiparametric minimum-distance method has the interesting feature of not emphasizing the construction of instrumental variables. As $z$ in $f(z, \beta)$ may or may not contain endogenous variables, semiparametric minimum-distance methods can be applied to the estimation of regression models as well as simultaneous equation models in a single framework.

The efficiency of estimating a structural equation is related to the efficiency issue for semiparametric models with conditional moment restrictions. Chamberlain (1992) investigates semiparametric efficiency bounds for semiparametric models with conditional moment restrictions. The conditional moment restriction considered has the form $E[\rho(x, y, \beta_0, q_0(x_2)) \mid x] = 0$, where $x_2$ is a subvector of $x$, $\rho(x, y, \beta, \tau)$ is a known function, but $q(x_2)$ is an unknown mapping. Chamberlain (1992) derives an efficiency bound for estimators of $\beta$ under the conditional moment restriction. Several concrete examples are provided. Among them is a sample selection model.

The sample selection model considered in Chamberlain (1992) is $y_1^* = x_1\beta + x_2\delta + u$ and $I = 1$ if $g(x_2, \varepsilon) > 0$, and $I = 0$ otherwise, where $x = (x_1, x_2)$ and $y = (y_1, I)$, with $y_1 = Iy_1^*$, are observed. The unknown function $q$ depends on $x$ only via $x_2$ but is otherwise unrestricted. The disturbances $u$ and $\varepsilon$ satisfy the restrictions that $E(u \mid x, \varepsilon) = E(u \mid \varepsilon)$ and $\varepsilon$ is independent of $x_1$ conditional on $x_2$. It is a sample selection model with indices $x_2$. This model implies that $E(y_1 \mid x, I = 1) = x_1\beta_0 + q_0(x_2)$, where $q_0(x_2) = x_2\delta_0 + E(u \mid x_2, I = 1)$. Thus $\rho(x, y, \beta, \tau) = I(y_1 - x_1\beta - \tau)$ in Chamberlain's framework for this sample selection model. Chamberlain pointed out that one might extend the $\rho$ function to include the restriction that $E(I \mid x) = E(I \mid x_2)$, so that $\rho(x, y, \beta, \tau) = [I(y_1 - x_1\beta - \tau_1), I - \tau_2]$, but the efficiency bound for $\beta$ is the same with either one of the above $\rho$. Let $\sigma^2(x)$ denote $\operatorname{var}(y_1 \mid x, I = 1)$. For the case where $\sigma^2(x)$ happens to depend on $x$ only through $x_2$, the efficiency bound simplifies to $J = E\{E(I \mid x_2)\sigma^{-2}(x_2)[x_1 - E(x_1 \mid x_2)][x_1 - E(x_1 \mid x_2)]'\}$.

The semiparametric weighted minimum-distance method can be applied to estimate the sample selection model with $f'(z, \beta) = [I(y_1 - x_1\beta), I]$. Lee (1998) showed that the semiparametric weighted minimum-distance estimator attains the efficiency bound if $\sigma^2(x)$ happens to depend only on $x_2$. It is of interest to note that if the moment equation $E(I \mid x) = E(I \mid x_2)$ were ignored and only the moment equation $E[I(y_1 - x_1\beta_0) \mid x] = E[I(y_1 - x_1\beta_0) \mid x_2]$ were used in the estimation, the resulting estimator would have a larger variance. The point is that even though the moment restriction $E(I \mid x) = E(I \mid x_2)$ does not contain $\beta$, it helps to improve the efficiency in estimating $\beta$ (as in a seemingly unrelated regression framework). On the other hand, if the conditional moment restriction $E(y_1 - x_1\beta_0 \mid x, I = 1) = E(y_1 - x_1\beta_0 \mid x\gamma_0, I = 1)$ is used for estimation, the resulting estimator will have the same smaller variance. This is so because $I - E(I \mid x)$ is uncorrelated with the disturbance $(y_1 - x_1\beta_0) - E(y_1 - x_1\beta_0 \mid x, I = 1)$.
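As a rough numerical illustration of the minimum-distance idea for this sample selection model, the Python sketch below compares conditional means given $x = (x_1, x_2)$ with conditional means given $x_2$ alone. Everything in it is an assumption made for illustration rather than the procedure of Lee (1998): the regressors are discrete so that cell averages stand in for kernel regressions, the simulated design and the function names `cell_mean`, `minimum_distance` and `simulate` are hypothetical, and only the single moment function $I(y_1 - x_1\beta)$ is used. By the discussion above, this single-moment version is not expected to attain the efficiency bound.

```python
import numpy as np

def cell_mean(values, cells):
    """Average of `values` within each integer-labelled cell, mapped back to observations."""
    sums = np.bincount(cells, weights=values)
    counts = np.bincount(cells)
    return (sums / np.maximum(counts, 1))[cells]

def minimum_distance(y1, x1, x2, selected):
    """Two-step weighted minimum-distance estimate of beta with f = I * (y1 - x1 * beta)."""
    full = x1 * (x2.max() + 1) + x2                   # one integer label per (x1, x2) cell
    iy, ix = selected * y1, selected * x1
    a = cell_mean(iy, full) - cell_mean(iy, x2)       # E_n(I y1 | x) - E_n(I y1 | x2)
    b = cell_mean(ix, full) - cell_mean(ix, x2)       # E_n(I x1 | x) - E_n(I x1 | x2)
    beta = (b @ a) / (b @ b)                          # step 1: unit weights
    f = iy - ix * beta
    v = cell_mean(f ** 2, x2) - cell_mean(f, x2) ** 2 # variance function V_n(x2)
    return ((b / v) @ a) / ((b / v) @ b)              # step 2: weights 1 / V_n(x2)

def simulate(n=20000, beta=2.0, delta=1.0, seed=0):
    """Hypothetical design with discrete x1, x2 and selection driven by x2 only."""
    rng = np.random.default_rng(seed)
    x1 = rng.integers(0, 3, size=n)
    x2 = rng.integers(0, 3, size=n)
    eps = rng.normal(size=n)
    u = 0.8 * eps + 0.6 * rng.normal(size=n)          # outcome error correlated with eps
    selected = 0.5 * x2 - 0.3 - eps > 0               # I = 1 if g(x2, eps) > 0
    y1 = np.where(selected, x1 * beta + x2 * delta + u, 0.0)  # y1 = I * y1_star
    return y1, x1, x2, selected

y1, x1, x2, selected = simulate()
beta_md = minimum_distance(y1, x1, x2, selected)
print(beta_md)                                        # estimate of beta
```

Since the moment function is linear in $\beta$, the minimum-distance objective is a weighted least squares problem in the differences of cell means and has a closed-form solution. The coefficient $\delta$ is not estimated because $x_2\delta$ is absorbed into the unknown function of $x_2$.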