# Semiparametric and Nonparametric Approaches

## 1.3 Semiparametric two-stage estimation

Manski (1975) showed that a parametric distribution is not necessary for consistent estimation of discrete choice models, and thus originated the semiparametric estimation literature in microeconometrics. Recognition of the inconsistency of the maximum likelihood and two-stage estimation methods under a misspecified error distribution has spurred the study of semiparametric and nonparametric estimation methods. Cosslett (1991) initiated semiparametric estimation of the sample selection model with a binary selection equation. Based on a series approximation to an unknown density function, Gallant and Nychka (1987) proposed a consistent semi-nonparametric maximum likelihood method. The asymptotic distributions of the estimators of Cosslett and of Gallant and Nychka are unknown. Robinson (1988) obtained a semiparametric estimator of the parameters of the outcome equation and derived its asymptotic distribution. His method is based on a nonparametric kernel regression correction of the selection-bias term. Subsequent contributions have largely concentrated on index formulations for dimension reduction. Powell (1987) considered a single-index case and Ichimura and Lee (1991) studied the general multiple-index situation. Ahn and Powell (1993) considered a probability index formulation. The approaches of Robinson, Powell, and Ahn and Powell are single-equation two-stage estimation methods with nonparametric kernel regression functions. The approach of Ichimura and Lee (1991) is a semiparametric nonlinear least squares method, which can also be used for truncated samples on outcome equations. Others (Newey, 1988; Andrews, 1991) have used series approximations for conditional expectations.

These two-stage estimation methods are motivated by the implied bias-corrected outcome equation having the form

y_i = x_iβ + ψ(z_iγ) + η_i,   (18.12)

where E(η_i | I_i = 1, x_i) = 0 for cross-sectional data. For semiparametric models, ψ is an unknown function but can be estimated by some nonparametric estimator. The various semiparametric two-stage methods differ from each other in how ψ is estimated. As ψ may be a linear function, it is essential for the identification of β in a semiparametric two-stage estimation procedure that there be at least one variable which is in z but not in x. If this exclusion condition does not hold for (18.12), some linear transformation of β, which creates the exclusion requirement, can still be identified and estimated (Chamberlain, 1986; Lee, 1994b). With the unknown ψ replaced by a nonparametric function E_n(z_i, θ), where θ = (β, γ), the various suggested approaches amount to estimating the unknown parameters of the equation y_i = x_iβ + E_n(z_i, θ) + η_i. In the index context, one only needs to know that ψ(z_iγ) = E(y_i − x_iβ | z_iγ) is a function of z_iγ. For a kernel-type estimator, ψ(z_iγ) can be estimated by the nonparametric regression estimator

E_n(z_i, θ) = Σ_{j≠i, I_j=1} (y_j − x_jβ) K((z_iγ − z_jγ)/a_n) / Σ_{j≠i, I_j=1} K((z_iγ − z_jγ)/a_n),

where K is a kernel function and a_n is a bandwidth or window width. The consistency and asymptotic distribution of a derived estimator of β (and/or γ) depend on certain essential conditions on the selected sequence of bandwidths {a_n}. The bandwidth sequence is required to converge to zero as n goes to infinity, but its rate of convergence cannot be too fast. The rate of convergence of a nonparametric regression will, in general, depend on the degree of smoothness of the underlying densities of the disturbances and regressors of the model. For series approximations, the corresponding problem refers to the number of terms included
in an approximation. For general issues on nonparametric regression, see Ullah, Chapter 20 in this volume.
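As a concrete illustration of this kernel-based second stage, the following Python sketch implements the partialling-out variant due to Robinson (1988): both y and the columns of x are kernel-demeaned given the index z_iγ, and β is then obtained by least squares on the demeaned variables. The Gaussian kernel, the leave-one-out weights, the bandwidth rule a_n = n^(−1/5), and the simulated design (with γ treated as known) are all illustrative assumptions, not part of the original treatment.

```python
import numpy as np

def loo_kernel_regression(index, v, a_n):
    """Leave-one-out Nadaraya-Watson estimate of E(v | index) at each
    observation; the Gaussian kernel K and bandwidth a_n are assumptions."""
    d = (index[:, None] - index[None, :]) / a_n   # (z_i g - z_j g) / a_n
    K = np.exp(-0.5 * d ** 2)                     # Gaussian kernel weights
    np.fill_diagonal(K, 0.0)                      # drop own observation (j != i)
    return K @ v / K.sum(axis=1)

def robinson_two_stage(y, x, index, a_n):
    """Partialling-out estimator of beta in y = x beta + psi(index) + eta:
    kernel-demean y and each column of x given the index, then run OLS
    on the demeaned variables (Robinson, 1988)."""
    y_t = y - loo_kernel_regression(index, y, a_n)
    x_t = x - np.column_stack(
        [loo_kernel_regression(index, x[:, k], a_n) for k in range(x.shape[1])])
    beta, *_ = np.linalg.lstsq(x_t, y_t, rcond=None)
    return beta

# Illustrative simulated design: the excluded variable w enters the index
# z gamma but not x, and psi is a stand-in nonlinear selection-bias term.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 1))
w = rng.normal(size=n)                  # exclusion restriction: w in z, not in x
index = 0.8 * x[:, 0] + w               # z_i gamma, with gamma taken as known
y = 1.5 * x[:, 0] + np.sin(index) + rng.normal(size=n)
beta_hat = robinson_two_stage(y, x, index, a_n=n ** (-1 / 5))
```

In an application, the index would come from a first-stage estimate of γ, and the bandwidth would have to satisfy the rate conditions discussed above; the simple rule a_n = n^(−1/5) here is only a placeholder.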

The use of the kernel-type regression function (or series approximation) is valuable in that asymptotic properties of the estimators can be established. The β can be consistently estimated, and the semiparametric estimators are √n-consistent and asymptotically normal. For empirical applications, one has to be careful in selecting appropriate bandwidths. Bandwidth selection can be a complicated issue. In practice, one may hope that a bandwidth parameter can be determined automatically. The Cosslett two-stage approach has this feature: the implicit window widths in his approach are automatically determined. At the first stage, a semiparametric maximum likelihood procedure is used to estimate γ and the distribution F of the choice-equation disturbance ε under the assumption that ε and u are independent of all regressors. The estimator of F for each γ is F̂(· | γ), derived by maximizing the loglikelihood function ln L(F, γ) = Σ_{i=1}^n [I_i ln F(z_iγ) + (1 − I_i) ln(1 − F(z_iγ))] with respect to F. The estimator of γ is then derived by maximizing ln L(F̂(· | γ), γ) with respect to γ. The estimator F̂ is also used for the estimation of β in the second stage. The estimator F̂ is a step function with steps located at some ε_j*, j = 1, …, J, where ε_1* < ε_2* < … < ε_J*. The number of steps, their locations, and their heights are all determined in the first-stage estimation. The implicit bandwidths ε_j* − ε_{j−1}* for j = 1, …, J, with ε_0* = −∞ as a convention, are automatic. Under the assumption that u and ε are independent of x and z, ψ(z_iγ) in (18.12) is ψ(z_iγ) = ∫_{−∞}^{z_iγ} E(u | ε)dF(ε) / ∫_{−∞}^{z_iγ} dF(ε). With F̂ replacing F, the estimated ψ(z_iγ) is a constant λ_j for all z_iγ in the interval (ε_{j−1}*, ε_j*), where

λ_j = ∫_{−∞}^{ε_j*} E(u | ε)dF̂(ε) / ∫_{−∞}^{ε_j*} dF̂(ε). Define the subset of sample observations S_j = {i | ε_{j−1}* < z_iγ < ε_j* and I_i = 1} and the set indicator I_{S_j}. Cosslett's approach leads to the estimation of the regression equation with added dummy regressors: y_i = x_iβ + Σ_{j=1}^J λ_j I_{S_j}(i) + η_i. Cosslett (1991) showed that the estimator is consistent. However, its asymptotic distribution remains unknown. The automatic window width in the approach may have induced complications for asymptotic analysis.
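The grouped-data structure of this second stage can be sketched in Python as a regression of y on x plus one dummy per interval between consecutive step points. The step locations and the simulated data below are fabricated for illustration; in Cosslett's procedure the steps come out of the first-stage semiparametric MLE, and the regression is run on the selected subsample only. Note that x contains no intercept, since the J dummies absorb it.

```python
import numpy as np

def cosslett_second_stage(y, x, index, steps):
    """OLS of y on x plus interval dummies I_{S_j}(i), where observation i
    falls in cell j when its index lies between the (j-1)th and jth step
    points. `steps` holds the sorted first-stage step locations (taken as
    given here). Returns (beta_hat, lambda_hat)."""
    J = len(steps)
    # cell of each observation = number of step points strictly below its index
    cells = np.minimum(np.searchsorted(steps, index, side="left"), J - 1)
    D = np.zeros((len(y), J))
    D[np.arange(len(y)), cells] = 1.0     # dummy regressors for the J cells
    coef, *_ = np.linalg.lstsq(np.hstack([x, D]), y, rcond=None)
    return coef[:x.shape[1]], coef[x.shape[1]:]

# Fabricated example: three cells with known cell constants lambda_j.
rng = np.random.default_rng(1)
n = 1500
x = rng.normal(size=(n, 1))              # no intercept: dummies absorb it
index = rng.normal(size=n)               # plays the role of z_i gamma
steps = np.array([-0.5, 0.5, 3.0])       # sorted step locations
lam = np.array([0.2, 0.7, 1.1])          # cell constants lambda_j
cells = np.minimum(np.searchsorted(steps, index, side="left"), 2)
y = 2.0 * x[:, 0] + lam[cells] + 0.3 * rng.normal(size=n)
beta_hat, lam_hat = cosslett_second_stage(y, x, index, steps)
```

Because the dummies are piecewise-constant in the index, no bandwidth has to be chosen here, which is exactly the automatic-window-width feature noted above.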