# Goodness of fit measures and choices of kernel and bandwidth

The LLS estimators are easy to implement. Once a window width and kernel are chosen, $K_{ix} = K\left(\frac{x_i - x}{h}\right)$ can be computed for each value of $x = x_j$, $j = 1, \ldots, n$, in the sample and then substituted into the LLS, LLLS, and N-W (LCLS) estimators given above. Confidence intervals for the LLLS and N-W estimators can then be obtained by using the asymptotic normality results. The key issues for the LLS estimators therefore involve the selection of kernel and window width. Regarding the choice of kernel, we merely remind readers that for large data sets it is now believed that the choice of smoothing kernel is not crucial, and that data should be transformed to a standardized form before entry into kernels. Also, in practice the product kernels are easy to use and perform well, that is, $K(\psi_i) = \prod_{s=1}^{q} K(\psi_{si})$, where $K(\psi_{si})$ can be taken as the univariate normal with unbounded support or the Epanechnikov kernel with bounded support, $K(\psi_{si}) = \frac{3}{4}(1 - \psi_{si}^2)$, $|\psi_{si}| \le 1$. These kernels are second-order kernels, implying that their first moments are zero but their second moments are finite. Another class of kernels, known as higher order kernels, with higher order moments equal to zero, is used in order to reduce the asymptotic bias of the LLS estimators. However, in practice the gains from these higher order kernels are not significant. For details on the choice of kernels see Silverman (1986).
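As a minimal sketch of how these pieces fit together, the product Epanechnikov kernel and the N-W (local constant) estimator can be coded directly. The function names, the choice of a common scalar window width, and the toy data below are illustrative assumptions, not prescriptions from the text:

```python
import numpy as np

def epanechnikov(psi):
    """Univariate Epanechnikov kernel: K(psi) = (3/4)(1 - psi^2) for |psi| <= 1."""
    return np.where(np.abs(psi) <= 1.0, 0.75 * (1.0 - psi ** 2), 0.0)

def product_kernel(x_data, x0, h):
    """Product kernel over the q regressors: prod_s K((x_is - x_0s)/h)."""
    psi = (x_data - x0) / h                 # shape (n, q)
    return np.prod(epanechnikov(psi), axis=1)

def nw_estimate(x_data, y, x0, h):
    """Nadaraya-Watson (local constant) estimate of m(x0)."""
    w = product_kernel(x_data, x0, h)
    return np.sum(w * y) / np.sum(w)

# Toy data: y = x^2 + noise, so m(1) should be close to 1.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(200, 1))
y = x[:, 0] ** 2 + rng.normal(scale=0.1, size=200)
est = nw_estimate(x, y, np.array([1.0]), h=0.3)
print(est)  # a value near 1.0
```

In practice the regressors would first be rescaled (standardized), as the text recommends, so that one scalar $h$ is sensible across all $q$ components.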

The window width $h$ controls the smoothness of the LLS estimate of $m(x)$ and in practice is crucial for obtaining a good estimate: it controls the balance between the variance, which is high when $h$ is too small, and the squared bias, which is high when $h$ is too large. With this in mind, several window width selection procedures choose $h$ to minimize one of a number of mean squared error (MSE) criteria, for example $\int(\hat{m}(x) - m(x))^2 dx$; the integrated MSE, $\text{IMSE} = E\int(\hat{m}(x) - m(x))^2 dx = \int \text{MSE}(\hat{m}(x))dx$; and the average, $\text{AIMSE} = E\int(\hat{m}(x) - m(x))^2 f(x)dx$. The minimization of the IMSE is known to give the optimal $h$ as $c\,n^{-1/(q+4)}$, where $c$ is a constant of proportionality which depends on the unknown density and on $m(x)$ and their derivatives. An initial estimate of $c$ can be constructed, giving "plug-in" estimators of $h$, but this has not been a very popular procedure in practice. A more recent proposal, from Härdle and Bowman (1988), is to estimate the AIMSE by bootstrapping and then minimize the simulated AIMSE with respect to $h$. An advantage of this approach is that it also provides a confidence interval for $m(x)$.
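The rate $h = c\,n^{-1/(q+4)}$ can be sketched as a rule-of-thumb bandwidth. Since $c$ depends on unknown quantities, the value $c = 1.06$ below (borrowed from the normal reference rule for density estimation) is purely an illustrative plug-in, not the optimal constant for regression:

```python
import numpy as np

def rule_of_thumb_h(x, c=1.06):
    """Rule-of-thumb bandwidth h = c * sigma_hat * n^(-1/(q+4)).

    The true constant of proportionality depends on the unknown density
    and on m(x) and their derivatives; c = 1.06 is an illustrative choice.
    Scaling by the per-regressor standard deviation mimics standardizing
    the data before entry into the kernel.
    """
    n, q = x.shape
    sigma = x.std(axis=0, ddof=1)          # per-regressor scale estimate
    return c * sigma * n ** (-1.0 / (q + 4))

x = np.random.default_rng(1).normal(size=(500, 2))
h = rule_of_thumb_h(x)
print(h)  # one bandwidth per regressor, shrinking at the n^(-1/6) rate for q = 2
```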

Cross-validation is a popular alternative procedure, which chooses $h$ by minimizing the sum of squares of the estimated prediction error (EPE) or residual sum of squares (RSS),

$$\text{EPE} = \text{RSS} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_{-i})^2 = \frac{1}{n}\sum_{i=1}^{n}\hat{u}_{-i}^2 = \frac{1}{n}\,y'M'_{-i}M_{-i}\,y,$$

where $\hat{y}_{-i} = \hat{m}_{-i}(x_i) = w_{-i}y$, $\hat{u}_{-i} = y_i - \hat{y}_{-i}$, $\hat{u} = y - \hat{y} = M_{-i}y$, and $M_{-i} = I - W_{-i}$; the subscript $-i$ indicates the "leave-one-out" estimator, in which the $i$th observation is deleted from the sums. An alternative is to consider $\text{EPE}^* = y'M^*y/\mathrm{tr}(M^*)$, where $M^* = M'_{-i}M_{-i}$ and $\mathrm{tr}(M^*)$ can be treated as the degrees of freedom of the nonparametric regression, as an analogue of the linear parametric regression case.
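A minimal sketch of leave-one-out cross-validation for the N-W estimator follows. Zeroing the diagonal of the kernel weight matrix implements the deletion of the $i$th observation; the Gaussian product kernel, the toy data, and the grid of candidate window widths are illustrative assumptions:

```python
import numpy as np

def loo_cv_epe(x, y, h):
    """Leave-one-out EPE(h) = n^{-1} sum_i (y_i - m_hat_{-i}(x_i))^2
    for the Nadaraya-Watson estimator with a Gaussian product kernel."""
    diff = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2))   # K((x_i - x_j)/h)
    np.fill_diagonal(K, 0.0)                       # delete the ith observation
    m_loo = K @ y / K.sum(axis=1)                  # leave-one-out fits
    return np.mean((y - m_loo) ** 2)

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=(150, 1))
y = np.sin(2 * x[:, 0]) + rng.normal(scale=0.2, size=150)
grid = [0.05, 0.1, 0.2, 0.4, 0.8]
h_cv = min(grid, key=lambda h: loo_cv_epe(x, y, h))
print(h_cv)  # the grid value minimizing the leave-one-out EPE
```

In practice one would minimize over a fine grid or with a numerical optimizer rather than the coarse grid used here.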

One drawback of the "goodness-of-fit function" EPE is that it carries no penalty for large or small $h$. In view of this, many authors have considered penalized goodness-of-fit functions to choose $h$; see Rice (1984) and Härdle, Hall, and Marron (1992). The principle is the same as that of the penalty functions used in parametric regression for the choice of the number of parameters (variables), for example the Akaike criterion. The idea of a penalty function is, in general, an attractive one, and it opens up possibilities for future research.
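One common example of such a penalized criterion (not necessarily the specific one studied in the references above) is generalized cross-validation, which divides the residual sum of squares by a penalty that blows up as the trace of the smoother matrix, the effective number of parameters, grows. The Gaussian product kernel and toy data below are illustrative assumptions:

```python
import numpy as np

def gcv(x, y, h):
    """GCV(h) = (RSS/n) / (1 - tr(W)/n)^2 for the Nadaraya-Watson smoother.

    W is the n x n smoother matrix built from a Gaussian product kernel.
    Small h makes tr(W) large, so the denominator penalizes undersmoothing;
    large h is penalized through the residual sum of squares itself.
    """
    diff = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2))   # K((x_i - x_j)/h)
    W = K / K.sum(axis=1, keepdims=True)           # rows sum to one
    resid = y - W @ y
    n = len(y)
    return (resid @ resid / n) / (1.0 - np.trace(W) / n) ** 2

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=(150, 1))
y = np.sin(2 * x[:, 0]) + rng.normal(scale=0.2, size=150)
grid = [0.05, 0.1, 0.2, 0.4, 0.8]
h_gcv = min(grid, key=lambda h: gcv(x, y, h))
print(h_gcv)  # the grid value minimizing the penalized criterion
```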

A related way to obtain $h$ is to choose the value of $h$ for which the squared correlation between $y_i$ and $\hat{y}_i = \hat{m}(x_i)$ is maximum, that is, for which $0 \le R^2 = \hat{\rho}^2 \le 1$ is maximum. One could also use the correlation between $y_i$ and the leave-one-out estimator $\hat{y}_{-i}$.
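The leave-one-out version of this correlation criterion can be sketched as follows; as before, the Gaussian kernel, toy data, and candidate grid are assumptions for illustration:

```python
import numpy as np

def loo_r2(x, y, h):
    """Squared correlation between y_i and the leave-one-out N-W fits m_hat_{-i}(x_i)."""
    diff = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    np.fill_diagonal(K, 0.0)               # delete the ith observation
    m_loo = K @ y / K.sum(axis=1)
    return np.corrcoef(y, m_loo)[0, 1] ** 2

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=(150, 1))
y = np.sin(2 * x[:, 0]) + rng.normal(scale=0.2, size=150)
grid = [0.05, 0.1, 0.2, 0.4, 0.8]
h_r2 = max(grid, key=lambda h: loo_r2(x, y, h))
print(h_r2)  # the grid value maximizing the leave-one-out squared correlation
```

Using the leave-one-out fits matters here: the in-sample correlation is trivially maximized by letting $h \to 0$, since the fit then interpolates the data.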

When $V(u_i \mid x_i) = \sigma^2(x_i)$, one can choose $h$ such that an estimate of the unconditional variance of $u_i$, $Eu_i^2 = E[E(u_i^2 \mid x_i)] = E[\sigma^2(x_i)] = \int \sigma^2(x)f(x)dx$, is minimum. That is, choose $h$ such that $\text{EPE}_1 = \hat{E}u_i^2 = \int \hat{\sigma}^2(x)d\hat{F}(x)$, where $\hat{\sigma}^2(x_i) = \hat{E}(\hat{u}_i^2 \mid x_i)$ is obtained by the LPLS regression of $\hat{u}_i^2$ on $x_i$; $\hat{u}_i = y_i - \hat{m}(x_i)$ is the nonparametric residual, and $\hat{f}(x) = \sum_{i=1}^{n} w_i(x)$ is a nonparametric density estimator for some weight function $w_i(x)$ such that $\int \hat{f}(x)dx = 1$. For the kernel density estimator, $w_i(x) = K\left(\frac{x_i - x}{h}\right)/nh^q$; see Silverman (1986) and Pagan and Ullah (1999). It is better to use $\text{EPE}_1$ than EPE when there is heteroskedasticity of unknown form. Also, since $\hat{E}(u_i^2)$ can be shown to be a consistent estimator of $Eu_i^2$, the nonparametric version of $R^2$,

$$R_1^2 = 1 - \frac{\text{EPE}_1}{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2},$$

lies between 0 and 1, and it is an estimator of

$$\rho^2 = 1 - \frac{Eu_i^2}{V(y_i)} = 1 - \frac{E(y_i - m(x_i))^2}{V(y_i)}.$$

Thus, an alternative is to choose $h$ such that $R_1^2$ is maximum. One can also use $\text{EPE}^*$ in $R_1^2$; this corresponds to $\bar{R}^2$ in the parametric regression. A simple way to calculate $\text{EPE}_1$ is to use the empirical distribution function, so that

$$\text{EPE}_1 = \frac{1}{n}\sum_{i=1}^{n}\hat{\sigma}^2(x_i).$$
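The two-step construction of $\text{EPE}_1$, regressing the squared nonparametric residuals on $x$ and averaging, can be sketched as follows. A local constant (N-W) regression with a Gaussian kernel stands in for the LPLS step, and the toy data are assumptions for illustration:

```python
import numpy as np

def nw_fit(x, y, h):
    """In-sample Nadaraya-Watson fits with a Gaussian product kernel."""
    diff = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    return K @ y / K.sum(axis=1)

def epe1(x, y, h):
    """EPE_1 = n^{-1} sum_i sigma2_hat(x_i), where sigma2_hat(x_i) is the
    kernel regression of the squared nonparametric residuals on x."""
    u2 = (y - nw_fit(x, y, h)) ** 2        # squared residuals u_hat_i^2
    sigma2 = nw_fit(x, u2, h)              # sigma2_hat(x_i) = E_hat(u^2 | x_i)
    return np.mean(sigma2)

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * x[:, 0]) + rng.normal(scale=0.2, size=200)
e1 = epe1(x, y, h=0.2)
r2_1 = 1.0 - e1 / y.var()                  # the nonparametric R_1^2
print(e1, r2_1)
```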

In general, the move from independent to dependent observations should not change the way window width selection is done. However, care has to be taken since, as indicated by Robinson (1986), a larger window width might be needed in the dependent observations case due to positive serial correlation (see Herrmann, Gasser, and Kneip, 1992). Faraway (1990) considers the choice of a varying window width. For details on the choices of $h$ and their usefulness see Pagan and Ullah (1999).
