The Truncated Regression Model
The truncated regression model excludes, or truncates, some observations from the sample. For example, in studying poverty we exclude the rich, say those with earnings larger than some upper limit $y^u$, from our sample. The sample is therefore not random, and applying least squares to the truncated sample leads to biased and inconsistent results, see Figure 13.3. This differs from censoring. In the latter case, no data are excluded: we observe the characteristics of all households, even those that do not actually purchase a car. The truncated regression model is given by
$$y_i^* = x_i'\beta + u_i, \qquad i = 1, 2, \ldots, n, \qquad u_i \sim \mathrm{IIN}(0, \sigma^2) \qquad (13.60)$$
where $y_i^*$ is, for example, the earnings of the $i$-th household and $x_i$ contains determinants of earnings such as education and experience. The sample contains only observations on individuals with $y_i^* < y^u$. The probability that $y_i^*$ is observed is
$$\Pr[y_i^* < y^u] = \Pr[x_i'\beta + u_i < y^u] = \Pr[u_i < y^u - x_i'\beta] = \Phi\left(\frac{y^u - x_i'\beta}{\sigma}\right) \qquad (13.61)$$
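Equation (13.61) can be evaluated directly once $x_i$, $\beta$, $\sigma$ and $y^u$ are given. The following is a minimal sketch using only the Python standard library; the function names `std_normal_cdf` and `prob_observed` and the numbers in the usage note are hypothetical illustrations, not part of the original text. $\Phi$ is computed from the error function via $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_observed(x: list[float], beta: list[float], y_upper: float, sigma: float) -> float:
    """Pr[y* < y^u] = Phi((y^u - x'beta) / sigma), as in (13.61).

    x and beta are hypothetical regressor and coefficient vectors of equal length.
    """
    x_beta = sum(xj * bj for xj, bj in zip(x, beta))
    return std_normal_cdf((y_upper - x_beta) / sigma)
```

For instance, with $x_i = (1, 12)'$, $\beta = (1, 0.5)'$ and $\sigma = 1$, we have $x_i'\beta = 7$, so an upper limit $y^u = 7$ gives a probability of 0.5 of being in the sample, while $y^u = 8$ gives $\Phi(1) \approx 0.841$.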
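In addition, using the results on the truncated normal density, see Greene (1993, p. 685),
$$E(u_i \mid y_i^* < y^u) = -\sigma\,\frac{\phi\left((y^u - x_i'\beta)/\sigma\right)}{\Phi\left((y^u - x_i'\beta)/\sigma\right)} \qquad (13.62)$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal density and cumulative distribution function, respectively. This term is not zero; in fact it is negative, because truncation from above discards the observations with large disturbances. From (13.60) one can see that $E(y_i^* \mid y_i^* < y^u) = x_i'\beta + E(u_i \mid y_i^* < y^u)$. Therefore, OLS on (13.60) using the observed $y_i^*$ is biased and inconsistent, because it ignores the term in (13.62), which varies with $x_i$ and so acts as an omitted variable correlated with the regressors.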
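The bias argument can be checked by simulation. The sketch below is an illustration under assumed settings, not the author's experiment: it compares the closed form (13.62) with a Monte Carlo average of truncated disturbances, and then fits OLS to a sample drawn from (13.60) with one regressor and truncated at $y^u$. The helper names, the uniform design for $x$ on $[0, 10]$, and the parameter values ($\alpha = 0$, $\beta = 1$, $\sigma = 2$, $y^u = 6$) are all hypothetical choices.

```python
import math
import random


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_error_mean(x_beta: float, y_upper: float, sigma: float) -> float:
    """E(u | y* < y^u) = -sigma * phi(c) / Phi(c) with c = (y^u - x'beta) / sigma, as in (13.62)."""
    c = (y_upper - x_beta) / sigma
    return -sigma * std_normal_pdf(c) / std_normal_cdf(c)


def simulate_truncated_sample(n, alpha, beta, sigma, y_upper, seed=0):
    """Draw x ~ U(0, 10) and y* = alpha + beta*x + u with u ~ N(0, sigma^2),
    keeping only the pairs with y* < y^u (truncation from above)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = alpha + beta * x + rng.gauss(0.0, sigma)
        if y < y_upper:
            xs.append(x)
            ys.append(y)
    return xs, ys


def ols_slope(xs, ys):
    """Slope from a least-squares regression of ys on a constant and xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Closed form vs. Monte Carlo average of u given u < 0.5 (x'beta = 0, sigma = 1).
    rng = random.Random(1)
    kept = [u for u in (rng.gauss(0.0, 1.0) for _ in range(200_000)) if u < 0.5]
    print(truncated_error_mean(0.0, 0.5, 1.0), sum(kept) / len(kept))
    # OLS slope on the truncated sample, to be compared with the true beta = 1.
    xs, ys = simulate_truncated_sample(20_000, alpha=0.0, beta=1.0, sigma=2.0, y_upper=6.0)
    print(ols_slope(xs, ys))
```

The closed form gives $-\phi(0.5)/\Phi(0.5) \approx -0.509$, and the Monte Carlo average should land close to it. In this design the OLS slope comes out well below the true value of 1: high values of $x$ survive truncation only when their disturbances are very negative, which flattens the fitted line.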
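The density of $y_i^*$ is normal, but over the observed range $y_i^* < y^u$ its total area is only the probability given in (13.61). A proper density function has to have an area of 1. Therefore, the density of $y_i^*$ conditional on $y_i^* < y^u$ is simply the unconditional (normal) density of $y_i^*$ restricted to values $y_i^* < y^u$, divided by $\Pr[y_i^* < y^u]$, see the Appendix to this chapter:
$$f(y_i^* \mid y_i^* < y^u) = \frac{\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\dfrac{(y_i^* - x_i'\beta)^2}{2\sigma^2}\right\}}{\Phi\left((y^u - x_i'\beta)/\sigma\right)} \qquad \text{for } y_i^* < y^u,$$
and zero otherwise. The log-likelihood of the truncated sample is therefore
$$\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i^* - x_i'\beta)^2 - \sum_{i=1}^{n}\log\Phi\left(\frac{y^u - x_i'\beta}{\sigma}\right)$$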
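A minimal sketch of the truncated density and the log-likelihood above, written with the Python standard library. The function names and the numerical check are hypothetical illustrations. `area_under_density` confirms numerically that dividing by $\Phi\left((y^u - x_i'\beta)/\sigma\right)$ restores a total area of 1. `truncated_loglik` only evaluates $\log L$ for a given $x_i'\beta$ and $\sigma$; obtaining the MLE would require maximizing it over $\beta$ and $\sigma$ with a numerical optimizer, which is not shown here.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_density(y: float, x_beta: float, sigma: float, y_upper: float) -> float:
    """Normal density of y divided by Phi((y^u - x'beta) / sigma); zero for y >= y^u."""
    if y >= y_upper:
        return 0.0
    z = (y - x_beta) / sigma
    normal_pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return normal_pdf / std_normal_cdf((y_upper - x_beta) / sigma)


def truncated_loglik(ys, x_betas, sigma, y_upper):
    """log L for the truncated sample: the usual normal log-likelihood
    minus sum_i log Phi((y^u - x_i'beta) / sigma)."""
    n = len(ys)
    ssr = sum((y - xb) ** 2 for y, xb in zip(ys, x_betas))
    normal_part = -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - ssr / (2.0 * sigma ** 2)
    truncation_part = -sum(math.log(std_normal_cdf((y_upper - xb) / sigma)) for xb in x_betas)
    return normal_part + truncation_part


def area_under_density(x_beta, sigma, y_upper, steps=100_000):
    """Midpoint-rule integral of the truncated density from x'beta - 10*sigma up to y^u."""
    lo = x_beta - 10.0 * sigma
    h = (y_upper - lo) / steps
    return h * sum(truncated_density(lo + (k + 0.5) * h, x_beta, sigma, y_upper) for k in range(steps))
```

For example, `area_under_density(1.0, 2.0, 0.5)` should return a value very close to 1, even though only about 40% of the untruncated normal mass lies below $y^u = 0.5$ in that case. Dropping `truncation_part` from `truncated_loglik` leaves the ordinary normal log-likelihood, whose maximizer in $\beta$ is OLS.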
It is the last term that makes the MLE differ from OLS on the observed sample. Hausman and Wise (1977) applied the truncated regression model to data from the New Jersey negative-income-tax experiment, where families with incomes higher than 1.5 times the 1967 poverty line were eliminated from the sample.