Sample Selection Bias
The problem of selection bias in economics arises when sample observations are generated from the population by rules other than simple random sampling. Consequently, the sample gives a distorted representation of the true population. This is the essence of the selection problem. Distorted sample generation may be the outcome of the way surveyors collect the sample. More importantly, distorted sample observations result from self-selection decisions by the agents being studied. A sample generated by self-selection may not represent the true population distribution of characteristics, no matter how large the sample. However, self-selection biases can be corrected to produce an accurate description of the underlying population if the underlying sample-generating processes can be understood and relevant identification conditions are available. Economic theories and institutional settings can provide guidance. It is for this reason that the econometrics of self-selection is, by and large, a subject of microeconometrics.
The issue of selectivity bias first arose in labor economics, in the study of the determinants of occupational wages in Roy (1951) and of the labor supply behavior of women in Gronau (1974) and Heckman (1974). Consider the labor supply problem of women in a free society. In a population of women, each individual is characterized by her endowments of observable and unobservable characteristics. (All characteristics are, of course, known to the individual herself, but some may be unobservable to an investigator.) She has the freedom to engage in market activities. It may be observed that only a subsample of the population is engaged in market employment and reports wages. A researcher or a policy maker may be interested in identifying the determinants of wages for working women so as to understand the determinants of wages for all women. The decision to work or not to work is not random, as it is made in accordance with an individual’s own interest. Consequently, the working and nonworking samples may have different characteristics. Sample selection bias arises when some component of the work decision is relevant to the wage-determining process or the (expected) wage is a factor in the work decision. When the relationship between the work decision and the wage operates purely through observable variables, and those variables are exogenous, sample selection bias is controlled for by including all relevant exogenous variables in the equations. The possibility of sample selection bias arises when there are unobservable characteristics that influence both the observed outcomes and the decision process.
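The contrast between selection through observables and selection through unobservables can be illustrated with a small simulation. The following Python sketch is not taken from the studies cited; the function name, the linear wage and work equations, the coefficient values, and the normal errors are all hypothetical choices made for illustration. It generates a population in which the true effect of an observable characteristic on the wage is 0.5, keeps only the individuals who choose to work, and estimates the wage equation by least squares on that self-selected subsample. The parameter rho is the assumed correlation between the unobservable in the wage equation and the unobservable in the work decision.

```python
import numpy as np

def selected_sample_slope(rho, n=200_000, seed=0):
    """Least-squares slope of wage on x, using workers only.

    Hypothetical data-generating process (illustration only):
      wage  = 1.0 + 0.5 * x + u          (true slope is 0.5)
      works = 1 if 0.5 * x + z + v > 0   (self-selection rule)
    where corr(u, v) = rho.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # observable that affects both wage and work
    z = rng.normal(size=n)  # observable that affects the work decision only
    u = rng.normal(size=n)  # unobservable in the wage equation
    # Unobservable in the work decision, correlated with u when rho != 0.
    v = rho * u + np.sqrt(1.0 - rho**2) * rng.normal(size=n)

    wage = 1.0 + 0.5 * x + u
    works = (0.5 * x + z + v) > 0  # wages are observed only for workers

    # Regress wage on a constant and x within the working subsample.
    X = np.column_stack([np.ones(works.sum()), x[works]])
    beta, *_ = np.linalg.lstsq(X, wage[works], rcond=None)
    return beta[1]

if __name__ == "__main__":
    print("slope, rho = 0.0:", selected_sample_slope(0.0))
    print("slope, rho = 0.8:", selected_sample_slope(0.8))
```

Under these assumptions, the estimate with rho = 0 should stay close to the true value of 0.5 even though the sample is self-selected, because selection then operates only through variables unrelated to the wage unobservable. With rho = 0.8 the estimate should fall noticeably below 0.5, and increasing n does not remove the gap, which is the sense in which a self-selected sample can misrepresent the population regardless of its size.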