Public-Private Face-Off

The C&B data set includes more than 14,000 former students. These students were admitted and rejected at many different combinations of schools (C&B asked for the names of at least three schools students considered seriously, besides the one attended). Many of the possible application/acceptance sets in this data set are represented by only a single student. Moreover, in some sets with more than one student, all schools are either public or private. Just as with groups C and D in Table 2.1, these perfectly homogeneous groups provide no guidance as to the value of a private education.

We can increase the number of useful comparisons by deeming schools to be matched if they are equally selective instead of insisting on identical matches. To fatten up the groups this scheme produces, we’ll call schools comparable if they fall into the same Barron’s selectivity categories. Returning to our stylized matching matrix, suppose All State and Tall State are rated as Competitive, Altered State and Smart are rated Highly Competitive, and Ivy and Leafy are Most Competitive. In the Barron’s scheme, those who applied to Tall State, Smart, and Leafy, and were admitted to Tall State and Smart, can be compared with students who applied to All State, Smart, and Ivy, and were admitted to All State and Smart. Students in both groups applied to one Competitive, one Highly Competitive, and one Most Competitive school, and they were admitted to one Competitive and one Highly Competitive school.
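To make the matching rule concrete, here is a minimal Python sketch that summarizes each student's application and admission sets by counts of Barron's categories, using the ratings supposed in the stylized example above. The helper name `selectivity_key` is our own illustration, not part of C&B's procedure or any library.

```python
from collections import Counter

# Barron's ratings supposed in the stylized example (not real ratings).
BARRONS = {
    "All State": "Competitive",
    "Tall State": "Competitive",
    "Altered State": "Highly Competitive",
    "Smart": "Highly Competitive",
    "Ivy": "Most Competitive",
    "Leafy": "Most Competitive",
}


def selectivity_key(applied, admitted):
    """Summarize a student's application and admission sets by how many
    schools of each Barron's category appear in each set."""
    applied_counts = Counter(BARRONS[school] for school in applied)
    admitted_counts = Counter(BARRONS[school] for school in admitted)
    # Sorted tuples make the key order-independent and hashable.
    return (tuple(sorted(applied_counts.items())),
            tuple(sorted(admitted_counts.items())))


key_a = selectivity_key(["Tall State", "Smart", "Leafy"], ["Tall State", "Smart"])
key_b = selectivity_key(["All State", "Smart", "Ivy"], ["All State", "Smart"])
print(key_a == key_b)  # True: both students fall in the same Barron's match
```

Students with equal keys belong to the same Barron's-matched group even though the individual schools differ, which is exactly what fattens up the groups.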

In the C&B data, 9,202 students can be matched in this way. But because we’re interested in public-private comparisons, our Barron’s matched sample is also limited to matched applicant groups that contain both public and private school students. This leaves 5,583 matched students for analysis. These matched students fall into 151 similar-selectivity groups containing both public and private students.
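The restriction to groups containing both public and private students amounts to a group-then-filter step. The sketch below is illustrative only: the records are invented (they are not C&B data), the string keys stand in for selectivity profiles, and `mixed_groups` is a hypothetical helper.

```python
from collections import defaultdict

# Invented records: (student_id, selectivity_key, attended_private).
students = [
    (1, "C+HC+MC / C+HC", True),
    (2, "C+HC+MC / C+HC", False),
    (3, "HC+HC+HC / HC+HC+HC", True),
    (4, "HC+HC+HC / HC+HC+HC", True),
    (5, "C+C+HC / C", False),
    (6, "MC+MC+MC / MC", False),
    (7, "MC+MC+MC / MC", True),
    (8, "MC+MC+MC / MC", True),
]


def mixed_groups(records):
    """Keep only selectivity groups containing both public and private alumni.

    Singleton groups and all-public or all-private groups fail this test,
    since they offer no within-group public-private comparison.
    """
    groups = defaultdict(list)
    for sid, key, private in records:
        groups[key].append((sid, private))
    return {key: members for key, members in groups.items()
            if any(p for _, p in members) and not all(p for _, p in members)}


kept = mixed_groups(students)
n_students = sum(len(members) for members in kept.values())
print(len(kept), n_students)  # 2 5
```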

Our operational regression model for the Barron’s selectivity-matched sample differs from regression (2.1), used to analyze the matching matrix in Table 2.1, in a number of ways. First, the operational model puts the natural log of earnings on the left-hand side instead of earnings itself. As explained in the chapter appendix, use of a logged dependent variable allows regression estimates to be interpreted as a percent change. For example, an estimated $\beta$ of .05 implies that private school alumni earn about 5% more than public school alumni, conditional on whatever controls were included in the model.
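The "about 5%" reading is an approximation that works well for small coefficients. For a dummy variable such as $P_i$ in a log-earnings model, the exact percent difference implied by $\beta$ is $100(e^{\beta} - 1)$. A short Python check (the function name `pct_effect` is our own):

```python
import math


def pct_effect(beta):
    """Exact percent difference implied by a dummy's coefficient in a log model."""
    return 100 * (math.exp(beta) - 1)


print(round(pct_effect(0.05), 2))  # 5.13
print(round(pct_effect(0.5), 2))   # 64.87
```

For a coefficient of .05 the approximation is close (5.13% versus 5%), but for a coefficient of .5 the exact difference is about 65%, not 50%.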

Another important difference between our operational empirical model and the Table 2.1 example is that the former includes many control variables, while the example controls only for the dummy variable $A_i$, indicating students in group A. The key controls in the operational model are a set of many dummy variables indicating all Barron’s matches represented in the sample (with one group left out as a reference category). These controls capture the relative selectivity of the schools to which students applied and were admitted in the real world, where many combinations of schools are possible. The resulting regression model looks like

$$\ln Y_i = \alpha + \beta P_i + \sum_{j=1}^{150} \gamma_j \, GROUP_{ji} + \delta_1 SAT_i + \delta_2 \ln PI_i + e_i \qquad (2.2)$$

The parameter $\beta$ in this model is still the treatment effect of interest, an estimate of the causal effect of attendance at a private school. But this model controls for 151 groups instead of the two groups in our example. The parameters $\gamma_j$, for $j = 1$ to 150, are the coefficients on 150 selectivity-group dummies, denoted $GROUP_{ji}$.

It’s worth unpacking the notation in equation (2.2), since we’ll use it again. The dummy variable $GROUP_{ji}$ equals 1 when student $i$ is in group $j$ and is 0 otherwise. For example, the first of these dummies, denoted $GROUP_{1i}$, might indicate students who applied and were admitted to three Highly Competitive schools. The second, $GROUP_{2i}$, might indicate students who applied to two Highly Competitive schools and one Most Competitive school, and were admitted to one of each type. The order in which the categories are coded doesn’t matter as long as we code dummies for all possible combinations, with one group omitted as a reference group. Although we’ve gone from one group dummy to 150, the idea is as before: controlling for the sets of schools to which students applied and were admitted brings us one giant step closer to a ceteris paribus comparison between private and public school students.
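Coding the group dummies with one omitted reference group takes only a few lines. This minimal sketch uses made-up group labels, and `group_dummies` is a hypothetical helper; columns are sorted only for reproducibility, since the coding order doesn't matter.

```python
def group_dummies(labels, reference):
    """Build GROUP_ji indicators for every group except `reference`.

    Returns (columns, rows), where rows[i][j] is 1 if student i is in the
    j-th non-reference group and 0 otherwise. Students in the reference
    group get all zeros; their mean is absorbed by the intercept.
    """
    columns = sorted(set(labels) - {reference})
    rows = [[1 if label == col else 0 for col in columns] for label in labels]
    return columns, rows


labels = ["g1", "g2", "g3", "g2", "g1"]
cols, rows = group_dummies(labels, reference="g1")
print(cols)  # ['g2', 'g3']
print(rows)  # [[0, 0], [1, 0], [0, 1], [1, 0], [0, 0]]
```

With 151 distinct groups, the same function yields 150 dummy columns, matching the 150 $\gamma_j$ parameters in equation (2.2).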

A final modification for operational purposes is the addition of two further control variables: individual SAT scores ($SAT_i$) and the log of parental income ($PI_i$), plus a few variables we’ll relegate to a footnote. The individual SAT and log parental income controls appear in the model with coefficients $\delta_1$ and $\delta_2$ (read as “delta-1” and “delta-2”), respectively. Controls for a direct measure of individual aptitude, like students’ SAT scores, and a measure of family background, like parental income, may help make the public-private comparisons at the heart of our model more apples-to-apples and oranges-to-oranges than they otherwise would be. At the same time, conditional on selectivity-group dummies, such controls may no longer matter, a point explored in detail below.