Regression for Dummies

An important regression special case is bivariate regression with a dummy regressor. The conditional expectation of Y_i given a dummy variable, Z_i, takes on two values. Write them in Greek, like this:

E[Y_i | Z_i = 0] = α

E[Y_i | Z_i = 1] = α + β,

so that

β = E[Y_i | Z_i = 1] − E[Y_i | Z_i = 0]

is the difference in expected Y_i with the dummy regressor, Z_i, switched on and off.

Using this notation, we can write

E[Y_i | Z_i] = E[Y_i | Z_i = 0] + (E[Y_i | Z_i = 1] − E[Y_i | Z_i = 0]) Z_i

= α + β Z_i.   (2.8)

This shows that E[Y_i | Z_i] is a linear function of Z_i, with slope β and intercept α. Because the CEF with a single dummy variable is linear, regression fits this CEF perfectly. As a result, the regression slope must also be β = E[Y_i | Z_i = 1] − E[Y_i | Z_i = 0], the difference in expected Y_i with Z_i switched on and off.
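The equivalence between the regression slope and the difference in group means can be checked numerically. The sketch below uses simulated data; the values α = 2.0 and β = 0.5 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a dummy Z and an outcome Y whose mean shifts with Z.
# (Illustrative values only: alpha = 2.0 and beta = 0.5 are made up.)
n = 10_000
z = rng.integers(0, 2, size=n)
y = 2.0 + 0.5 * z + rng.normal(size=n)

# Difference in conditional means: Avg[Y | Z = 1] - Avg[Y | Z = 0]
diff_in_means = y[z == 1].mean() - y[z == 0].mean()

# OLS slope from regressing Y on Z (with an intercept)
X = np.column_stack([np.ones(n), z])
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# With a single dummy regressor, the two agree up to rounding error.
print(diff_in_means, beta_hat)
```

Because the CEF here takes only two values, the fitted line passes through both conditional means exactly, so the slope estimate and the difference in means coincide in every sample, not just on average.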

Regression for dummies is important because dummy regressors...


To Everything There Is a Season (of Birth)

Yesterday is history, tomorrow is a mystery, but today is a gift. That is why it is called the present.

Master Oogway, Kung Fu Panda

You get presents on your birthday, but some birth dates are better than others. A birthday that falls near Christmas might reduce your windfall if gift givers try to make one present do double duty. On the other hand, many Americans born late in the year get surprise gifts in the form of higher schooling and higher earnings.

The path leading from late-year births to increased schooling and earnings starts in kindergarten. In most states, children enter kindergarten in the year they turn 5, whether or not they’ve had a fifth birthday by the time school starts in early September...


One-Stop Shopping with Two-Stage Least Squares

IV estimates of causal effects boil down to reduced-form comparisons across groups defined by the instrument, scaled by the appropriate first stage. This is a universal IV principle, but the details vary across applications. The quantity-quality scenario differs from the KIPP story in that we have more than one instrument for the same underlying causal relation. Assuming that twins and sex-mix instruments both satisfy the required assumptions and capture similar average causal effects, we’d like to combine the two IV estimates they generate to increase statistical precision. At the same time, twinning might be correlated with maternal characteristics like age at birth and ethnicity, leading to bias in twins IV estimates...
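As a sketch of the two-stage least squares mechanics (not of the quantity-quality analysis itself), the simulation below combines two made-up instruments by hand: fit the first stage on both instruments, then regress the outcome on the first-stage fitted values. All variable names and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative simulation: two instruments z1, z2 shift an endogenous
# regressor x, which is confounded with the outcome y through an
# unobservable u. The true causal effect of x on y is set to 2.0.
n = 50_000
z1 = rng.integers(0, 2, size=n)
z2 = rng.integers(0, 2, size=n)
u = rng.normal(size=n)
x = 0.5 * z1 + 0.3 * z2 + u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.5 * u + rng.normal(size=n)

# Stage 1: regress x on a constant and both instruments; form fitted values.
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: regress y on a constant and the first-stage fitted values.
X2 = np.column_stack([np.ones(n), x_hat])
_, beta_2sls = np.linalg.lstsq(X2, y, rcond=None)[0]

# For comparison, naive OLS is biased by the confounder u.
_, beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y,
                              rcond=None)[0]
print(beta_2sls, beta_ols)
```

Using both instruments in the first stage is what pools the two sources of variation into a single, more precise estimate; in practice the second-stage standard errors must also be corrected for the estimated first stage, which canned 2SLS routines handle automatically.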


Pairing Off

One sample average is the loneliest number that you'll ever do. Luckily, we're usually concerned with two. We're especially keen to compare averages for subjects in experimental treatment and control groups. We reference these averages with a compact notation, writing Y¹ for Avg_n[Y_i | D_i = 1] and Y⁰ for Avg_n[Y_i | D_i = 0]. The treatment group mean, Y¹, is the average for the n₁ observations belonging to the treatment group, with Y⁰ defined similarly. The total sample size is n = n₀ + n₁.

For our purposes, the difference between Y1 and Y0 is either an estimate of the causal effect of treatment (if Y is an outcome), or a check on balance (if Y is a covariate). To keep the discussion focused, we’ll assume the former...
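A minimal numerical illustration of this notation, using simulated data with a made-up treatment effect of 1.0:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical experiment (the effect size 1.0 is chosen for illustration):
# d marks treatment assignment, y is the outcome.
n = 1_000
d = rng.integers(0, 2, size=n)
y = 5.0 + 1.0 * d + rng.normal(size=n)

n1, n0 = (d == 1).sum(), (d == 0).sum()
y1_bar = y[d == 1].mean()     # Avg_n[Y_i | D_i = 1]
y0_bar = y[d == 0].mean()     # Avg_n[Y_i | D_i = 0]
effect = y1_bar - y0_bar      # estimated causal effect of treatment

# Conventional standard error for a difference in means,
# allowing unequal variances across the two groups.
se = np.sqrt(y[d == 1].var(ddof=1) / n1 + y[d == 0].var(ddof=1) / n0)
print(effect, se)
```

The same difference in means doubles as a balance check when y is replaced by a pretreatment covariate, in which case we hope to see an estimate close to zero.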


Just DDo It: A Depression Regression

The simplest DD calculation involves only four numbers, as in equations (5.1) and (5.2). In practice, however, the DD recipe is best cooked with regression models fit to samples of more than four data points, such as the 12 points plotted in Figure 5.2. In addition to allowing for more than two periods, regression DD neatly incorporates data on more than two cross-sectional units, as we’ll see in a multistate analysis of the MLDA in Section 5.2. Equally important, regression DD facilitates statistical inference, often a tricky matter in a DD setup (for details, see the appendix to this chapter).

The regression DD recipe associated with Figure 5.2 has three ingredients:

(i) A dummy for the treatment district, written TREATd, where the subscript d reminds us that this varies across distric...
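A stylized version of this recipe can be run on a made-up two-district, six-year panel of 12 points, in the spirit of Figure 5.2; the dates, trends, and effect size below are invented for illustration. The DD effect is the coefficient on the interaction of the treatment dummy with a post-treatment dummy.

```python
import numpy as np

# Stylized 12-point panel: two districts observed over six years.
# All numbers here are made up; the true DD effect is set to -2.0.
years = np.arange(1929, 1935)
treat_effect = -2.0

rows = []
for treatd in (0, 1):                    # control district, treatment district
    for t, year in enumerate(years):
        post = int(year >= 1931)         # hypothetical treatment date
        # common downward trend + district level shift + DD effect
        y = 10.0 - 0.5 * t + 1.0 * treatd + treat_effect * treatd * post
        rows.append((treatd, post, treatd * post, y))

data = np.array(rows)
# Regressors: constant, TREAT_d, POST_t, and their interaction.
X = np.column_stack([np.ones(len(data)), data[:, :3]])
coefs = np.linalg.lstsq(X, data[:, 3], rcond=None)[0]
dd_estimate = coefs[3]
print(dd_estimate)
```

Because the two districts share a common trend in this noise-free example, differencing twice removes both the trend and the fixed district gap, and the interaction coefficient recovers the built-in effect of −2.0 exactly.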


Regression Anatomy and the OVB Formula

The most interesting regressions are multiple; that is, they include a causal variable of interest, plus one or more control variables. Equation (2.2), for example, regresses log earnings on a dummy for private college attendance in a model that controls for ability, family background, and the selectivity of schools that students have applied to and been admitted to. We’ve argued that control for covariates in a regression model is much like matching. That is, the regression coefficient on a private school dummy in a model with controls is similar to what we’d get if we divided students into cells based on these controls, compared public school and private school students within these cells, and then took an average of the resulting set of conditional comparisons...
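The residualize-then-regress logic behind regression anatomy can be checked numerically: the coefficient on the causal variable in the long regression equals the bivariate slope of the outcome on the causal variable's residual after partialling out the controls. The variable names and parameter values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data (all values made up): y plays the role of log earnings,
# p a private-school dummy, and a a single "ability" control correlated
# with p. The coefficient on p is set to 0.1.
n = 5_000
a = rng.normal(size=n)
p = (a + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.1 * p + 0.5 * a + rng.normal(scale=0.5, size=n)

# Long regression: y on a constant, p, and the control a.
X = np.column_stack([np.ones(n), p, a])
beta_long = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Regression anatomy: residualize p on the controls (including the
# constant), then compute the bivariate slope of y on that residual.
W = np.column_stack([np.ones(n), a])
p_resid = p - W @ np.linalg.lstsq(W, p, rcond=None)[0]
beta_anatomy = (p_resid @ y) / (p_resid @ p_resid)

print(beta_long, beta_anatomy)
```

The two estimates agree in every sample, which is why multiple regression can be read as a matching-style comparison: only the variation in p left over after the controls have done their work identifies the coefficient.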
