# Building Models with Logs

The regressions discussed in this chapter look like a repeat of equation (2.2). What's up with $\ln Y_i$ on the left-hand side? Why use logs and not the variable $Y_i$ itself? The answer is easiest to see in a bivariate regression, say,

$$\ln Y_i = \alpha + \beta P_i + e_i, \qquad (2.13)$$

where $P_i$ is a dummy for private school attendance. Because this is a case of regression for dummies, we have

$$E[\ln Y_i \mid P_i] = \alpha + \beta P_i.$$

In other words, regression in this case fits the CEF perfectly.
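As a concrete check, here is a minimal sketch (using simulated, hypothetical data, not figures from the text) of why regression on a dummy fits the CEF perfectly: with a single binary regressor, the OLS slope is exactly the difference in conditional means of the outcome.

```python
# Sketch: OLS on a dummy regressor recovers the difference in
# conditional means, so the fitted values are the CEF itself.
# The data-generating numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
P = rng.integers(0, 2, size=n)                         # private-school dummy
log_y = 10.0 + 0.14 * P + rng.normal(0, 0.5, size=n)   # assumed log earnings

# OLS slope via the covariance formula: cov(P, ln Y) / var(P)
beta = np.cov(P, log_y, bias=True)[0, 1] / np.var(P)

# Difference in conditional means: E[ln Y | P=1] - E[ln Y | P=0]
mean_diff = log_y[P == 1].mean() - log_y[P == 0].mean()

print(beta, mean_diff)  # the two agree up to floating-point error
```

The algebra behind this is exact: for a binary regressor, $\operatorname{cov}(P_i, \ln Y_i) / \operatorname{var}(P_i)$ simplifies to the difference in group means, whatever the data look like.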

Suppose we engineer a ceteris paribus change in $P_i$ for student $i$. This reveals potential outcome $Y_{0i}$ when $P_i = 0$ and $Y_{1i}$ when $P_i = 1$. Thinking now of equation (2.13) as a model for the log of these potential outcomes, we have

$$\ln Y_{0i} = \alpha + e_i$$

$$\ln Y_{1i} = \alpha + \beta + e_i.$$

The difference in potential outcomes is therefore

$$\ln Y_{1i} - \ln Y_{0i} = \beta. \qquad (2.14)$$

Rearranging further gives

$$\beta = \ln\left(\frac{Y_{1i}}{Y_{0i}}\right) = \ln(1 + \Delta\%Y_P) \approx \Delta\%Y_P,$$

where $\Delta\%Y_P$ is shorthand for the percentage change in potential outcomes induced by $P_i$. Calculus tells us that $\ln(1 + \Delta\%Y_P)$ is close to $\Delta\%Y_P$ when the latter is small. From this, we conclude that the regression slope in a model with $\ln Y_i$ on the left-hand side gives the approximate percentage change in $Y_i$ generated by changing the corresponding regressor.
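The quality of this approximation is easy to check numerically. The values below are illustrative, not drawn from the text:

```python
# ln(1 + x) is close to x for small x, so log points approximate
# percentage changes well when changes are small; the gap widens as x grows.
import numpy as np

for pct_change in [0.01, 0.05, 0.10, 0.30]:
    log_points = np.log(1 + pct_change)
    gap = pct_change - log_points
    print(f"x = {pct_change:.2f}: ln(1+x) = {log_points:.4f}, gap = {gap:.4f}")
```

For a 1% or 5% change the two are nearly indistinguishable; by a 30% change the gap is large enough to matter.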

To calculate the exact percentage change generated by changing $P_i$, exponentiate both sides of equation (2.14):

$$\frac{Y_{1i}}{Y_{0i}} = \exp(\beta),$$

so

$$\Delta\%Y_P = \exp(\beta) - 1.$$

When $\beta$ is less than about .2, $\exp(\beta) - 1$ and $\beta$ are close enough to justify reference to the latter as a percentage change.

You might hear masters describe regression coefficients from a log-linear model as measuring “log points.” This terminology reminds listeners that the percentage change interpretation is approximate. In general, log points underestimate percentage change, that is,

$$\beta < \exp(\beta) - 1,$$

with the gap between the two growing as $\beta$ increases. For example, when $\beta = .05$, $\exp(\beta) - 1 = .051$, but when $\beta = .3$, $\exp(\beta) - 1 = .35$.
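The footnote's arithmetic can be reproduced in a couple of lines:

```python
# Log points (beta) vs. exact percentage change (exp(beta) - 1):
# the former always understates the latter, and the gap grows with beta.
import math

for beta in [0.05, 0.3]:
    exact = math.exp(beta) - 1
    print(f"beta = {beta}: exp(beta) - 1 = {exact:.3f}")
```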