# Differences-in-Differences Estimation

If you want to learn how a change in policy affects outcomes, nothing beats a randomized controlled experiment. Unfortunately, these are rare in economics because they are either very expensive or morally unacceptable. No one wants to determine the return to schooling by randomly assigning people to a prescribed number of schooling years. That choice should be yours and not someone else’s.

But the evaluation of policy is not hopeless when randomized controlled experiments are impossible. Life provides us with situations that happen to different groups of individuals at different points in time. Such events are not really random, but from a statistical point of view the treatment may appear to be randomly assigned. That is what so-called natural experiments are about. You have two groups of similar people. For whatever reason, one group gets treated to the policy and the other does not. Differences in outcomes between the two groups are then attributed to the policy.

In the example, we will look at the effects of a change in the minimum wage. The comparison is possible because the minimum wage was raised in one state (New Jersey, NJ) and not in another (Pennsylvania, PA). The similarity of the states is important because the non-treated state is going to be used for comparison.

The data come from Card and Krueger and are found in the file njmin3.gdt. We will open it and look at the summary statistics by state.

1 open "@gretldir\data\poe\njmin3.gdt"
2 smpl d = 0 --restrict
3 summary fte --by=nj --simple
4 smpl full
5 smpl d = 1 --restrict
6 summary fte --by=nj --simple
7 smpl full

Since we want to get a picture of what happened in NJ and PA before and after NJ raised the minimum wage, we first restrict the sample to the period before the increase (d=0) in line 2 and get the summary statistics for fte by state in line 3. Line 4 restores the full sample, line 5 restricts it to the period after the policy change (d=1), and line 6 repeats the summary statistics for fte. The results suggest not much difference at this point.

Summary statistics for fte, by state (nj) and period (d):

| Group | n | Mean | Minimum | Maximum | Std. Dev. |
|---|---|---|---|---|---|
| nj = 0, d = 0 | 79 | 23.331 | 7.5000 | 70.500 | 11.856 |
| nj = 1, d = 0 | 331 | 20.439 | 5.0000 | 85.000 | 9.1062 |
| nj = 0, d = 1 | 79 | 21.166 | 0.00000 | 43.500 | 8.2767 |
| nj = 1, d = 1 | 331 | 21.027 | 0.00000 | 60.500 | 9.2930 |
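The four group means reported above are enough to compute a rough differences-in-differences estimate by hand. The following is a minimal Python sketch of that arithmetic, not part of the original gretl session; the dictionary layout and the function name `diff_in_diff` are illustrative assumptions, and only the four means come from the output.

```python
# Mean full-time-equivalent employment (fte) by state and period,
# copied from the summary statistics above.
mean_fte = {
    ("PA", "before"): 23.331,  # nj = 0, d = 0
    ("NJ", "before"): 20.439,  # nj = 1, d = 0
    ("PA", "after"): 21.166,   # nj = 0, d = 1
    ("NJ", "after"): 21.027,   # nj = 1, d = 1
}


def diff_in_diff(means):
    """Change in the treated state minus change in the control state."""
    change_nj = means[("NJ", "after")] - means[("NJ", "before")]
    change_pa = means[("PA", "after")] - means[("PA", "before")]
    return change_nj - change_pa


change_nj = mean_fte[("NJ", "after")] - mean_fte[("NJ", "before")]
change_pa = mean_fte[("PA", "after")] - mean_fte[("PA", "before")]
did = diff_in_diff(mean_fte)

print(f"Change in NJ: {change_nj:.3f}")
print(f"Change in PA: {change_pa:.3f}")
print(f"Differences-in-differences: {did:.3f}")
```

Mean employment rose slightly in NJ and fell in PA, so the difference of the two changes is positive. This calculation says nothing about precision; standard errors require a regression.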

Now, make some variable lists and run a few regressions:

list x1 = const nj d d_nj
list x2 = x1 kfc roys wendys co_owned
list x3 = x2 southj centralj pa1
summary x1 fte
ols fte x1
modeltab add
ols fte x2
modeltab add
ols fte x3
modeltab add
modeltab show

The first set of variables includes the indicator variables nj and d and their interaction, d_nj. The second set adds indicators for whether the jobs are at kfc, roys, or wendys and for whether the store is company owned. The final set adds indicators for location.
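It may help to write out the model behind the first variable list. The notation below is an assumption made for illustration (the symbols $\beta_1$, $\beta_2$, $\beta_3$, $\delta$ and the subscripts are not from the original text); the interaction term corresponds to the variable d_nj.

$$
fte_{it} = \beta_1 + \beta_2\, nj_i + \beta_3\, d_t + \delta\,(nj_i \times d_t) + e_{it}
$$

Taking conditional expectations for each of the four state-period cells gives

$$
\delta = \big(E[fte \mid nj=1, d=1] - E[fte \mid nj=1, d=0]\big) - \big(E[fte \mid nj=0, d=1] - E[fte \mid nj=0, d=0]\big),
$$

so the coefficient on the interaction is the differences-in-differences effect: the change in NJ minus the change in PA.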

The results from the three regressions appear below:

OLS estimates

Dependent variable: fte

Standard errors in parentheses

* indicates significance at the 10 percent level

** indicates significance at the 5 percent level

In the previous analysis we did not exploit an important feature of Card and Krueger’s data. The same restaurants were observed before and after in both states in 384 of the 410 observations. It seems reasonable to limit the before and after comparison to the same units.

This requires adding an individual fixed effect to the model and dropping observations that have no before or after with which to compare.
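To see why a regression on the change in employment handles the fixed effect, consider this sketch of the algebra. The notation is assumed for illustration and is not from the original text: $c_i$ is a restaurant-specific effect, $t=1$ is the period before the increase ($d_1 = 0$), and $t=2$ is the period after ($d_2 = 1$).

$$
fte_{it} = \beta_1 + \beta_2\, nj_i + \beta_3\, d_t + \delta\,(nj_i \times d_t) + c_i + e_{it}
$$

Subtracting the $t=1$ equation from the $t=2$ equation removes everything that does not vary over time, including $c_i$:

$$
\Delta fte_i = fte_{i2} - fte_{i1} = \beta_3 + \delta\, nj_i + \Delta e_i
$$

In a regression of the change in fte on a constant and nj, the intercept therefore estimates $\beta_3$ (the common time effect) and the slope estimates $\delta$ (the treatment effect). Restaurants observed in only one period have no difference to compute, which is why they are dropped.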

smpl missing(demp) != 1 --restrict
ols demp const nj

Fortunately, the data set includes ΔFTE, the change in fte for each restaurant, where it is called demp. Dropping the observations for which demp is missing and using least squares to estimate the parameters of the simple regression yields:

$$
\widehat{demp} = \underset{(0.73126)}{-2.28333} + \underset{(0.81519)}{2.75000}\; nj
$$

$$
T = 768 \qquad \bar{R}^2 = 0.0134 \qquad F(1, 766) = 11.380 \qquad \hat{\sigma} = 8.9560
$$

(standard errors in parentheses)
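The reported estimates can be checked for internal consistency with a few lines of arithmetic. This is a minimal Python sketch rather than gretl output; the variable names are illustrative, and only the coefficients, standard errors, and F statistic come from the results above.

```python
# Estimates reported above for the regression of demp on a constant and nj.
b1, se_b1 = -2.28333, 0.73126  # intercept: mean change in fte in PA (nj = 0)
b2, se_b2 = 2.75000, 0.81519   # slope: change in NJ minus change in PA
f_reported = 11.380            # F(1, 766) from the regression output

# t-ratio for the treatment effect; with one regressor, t squared equals F.
t_nj = b2 / se_b2
f_from_t = t_nj ** 2

# Implied mean change in fte for NJ restaurants (nj = 1).
nj_change = b1 + b2

print(f"t-ratio on nj: {t_nj:.4f}")
print(f"t squared: {f_from_t:.3f} (reported F: {f_reported:.3f})")
print(f"Implied mean change in NJ: {nj_change:.5f}")
```

The slope of 2.75 is close to the difference computed from the four group means, and its t-ratio of roughly 3.37 squares to the reported F statistic, as it should when there is a single slope coefficient.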


# obtain summary statistics for full sample
smpl full
summary

# create indicator variable for large homes
series ld = (sqft>25)
discrete ld
smpl 1 8
print ld sqft --byobs
smpl full

# create interaction and estimate model
series sqft_utown=sqft*utown
ols price const utown sqft sqft_utown age pool fplace

# generate some marginal effects
scalar premium = $coeff(utown)*1000
scalar sq_u = 10*($coeff(sqft)+$coeff(sqft_utown))
scalar sq_other = 10*$coeff(sqft)
scalar depr = 1000*$coeff(age)
scalar sp = 1000*$coeff(pool)
scalar firep = 1000*$coeff(fplace)
printf "\n University Premium = $%8.7g\n\
Marginal effect of sqft near University = $%7.6g\n\
Marginal effect of sqft elsewhere = $%7.6g\n\
Depreciation Rate = $%7.2f\n\
Pool = $%7.2f\n\
Fireplace = $%7.2f\n",premium, sq_u, sq_other, depr, sp, firep
omit sqft_utown

# testing joint hypotheses
open "@gretldir\data\poe\cps4_small.gdt"
series blk_fem = black*female
ols wage const educ black female blk_fem
restrict
    b[3]=0
    b[4]=0
    b[5]=0
end restrict

ols wage const educ black female blk_fem south midwest west
omit south midwest west
scalar sser = $ess

# creation of interactions using a loop
list x = const educ black female blk_fem
list dx = null
loop foreach i x
    series south_$i = south * $i
    list dx = dx south_$i
endloop


modeltab clear
ols wage x dx
scalar sseu = $ess
scalar dfu = $df
modeltab add

# estimating subsets
smpl south=1 --restrict
ols wage x
modeltab add
smpl full
smpl south=0 --restrict
ols wage x
modeltab add
modeltab show

# Chow tests
smpl full
ols wage x
scalar sser = $ess
scalar fstat = ((sser-sseu)/5)/(sseu/dfu)
pvalue f 5 dfu fstat

ols wage x
chow south --dummy

# log-linear model: interpretation
open "@gretldir\data\poe\cps4_small.gdt"
logs wage
ols l_wage const educ female
scalar differential = 100*(exp($coeff(female))-1)

# linear probability model with HCCME
open "@gretldir\data\poe\coke.gdt"
ols coke const pratio disp_coke disp_pepsi --robust

# treatment effects
open "@gretldir\data\poe\star.gdt"
list v = totalscore small tchexper boy freelunch \
    white_asian tchwhite tchmasters schurban schrural
summary v --by=small --simple
summary v --by=regular --simple

smpl aide != 1 --restrict
list x1 = const small
list x2 = x1 tchexper
list x3 = x1 boy freelunch white_asian
list x4 = x1 tchwhite tchmasters schurban schrural

ols totalscore x1 --quiet


modeltab add
ols totalscore x2 --quiet
modeltab add
ols totalscore x3 --quiet
modeltab add
ols totalscore x4 --quiet
modeltab add
modeltab show
modeltab free

# manual creation of multiple indicators for school id
discrete schid
list d = dummify(schid)
ols totalscore x1 --quiet
scalar sser = $ess
scalar r_df = $df
modeltab add
ols totalscore x2 --quiet
modeltab add
ols totalscore x1 d --quiet
scalar sseu = $ess
scalar u_df = $df
modeltab add
ols totalscore x2 d --quiet
modeltab add
modeltab show
modeltab free

scalar J = r_df-u_df
scalar fstat = ((sser - sseu)/J)/(sseu/u_df)
pvalue f J u_df fstat

# testing random assignment of students
ols small const boy white_asian tchexper freelunch
restrict
    b[1] = .5
end restrict

# differences-in-differences
open "@gretldir\data\poe\njmin3.gdt"
smpl d = 0 --restrict
summary fte --by=nj --simple
smpl full
smpl d = 1 --restrict
summary fte --by=nj --simple
smpl full

list x1 = const nj d d_nj
list x2 = x1 kfc roys wendys co_owned
list x3 = x2 southj centralj pa1
summary x1 fte

ols fte x1
modeltab add
ols fte x2
modeltab add
ols fte x3
modeltab add
modeltab show
modeltab free


smpl missing(demp) != 1 --restrict
ols demp const nj
