Treatment Effects
To understand the measurement of treatment effects, consider a simple regression model in which the explanatory variable is a dummy variable indicating whether a particular individual is in the treatment or control group. Let y be the outcome variable, the measured characteristic the treatment is designed to affect. Define the indicator variable d as

$$d_i = \begin{cases} 1 & \text{individual in treatment group} \\ 0 & \text{individual in control group} \end{cases}$$
The effect of the treatment on the outcome can be modeled as

$$y_i = \beta_1 + \beta_2 d_i + e_i, \quad i = 1, 2, \ldots, N \qquad (7.10)$$
where $e_i$ represents the collection of other factors affecting the outcome. The regression functions for the treatment and control groups are

$$E(y_i) = \begin{cases} \beta_1 + \beta_2 & \text{if in treatment group, } d_i = 1 \\ \beta_1 & \text{if in control group, } d_i = 0 \end{cases}$$
The treatment effect that we want to measure is $\beta_2$. The least squares estimator of $\beta_2$ is

$$b_2 = \bar{y}_1 - \bar{y}_0$$

where $\bar{y}_1$ is the sample mean of the observations on y for the treatment group and $\bar{y}_0$ is the sample mean of the observations on y for the control group. In this treatment/control framework the estimator $b_2$ is called the difference estimator, because it is the difference between the sample means of the treatment and control groups.
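This equivalence is easy to verify numerically in gretl. The following sketch assumes a dataset is open containing an outcome series y and a treatment dummy d (both names are placeholders, not variables from the STAR data):

```
# OLS of y on a constant and the dummy d
ols y const d
scalar b2 = $coeff(d)          # slope coefficient = difference estimator

# compute the two group means directly
smpl d == 1 --restrict
scalar ybar1 = mean(y)         # treatment-group mean
smpl full
smpl d == 0 --restrict
scalar ybar0 = mean(y)         # control-group mean
smpl full

printf "b2 = %g, ybar1 - ybar0 = %g\n", b2, ybar1 - ybar0
```

The two printed numbers should agree (up to rounding), illustrating that $b_2 = \bar{y}_1 - \bar{y}_0$.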
To illustrate, we use the data from project STAR described in section 7.5.3 of POE4.
The first thing to do is to look at the descriptive statistics for a subset of the variables. The list v is created to hold the names of the variables of interest. Then the summary command is issued for the variables in v with the --by option. This option takes as its argument the name of a discrete variable by which the subsets are determined. Here, small and regular are binary, equal to 1 for small and regular-sized classes, respectively, and 0 otherwise. Each summary command therefore produces two sets of summary statistics.
1 open "@gretldir\data\poe\star.gdt"
2 list v = totalscore small tchexper boy freelunch white_asian \
3     tchwhite tchmasters schurban schrural
4 summary v --by=small --simple
5 summary v --by=regular --simple
Here is a partial listing of the output:
regular = 1 (n = 2005):

                    Mean      Minimum      Maximum    Std. Dev.
totalscore        918.04       635.00       1229.0       73.138
small            0.00000      0.00000      0.00000      0.00000
tchexper          9.0683      0.00000       24.000       5.7244
boy              0.51322      0.00000       1.0000      0.49995
freelunch        0.47382      0.00000       1.0000      0.49944
white_asian      0.68130      0.00000       1.0000      0.46609
tchwhite         0.79800      0.00000       1.0000      0.40159
tchmasters       0.36509      0.00000       1.0000      0.48157
schurban         0.30125      0.00000       1.0000      0.45891
schrural         0.49975      0.00000       1.0000      0.50012

small = 1 (n = 1738):

                    Mean      Minimum      Maximum    Std. Dev.
totalscore        931.94       747.00       1253.0       76.359
small             1.0000       1.0000       1.0000      0.00000
tchexper          8.9954      0.00000       27.000       5.7316
boy              0.51496      0.00000       1.0000      0.49992
freelunch        0.47181      0.00000       1.0000      0.49935
white_asian      0.68470      0.00000       1.0000      0.46477
tchwhite         0.86249      0.00000       1.0000      0.34449
tchmasters       0.31761      0.00000       1.0000      0.46568
schurban         0.30610      0.00000       1.0000      0.46100
schrural         0.46260      0.00000       1.0000      0.49874
The --simple option drops the median, C.V., skewness, and excess kurtosis from the summary statistics. In this case we don't need those, so the option is used.
Next, we want to drop the observations for those classrooms that have a teacher’s aide and to construct a set of variable lists to be used in the regressions that follow.
1 smpl aide != 1 --restrict
2 list x1 = const small
3 list x2 = x1 tchexper
4 list x3 = x2 boy freelunch white_asian
5 list x4 = x3 tchwhite tchmasters schurban schrural
In the first line the smpl command is used to limit the sample (--restrict) to those observations for which the aide variable is not equal (!=) to one. The list commands are interesting. Notice that x1 is constructed in a conventional way using list; to the right of the equals sign are the names of two variables. Then x2 is created with its first elements consisting of the list x1, followed by the additional variable tchexper. Thus, x2 contains const, small, and tchexper. The lists x3 and x4 are constructed similarly: new variables are appended to previously defined lists. It's quite seamless and natural.
Now each of the models is estimated with the --quiet option and put into a model table.
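One way to do this in a script (a sketch using gretl's modeltab apparatus; the exact listing is not shown in the original output) is:

```
# estimate each specification quietly and accumulate
# the results in a model table
ols totalscore x1 --quiet
modeltab add
ols totalscore x2 --quiet
modeltab add
ols totalscore x3 --quiet
modeltab add
ols totalscore x4 --quiet
modeltab add
modeltab show
```

Each modeltab add stores the most recently estimated model; modeltab show then prints all four side by side.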
OLS estimates
Dependent variable: totalscore
                  (1)          (2)          (3)          (4)

const         918.0**      907.6**      927.6**      936.0**
             (1.667)      (2.542)      (3.758)      (5.057)

small         13.90**      13.98**      13.87**      13.36**
             (2.447)      (2.437)      (2.338)      (2.352)

tchexper                   1.156**     0.7025**     0.7814**
                          (0.2123)     (0.2057)     (0.2129)

boy                                    -15.34**     -15.29**
                                        (2.335)      (2.330)

freelunch                              -33.79**     -32.05**
                                        (2.600)      (2.666)

white_asian                             11.65**      14.99**
                                        (2.801)      (3.510)

tchwhite                                             -2.775
                                                    (3.535)

tchmasters                                         -8.180**
                                                    (2.562)

schurban                                           -8.216**
                                                    (3.673)

schrural                                           -9.133**
                                                    (3.210)

n                3743         3743         3743         3743
R²             0.0083       0.0158       0.0945       0.0988
lnL       -2.145e+004  -2.144e+004  -2.128e+004  -2.127e+004

Standard errors in parentheses
* indicates significance at the 10 percent level
** indicates significance at the 5 percent level
The coefficient on the small indicator variable is hardly affected by adding or dropping variables from the model. This is indirect evidence that it is not correlated with the other regressors. The effect of teacher experience on test scores falls quite a bit when boy, freelunch, and white_asian are added to the equation, which suggests that tchexper is correlated with one or more of these variables and that omitting them from the model leads to biased least squares estimation of its coefficient.