Grouped Data
In our discussion of the Goldfeld-Quandt test we decided that wages in rural and metropolitan areas showed different amounts of variation. When the heteroskedasticity occurs between groups, it is relatively straightforward to estimate the GLS corrections; this is referred to as Feasible GLS (FGLS).
There are a couple of ways to estimate each subsample. The first was used in the Goldfeld-Quandt test example, where the metro subsample was chosen using smpl metro=1 --restrict and the rural one using smpl metro=0 --restrict. The second restricts the sample with an indicator variable and the --dummy option. Grouped GLS using this second method can be found below:
1 open "@gretldir\data\poe\cps2.gdt"
2 list x = const educ exper
3 ols wage x metro
4 smpl metro --dummy
5 ols wage x
6 scalar stdm = $sigma
7 smpl full
8 series rural = 1-metro
9 smpl rural --dummy
10 ols wage x
11 scalar stdr = $sigma
12 #Restore the full sample
13 smpl full
14 series wm = metro*stdm
15 series wr = rural*stdr
16 series w = 1/(wm + wr)^2
17 wls w wage x metro
The smpl command is used in a new way here. In line 4, smpl metro --dummy restricts the sample based on the indicator variable metro: only those observations for which metro=1 are kept. The wage equation is estimated for the metro dwellers in line 5, and the standard error of the regression is saved in line 6.
The next lines restore the full sample and create a new indicator variable for rural dwellers. Its value is just 1-metro. We generate this in order to use the smpl rural --dummy syntax. We could have skipped generating rural and simply used smpl metro=0 --restrict. In line 10 the model is estimated for rural dwellers, and the standard error of the regression is saved in line 11.
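The --restrict alternative mentioned above can be sketched as follows. This is only an illustration under a few assumptions: the data path is assumed to point at the cps2.gdt file in the POE data folder of a Windows installation (use forward slashes elsewhere), the == comparison and the --quiet and printf lines are additions for readability, and the scalar names simply mirror those in the listing.

```hansl
# Sketch only: select each group with --restrict instead of --dummy.
# The data path is an assumption; adjust it to your gretl installation.
open "@gretldir\data\poe\cps2.gdt"
list x = const educ exper

# Metro subsample: keep observations with metro equal to 1
smpl metro==1 --restrict
ols wage x --quiet
scalar stdm = $sigma

# Restore the full sample before imposing a different restriction
smpl full

# Rural subsample: keep observations with metro equal to 0
smpl metro==0 --restrict
ols wage x --quiet
scalar stdr = $sigma

# Back to the full sample, then report both estimates
smpl full
printf "S.E. of regression: metro = %g, rural = %g\n", stdm, stdr
```

With this approach no separate sampling indicator is required, although a rural indicator (or the expression 1-metro) is still needed wherever the rural standard error has to be assigned to rural observations.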
The full sample must be restored before the two sets of weights are created and combined. In line 14 the statement series wm = metro*stdm multiplies the metro S.E. of the regression by the indicator variable. Its values will be either stdm for metro dwellers or 0 for rural dwellers. We do the same for rural dwellers in line 15. Adding these two series together creates a single variable that contains only two distinct values: the estimated standard deviation stdm for metro dwellers and stdr for rural ones. Squaring this and taking the reciprocal provides the necessary weights for the weighted least squares regression.
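The weight construction can be written out as a short worked equation. The notation here is an assumption introduced for illustration: $\hat\sigma_M$ and $\hat\sigma_R$ stand for the scalars stdm and stdr, $m_i$ for the metro indicator, and $\beta_1, \beta_2, \beta_3, \delta$ for the coefficients on const, educ, exper and metro.

```latex
\begin{align*}
\hat\sigma_i &= m_i\,\hat\sigma_M + (1 - m_i)\,\hat\sigma_R
  = \begin{cases}
      \hat\sigma_M & \text{if } m_i = 1 \ (\text{metro}) \\
      \hat\sigma_R & \text{if } m_i = 0 \ (\text{rural})
    \end{cases} \\[4pt]
w_i &= \frac{1}{\hat\sigma_i^{2}} \\[4pt]
\hat\beta_{WLS} &= \arg\min_{\beta,\delta} \sum_{i=1}^{N}
  w_i \left( wage_i - \beta_1 - \beta_2\, educ_i - \beta_3\, exper_i - \delta\, m_i \right)^{2}
\end{align*}
```

Because the weight is the reciprocal of the estimated variance, observations from the group with the larger error variance receive less weight in the sum of squares.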
WLS, using observations 1-1000
Dependent variable: wage

         Coefficient   Std. Error   t-ratio   p-value
const    9.39836       1.01967      9.2170    0.0000
educ     1.19572       0.0685080    17.4537   0.0000
exper    0.132209      0.0145485    9.0874    0.0000
metro    1.53880       0.346286     4.4437    0.0000

Statistics based on the weighted data:

Statistics based on the original data:
Mean dependent var   10.21302    S.D. dependent var   6.246641
Sum squared resid    28585.82    S.E. of regression   5.357296