Grouped Data

In our discussion of the Goldfeld-Quandt test we concluded that wages in rural and metropolitan areas showed different amounts of variation. When the heteroskedasticity occurs between groups, it is relatively straightforward to estimate the GLS corrections; this is referred to as feasible GLS (FGLS).

The example estimates wages as a function of education and experience and is based on cps2.gdt, the dataset used in the Goldfeld-Quandt test example. The strategy for combining these partitions and estimating the parameters by generalized least squares is fairly simple. The model is estimated on each subsample, and the standard error of the regression, σ̂ (retrieved with the accessor $sigma), is saved. Each subsample is then weighted by the reciprocal of its estimated variance, 1/σ̂², where the estimated variance is the square of σ̂.
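The weighting scheme can be written out explicitly. The sketch below uses notation that is not in the original text: σ_M² and σ_R² for the metro and rural error variances, δ for the metro coefficient, and g(i) for the group of observation i. It assumes the error variance differs only between the two groups.

```latex
\begin{aligned}
wage_i &= \beta_1 + \beta_2\,educ_i + \beta_3\,exper_i + \delta\,metro_i + e_i,\\
\operatorname{var}(e_i) &=
  \begin{cases}
    \sigma_M^2 & \text{if } metro_i = 1,\\
    \sigma_R^2 & \text{if } metro_i = 0,
  \end{cases}\\
w_i &= \frac{1}{\hat\sigma_{g(i)}^2}, \qquad g(i) \in \{M, R\}.
\end{aligned}
```

Dividing every variable for observation i by σ of its group would give errors with variance 1 in both groups; FGLS replaces the unknown σ values with the subsample estimates σ̂.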

There are a couple of ways to estimate each subsample. The first was used in the Goldfeld-Quandt test example, where the metro subsample was chosen using smpl metro=1 --restrict and the rural one with smpl metro=0 --restrict. The listing below carries out grouped GLS using a second approach, the --dummy option of smpl:

1 open "@gretldir\data\poe\cps2.gdt"
2 list x = const educ exper
3 ols wage x metro
4 smpl metro --dummy
5 ols wage x
6 scalar stdm = $sigma
7 smpl full
8 series rural = 1-metro
9 smpl rural --dummy
10 ols wage x
11 scalar stdr = $sigma
12 # Restore the full sample
13 smpl full
14 series wm = metro*stdm
15 series wr = rural*stdr
16 series w = 1/(wm + wr)^2
17 wls w wage x metro

The smpl command is used in a new way here. In line 4, smpl metro --dummy restricts the sample based on the indicator variable metro: only observations for which metro=1 are kept. The wage equation is estimated for the metro dwellers in line 5, and the standard error of the regression is saved in line 6.

Lines 7 and 8 restore the full sample and create a new indicator variable for rural dwellers, whose value is just 1-metro. We generate it in order to use the smpl rural --dummy syntax in line 9. We could have skipped generating rural and simply used smpl metro=0 --restrict. In line 10 the model is estimated for rural dwellers, and the standard error of the regression is saved in line 11.
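The --restrict alternative mentioned above can be sketched as follows. This is a minimal illustration rather than the original listing: the data path is an assumption about where the POE files are installed, the condition is written with ==, gretl's Boolean equality operator (the metro=0 form in the text is assumed to be an older spelling of the same test), and the --quiet flags and the final printf are added only to keep the output short.

```hansl
# Assumed path: adjust to where the POE data files are installed
open "@gretldir\data\poe\cps2.gdt"
list x = const educ exper

# Metro subsample: keep observations with metro equal to 1
smpl metro==1 --restrict
ols wage x --quiet
scalar stdm = $sigma

# Rural subsample: no separate rural indicator is needed.
# The full sample is restored first because restrictions are cumulative.
smpl full
smpl metro==0 --restrict
ols wage x --quiet
scalar stdr = $sigma

# Restore the full sample and report the two estimates
smpl full
printf "metro sigma-hat = %g, rural sigma-hat = %g\n", stdm, stdr
```

The two scalars can then be used to build the weights exactly as in the --dummy version; the only difference is how the subsamples are selected.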

The full sample is then restored (line 13), and two sets of weights are created and combined. In line 14 the statement series wm = metro*stdm multiplies the metro standard error of the regression by the indicator variable, so wm equals stdm for metro dwellers and 0 for rural dwellers. Line 15 does the same for rural dwellers. Adding the two series creates a single variable with only two distinct values: σ̂M (stdm) for metro dwellers and σ̂R (stdr) for rural ones. Squaring this sum and taking the reciprocal (line 16) provides the weights for the weighted least squares regression in line 17.
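To see why the sum of the two series yields the right weight for every observation, the construction can be written as a case analysis. The symbols σ̂_M and σ̂_R stand for the scalars stdm and stdr; the subscript i is added here for illustration.

```latex
w_i
= \frac{1}{\left(metro_i\,\hat\sigma_M + rural_i\,\hat\sigma_R\right)^2}
= \begin{cases}
    1/\hat\sigma_M^2 & \text{if } metro_i = 1 \ (rural_i = 0),\\[4pt]
    1/\hat\sigma_R^2 & \text{if } metro_i = 0 \ (rural_i = 1).
  \end{cases}
```

Because rural is defined as 1-metro, exactly one term in the denominator is nonzero for each observation. As documented for gretl's wls command, the estimator is essentially OLS after multiplying each observation by the square root of its weight, here 1/σ̂ for the observation's group. If the weights are roughly right, the S.E. of regression based on the weighted data should therefore be close to 1.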

WLS, using observations 1-1000
Dependent variable: wage

             Coefficient    Std. Error    t-ratio    p-value
  const      -9.39836       1.01967       -9.2170    0.0000
  educ        1.19572       0.0685080     17.4537    0.0000
  exper       0.132209      0.0145485      9.0874    0.0000
  metro       1.53880       0.346286       4.4437    0.0000

Statistics based on the weighted data:

  Sum squared resid     998.4248    S.E. of regression    1.001217
  R-squared             0.271528    Adjusted R-squared    0.269334
  F(3, 996)             123.7486    P-value(F)            3.99e-68
  Log-likelihood       -1418.150    Akaike criterion      2844.301
  Schwarz criterion     2863.932    Hannan-Quinn          2851.762

Statistics based on the original data:

  Mean dependent var    10.21302    S.D. dependent var    6.246641
  Sum squared resid     28585.82    S.E. of regression    5.357296
