Goldfeld-Quandt Test for Heteroskedasticity
Using examples from Hill et al. (2011), a model of grouped heteroskedasticity is estimated and a Goldfeld-Quandt test is performed to determine whether the two sample subsets have the same error variance. The error variance associated with the first subset is $\sigma_1^2$ and that for the other subset is $\sigma_2^2$.
The null and alternative hypotheses are
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 \neq \sigma_2^2$$
Estimating both subsets separately and obtaining the estimated error variances allow us to construct the following ratio:
$$F = \frac{\hat{\sigma}_1^2/\sigma_1^2}{\hat{\sigma}_2^2/\sigma_2^2} \sim F_{df_1,\,df_2} \qquad (8.3)$$
where $df_1 = N_1 - K_1$ is from the first subset and $df_2 = N_2 - K_2$ is from the second subset. Under the null hypothesis that the two variances are equal,
$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \sim F_{df_1,\,df_2}$$
This is just the ratio of the estimated variances from the two subset regressions.
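The decision rule implied by this ratio can be written out explicitly. The block below is a sketch that assumes subset 1 is the one suspected of having the larger variance, so that only the right tail matters. The significance level $\alpha$ and the critical-value notation are symbols introduced here for illustration and are not part of the original text.

```latex
% \hat{\sigma}_i^2: estimated error variance from subset i (i = 1, 2)
% \alpha: chosen significance level (assumed symbol, not in the original text)
% Requires amsmath for \text
\[
F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2},
\qquad
\text{reject } H_0 \text{ if } F > F_{(1-\alpha;\, df_1,\, df_2)},
\]
\[
\text{or equivalently if } p = P\!\left(F_{df_1,\, df_2} > F\right) < \alpha .
\]
```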
Below, I have written a gretl program to reproduce the wage example from Hill et al. (2011) that appears in chapter 8. The example is relatively straightforward and I’ll not explain the script in much detail. It is annotated to help you decipher what each section of the program does.
The example consists of estimating wages as a function of education and experience. In addition, an indicator variable is included that is equal to one if a person lives in a metropolitan area. This is an “intercept” dummy which means that folks living in the metro areas are expected to respond similarly to changes in education and experience (same slopes), but that they earn a premium relative to those in rural areas (different intercept).
Each subset (metro and rural) is estimated separately using least squares and the standard error of the regression is saved for each ($sigma). Generally, you should put the group with the larger variance in the numerator. This allows a one-sided test and also allows you to use the standard p-value calculations as done below.
open "@gretldir\data\poe\cps2.gdt"
ols wage const educ exper metro
# Use only metro observations
smpl metro=1 --restrict
ols wage const educ exper
scalar stdm = $sigma
scalar df_m = $df
# Restore the full sample
smpl full
# Use only rural observations
smpl metro=0 --restrict
ols wage const educ exper
scalar stdr = $sigma
scalar df_r = $df
# GQ statistic
scalar gq = stdm^2/stdr^2
scalar pv = pvalue(F, df_m, df_r, gq)
printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n", df_m, df_r, gq, pv
The F(805, 189) statistic = 2.088. The right side p-value is 1.567e-009.
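The same conclusion can be reached by comparing the statistic with a critical value instead of reading the p-value. The fragment below is a minimal sketch: it plugs in the statistic and degrees of freedom reported above as literal numbers, and the 5% level (alpha) is an assumed choice rather than something fixed by the example.

```gretl
# Values copied from the output reported above
scalar gq = 2.088
scalar df_m = 805
scalar df_r = 189
# Assumed significance level (not specified in the example)
scalar alpha = 0.05
# critical() returns the value c such that P(F > c) = alpha
scalar crit = critical(F, df_m, df_r, alpha)
if gq > crit
    printf "GQ = %.3f exceeds the critical value %.3f (alpha = %g): reject equal variances.\n", gq, crit, alpha
else
    printf "GQ = %.3f does not exceed the critical value %.3f (alpha = %g): do not reject.\n", gq, crit, alpha
endif
```

Because no data file is opened, the fragment can be run on its own in the gretl console or script editor.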
Food Expenditure Example
In this example the data are sorted by income (low to high) and the subsets are created using observation numbers. Sorting can be done using the GUI: click Data>Sort data from the main menu bar to reveal the dialog box shown on the right side of Figure 8.6. The high-income group is expected to have the larger variance, so its estimate is placed in the numerator of the GQ ratio. The script is:
1 open "@gretldir\data\poe\food.gdt"
2 dataset sortby income
3 list x = const income
4 # large variance observations
5 smpl 21 40
6 ols food_exp x
7 scalar stdL = $sigma
8 scalar df_L = $df
9 # Restore the full sample
10 smpl full
11 # small variance observations
12 smpl 1 20
13 ols food_exp x
14 scalar stdS = $sigma
15 scalar df_S = $df
16 # GQ statistic
17 scalar gq = stdL^2/stdS^2
18 scalar pv = pvalue(F, df_L, df_S, gq)
19 printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n", df_L, df_S, gq, pv
The F(18, 18) statistic = 3.615. The right side p-value is 0.004596.
Notice that the dataset sortby command in line 2 sorts the data without using the GUI. This allows us to use the smpl 21 40 command to limit the sample to observations 21-40 for the first subset. The other minor improvement is the list command in line 3, which specifies the list of independent variables. This is useful since the same regression is estimated twice using different subsamples. The homoskedasticity null hypothesis is rejected at the 5% level since the p-value is smaller than 0.05.
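Since the last steps of both examples differ only in the variable names, they could be wrapped in a small user-defined function to avoid retyping them. The following is only a sketch: gq_report is a hypothetical name, not part of gretl or of the original scripts, and the arguments in the sample call are made-up numbers chosen purely to show the calling convention.

```gretl
# gq_report: hypothetical helper, not part of gretl or the original scripts.
# s_num, df_num: regression standard error and degrees of freedom of the
#                subset expected to have the larger variance (numerator)
# s_den, df_den: the same quantities for the other subset (denominator)
function scalar gq_report (scalar s_num, scalar df_num, scalar s_den, scalar df_den)
    scalar gq = s_num^2 / s_den^2
    scalar pv = pvalue(F, df_num, df_den, gq)
    printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n", df_num, df_den, gq, pv
    return pv
end function

# Illustrative call with made-up inputs (not taken from either data set)
scalar pv_demo = gq_report(2.0, 18, 1.0, 18)
```

In the food expenditure script the call would be gq_report(stdL, df_L, stdS, df_S), assuming those scalars have been saved as in the script. Passing the numerator group first keeps the "larger variance on top" convention visible at the call site.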