# Goldfeld-Quandt Test for Heteroskedasticity

Using examples from Hill et al. (2011), a model of grouped heteroskedasticity is estimated and a Goldfeld-Quandt test is performed to determine whether the two sample subsets have the same error variance. The error variance associated with the first subset is $\sigma_1^2$ and that for the other subset is $\sigma_2^2$.

The null and alternative hypotheses are

$$H_0: \sigma_1^2 = \sigma_2^2$$

$$H_1: \sigma_1^2 \neq \sigma_2^2$$

Estimating both subsets separately and obtaining the estimated error variances allow us to construct the following ratio:

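$$F = \frac{\hat{\sigma}_1^2/\sigma_1^2}{\hat{\sigma}_2^2/\sigma_2^2} \sim F_{df_1,\, df_2} \tag{8.3}$$

where $df_1 = N_1 - K_1$ is from the first subset and $df_2 = N_2 - K_2$ is from the second subset. Under the null hypothesis that the two variances are equal,

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \sim F_{df_1,\, df_2}$$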

This is just the ratio of the estimated variances from the two subset regressions.

## Wage Example

Below, I have written a gretl program to reproduce the wage example from Hill et al. (2011) that appears in chapter 8. The example is relatively straightforward and I’ll not explain the script in much detail. It is annotated to help you decipher what each section of the program does.

The example consists of estimating wages as a function of education and experience. In addition, an indicator variable is included that is equal to one if a person lives in a metropolitan area. This is an “intercept” dummy which means that folks living in the metro areas are expected to respond similarly to changes in education and experience (same slopes), but that they earn a premium relative to those in rural areas (different intercept).

Each subset (metro and rural) is estimated separately using least squares and the standard error of the regression is saved for each (\$sigma). Generally, you should put the group with the larger variance in the numerator. This allows a one-sided test and also allows you to use the standard p-value calculations as done below.
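With the larger-variance group in the numerator, the test is carried out in the right tail of the F distribution. As a sketch of the decision rule (the significance level $\alpha$ and the symbol $F_{obs}$ for the computed ratio are notation introduced here, not taken from the original text):

$$p = P\left(F_{df_1,\, df_2} > F_{obs}\right), \qquad \text{reject } H_0 \text{ if } p < \alpha$$

This right-tail probability is what gretl's pvalue function returns when it is given the F distribution, the two degrees of freedom, and the computed statistic.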

```
open "@gretldir\data\poe\cps2.gdt"
# Pooled model with a metro intercept dummy
ols wage const educ exper metro

# Use only metro observations
smpl metro == 1 --restrict
ols wage const educ exper
# Save the standard error of the regression and the degrees of freedom
scalar stdm = $sigma
scalar df_m = $df

# Restore the full sample
smpl full

# Use only rural observations
smpl metro == 0 --restrict
ols wage const educ exper
scalar stdr = $sigma
scalar df_r = $df

# GQ statistic: metro (larger variance) in the numerator
scalar gq = stdm^2/stdr^2
scalar pv = pvalue(F, df_m, df_r, gq)
printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n", df_m, df_r, gq, pv
```

which produces

The F(805, 189) statistic = 2.088. The right side p-value is 1.567e-009.
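The same decision can be reached by comparing the statistic with a critical value. The sketch below assumes a 5% significance level (a choice made here for illustration) and copies the degrees of freedom and the statistic from the output above; critical() is gretl's built-in function that returns the value x with P(X > x) = p.

```gretl
# Values copied from the printed output above
scalar df_m = 805
scalar df_r = 189
scalar gq = 2.088

# Right-tail critical value of F(805, 189); the 0.05 level is an assumption
scalar crit = critical(F, df_m, df_r, 0.05)

# 1 if the null of equal variances is rejected, 0 otherwise
scalar reject = gq > crit
printf "Critical value = %.3f, reject H0 = %d\n", crit, reject
```

Because the p-value reported above is far below 0.05, the statistic should exceed this critical value and the two approaches agree.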

## Food Expenditure Example

In this example the data are sorted by income (low to high) and the subsets are created using observation numbers. Sorting can be done using the GUI: click Data>Sort data from the main menu bar to reveal the dialog box shown on the right side of Figure 8.6. The high-income group is expected to have the larger variance, so its estimate is placed in the numerator of the GQ ratio. The script is:

```
open "@gretldir\data\poe\food.gdt"
dataset sortby income
list x = const income

# Large variance observations (upper half of the income-sorted data)
smpl 21 40
ols food_exp x
scalar stdL = $sigma
scalar df_L = $df

# Restore the full sample
smpl full

# Small variance observations (lower half of the income-sorted data)
smpl 1 20
ols food_exp x
scalar stdS = $sigma
scalar df_S = $df

# GQ statistic: high-income (larger variance) group in the numerator
scalar gq = stdL^2/stdS^2
scalar pv = pvalue(F, df_L, df_S, gq)
printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n", df_L, df_S, gq, pv
```

This yields:

The F(18, 18) statistic = 3.615. The right side p-value is 0.004596.

Notice that the dataset sortby command in line 2 sorts the data without using the GUI. This allows us to use the smpl 21 40 command to limit the sample to observations 21-40 for the first subset. The other minor improvement is the list command in line 3, which specifies the list of independent variables. This is useful since the same regression is estimated twice using different subsamples. The homoskedasticity null hypothesis is rejected at the 5% level since the p-value (0.0046) is smaller than 0.05.
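Both examples end with the same three steps: form the variance ratio, compute the right-tail p-value, and print the result. One possible way to avoid repeating them is a small user-defined function. The name gq_report and its argument names are hypothetical, chosen here for illustration; the function only wraps the pvalue and printf calls already used in the scripts.

```gretl
# Hypothetical helper, not part of the original scripts.
# s_num, df_num: regression standard error and degrees of freedom of the
#                subset expected to have the larger variance (numerator)
# s_den, df_den: the same quantities for the other subset (denominator)
function void gq_report (scalar s_num, scalar df_num, scalar s_den, scalar df_den)
    scalar gq = s_num^2 / s_den^2
    scalar pv = pvalue(F, df_num, df_den, gq)
    printf "The F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n", df_num, df_den, gq, pv
end function
```

Assuming the scalars saved by the food expenditure script are still in memory, the call would be gq_report(stdL, df_L, stdS, df_S).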