# Weighted Least Squares

If you know something about the structure of the heteroskedasticity, you may be able to get more precise estimates using a generalization of least squares. In heteroskedastic models, observations with high variance contain less information about the location of the regression line than observations with low variance. The basic idea of generalized least squares in this context is to reweigh the data so that all the observations contain the same level of information (i.e., the same variance) about the location of the regression line. So, observations that contain more noise are given smaller weights and those containing more signal larger ones. Reweighing the data in this way is known in some statistical disciplines as weighted least squares, and this is the descriptive term that gretl uses as well.

Suppose that the errors vary proportionally with x_i according to

var(e_i) = σ² x_i        (8.5)

The errors are heteroskedastic since each error has a different variance, the value of which depends on the level of x_i. Weighted least squares reweighs the observations in the model so that each transformed observation has the same variance as all the others. Simple algebra reveals that

var(e_i/√x_i) = (1/x_i) var(e_i) = σ²        (8.6)

So, multiplying equation (8.1) by 1/√x_i completes the transformation. The transformed model is homoskedastic, and the least squares estimates and their standard errors are statistically valid and efficient.
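Written out in full (taking equation (8.1) to be the simple regression y_i = β₁ + β₂x_i + e_i), the transformation divides every term by √x_i:

```latex
\frac{y_i}{\sqrt{x_i}}
  = \beta_1\frac{1}{\sqrt{x_i}} + \beta_2\frac{x_i}{\sqrt{x_i}} + \frac{e_i}{\sqrt{x_i}}
```

The transformed error e_i/√x_i has constant variance σ², which is exactly what equation (8.6) states.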

Gretl makes this easy since it contains a function to reweigh all the observations according to a weight you specify. The command is wls, which naturally stands for weighted least squares! The only thing you need to be careful of is how gretl handles the weights: gretl takes the square root of the value you provide. That is, to reweigh the variables using 1/√x_i you need to supply its square, 1/x_i, as the weight; gretl takes the square root of w for you. To me, this is a bit confusing, so you may want to verify what gretl is doing by manually transforming y, x, and the constant and running the regression. The script file shown below does this.

In the example, you first have to create the weight, then call the function wls. The script appears below.

```
open "@gretldir\data\poe\food.gdt"

# GLS using built-in function
series w = 1/income
wls w food_exp const income
scalar lb = $coeff(income) - critical(t,$df,0.025) * $stderr(income)
scalar ub = $coeff(income) + critical(t,$df,0.025) * $stderr(income)
printf "\nThe 95%% confidence interval is (%.3f, %.3f).\n", lb, ub

# GLS using OLS on transformed data
series wi = 1/sqrt(income)
series ys = wi*food_exp
series xs = wi*income
series cs = wi
ols ys cs xs
```

The first argument after wls is the name of the weight variable. Then, specify the regression to which it is applied. Gretl multiplies each variable (including the constant) by the square root of the given weight and estimates the regression using least squares.

In the next block of the program, wi = 1/√(income) is created and used to transform the dependent variable, income, and the constant. Least squares regression using this manually weighted data yields the same results as you get with gretl’s wls command. In either case, you interpret the output of weighted least squares in the usual way.
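The same equivalence can be checked outside gretl. The numpy sketch below (simulated data with made-up coefficients, not the food expenditure data) generates errors with variance proportional to x, then shows that OLS on the manually transformed variables matches the textbook weighted least squares formula (X′WX)⁻¹X′Wy with weight 1/x_i:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
sigma = 2.0
# heteroskedastic errors: var(e_i) = sigma^2 * x_i, as in equation (8.5)
e = rng.normal(0.0, sigma * np.sqrt(x))
y = 5.0 + 3.0 * x + e                      # hypothetical intercept and slope

# manual transformation: multiply y, x, and the constant by 1/sqrt(x_i)
sw = 1.0 / np.sqrt(x)
X = np.column_stack([np.ones(n), x])       # constant and regressor
beta_manual, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# closed-form WLS: (X'WX)^{-1} X'Wy with W = diag(1/x_i)
w = 1.0 / x
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

print(beta_manual)
print(beta_wls)                            # identical to beta_manual
```

Both routes solve the same normal equations, which is why gretl's wls output matches the manual ols run on the transformed series.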

The weighted least squares estimation yields:

```
Model 6: WLS, using observations 1-40
Dependent variable: food_exp
Variable used as weight: w

             Coefficient   Std. Error   t-ratio   p-value
  const        78.6841      23.7887     3.3076    0.0021
  income       10.4510       1.38589    7.5410    0.0000

Statistics based on the weighted data:

  Sum squared resid    13359.5     S.E. of regression   18.7501
  R-squared            0.599438    Adjusted R-squared   0.588897
  F(1, 38)             56.8667     P-value(F)           4.61e-09
  Log-likelihood      -172.98      Akaike criterion     349.959
  Schwarz criterion    353.337     Hannan-Quinn         351.18

Statistics based on the original data:

  Mean dependent var   283.5735    S.D. dependent var   112.6752
  Sum squared resid    304611.7    S.E. of regression   89.53266
```

and the 95% confidence interval for the slope β₂ is (7.645, 13.257).
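As a quick cross-check outside gretl, the interval can be reproduced by hand from the reported slope and standard error (the critical value 2.0244 for t(38) at the 0.025 level is taken from standard tables, not from the source):

```python
# Reproducing the reported interval from the WLS output above
b2, se = 10.451, 1.38589   # slope = reported t-ratio (7.5410) * its std. error
t_crit = 2.0244            # approx. 0.025 critical value of t with 38 df
lb, ub = b2 - t_crit * se, b2 + t_crit * se
print(f"The 95% confidence interval is ({lb:.3f}, {ub:.3f}).")
# -> The 95% confidence interval is (7.645, 13.257).
```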