Residual Plots

Inadvertently choosing an inappropriate functional form can cause serious problems when the results are used for decision-making. A number of formal tests can diagnose specification problems, but researchers often start by looking at residual plots to get a quick sense of whether anything is wrong.

If the assumptions of the classical normal linear regression model hold (ensuring that least squares is minimum variance unbiased) then residuals should look like those found in ch4sim1.gdt shown in Figure 4.9 below.

open "@gretldir\data\poe\ch4sim1.gdt"
gnuplot e x


Figure 4.9: Random residuals from ch4sim1.gdt

If there is no apparent pattern, then the assumptions required for the Gauss-Markov theorem are plausibly satisfied, and the least squares estimator will be efficient among linear unbiased estimators and have the usual desirable properties.
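As a rough illustration of what "no apparent pattern" looks like, the sketch below simulates data that satisfy the classical assumptions and plots the least squares residuals against x. It does not use ch4sim1.gdt; the sample size, seed, coefficients, and error variance are all arbitrary assumptions chosen for the example.

```hansl
# Simulated data only: every number here is an assumption for illustration
nulldata 100
set seed 12345

# Regressor and a linear model with homoskedastic normal errors
series x = uniform(0, 10)
series y = 1 + 0.5*x + normal(0, 1)

# Fit by least squares and save the residuals
ols y const x --quiet
series ehat = $uhat

# Residuals should scatter randomly around zero at every value of x
gnuplot ehat x --output=display
```

The --output=display option asks gretl to show the plot on screen when the script is run outside the console.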

The next plot is of the least squares residuals from the linear-log food expenditure model (Figure 4.10). These do not appear to be strictly random. Rather, they are heteroskedastic, which means that food expenditure varies more at some levels of income than at others (here, the variance is larger at high incomes). Least squares may be unbiased in this case, but it is not efficient. The validity of hypothesis tests and intervals is affected, and some care must be taken to ensure that proper statistical inferences are made. This is discussed at greater length in chapter 8.
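The fan shape typical of heteroskedastic residuals can be reproduced with simulated data. The following is only a sketch: it does not use the food expenditure data, and the variable names (income, food_exp, l_income), the coefficients, and the assumption that the error standard deviation is proportional to income are all hypothetical.

```hansl
# Simulated data only: names and numbers are assumptions for illustration
nulldata 200
set seed 2718

# Hypothetical income variable and its log
series income = uniform(5, 35)
series l_income = log(income)

# Linear-log model whose error standard deviation grows with income
series food_exp = 20 + 80*l_income + 3*income*normal(0, 1)

# Fit the linear-log model and save the residuals
ols food_exp const l_income --quiet
series ehat = $uhat

# The spread of the residuals should widen as income increases
gnuplot ehat income --output=display
```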

Finally, the ch4sim2.gdt dataset illustrates the least squares residuals from a linear regression fit to quadratic data. To treat the relationship as linear would be like trying to fit a line through a parabola! The residual plot appears in Figure 4.11. The script to generate it is:

1 open "@gretldir\data\poe\ch4sim2.gdt"
2 ols y const x
3 series ehat = $uhat
4 gnuplot ehat x

Notice that another accessor has been used to store the residuals in a new variable. The residuals from the preceding regression are stored and can be accessed via $uhat. In line 3 these were accessed and assigned to the variable ehat. They can then be plotted using gnuplot.

Figure 4.10: Heteroskedastic residuals from the linear-log model of food expenditures.

Looking at the plot in Figure 4.11, there is an obvious problem with model specification. The residuals are supposed to look like a random scatter around zero. Instead, they are clearly parabolic, and the model is NOT correctly specified.
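One way to see how the parabolic pattern arises is to simulate quadratic data, fit a straight line, and then compare the residuals with those from a fit that includes the squared term. This is a minimal sketch on simulated data rather than ch4sim2.gdt; the names x2, ehat_lin, and ehat_quad and all numeric values are assumptions made for the example.

```hansl
# Simulated data only: every number here is an assumption for illustration
nulldata 100
set seed 4321

# Quadratic data-generating process
series x = uniform(-3, 3)
series y = 1 + x^2 + normal(0, 1)

# Misspecified linear fit: residuals inherit the omitted curvature
ols y const x --quiet
series ehat_lin = $uhat
gnuplot ehat_lin x --output=display

# Adding the squared term should remove the parabolic pattern
series x2 = x^2
ols y const x x2 --quiet
series ehat_quad = $uhat
gnuplot ehat_quad x --output=display
```

If the second plot looks like a random scatter around zero while the first is U-shaped, the curvature in the residuals was a symptom of the omitted quadratic term.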
