Prediction Interval

To generate a complete prediction interval for every year of schooling between 1 and 21 years, you can use the following script. The result looks very similar to Figure 4.15 in POE4.

1 open "@gretldir\data\poe\cps4_small.gdt"
2 logs wage
3 ols l_wage const educ
4 scalar sig2 = $ess/$df
5 matrix sem = zeros(21,5)
6 loop for i = 1..21 --quiet
7     scalar yh = ($coeff(const) + $coeff(educ)*i)
8     scalar f = sig2 + sig2/$nobs + ((i-mean(educ))^2)*($stderr(educ)^2)
9     sem[i,1]=i
10     sem[i,2]= yh
11     sem[i,3]=sqrt(f)
12     sem[i,4]=exp(yh-critical(t,$df,0.025)*sqrt(f))
13     sem[i,5]=exp(yh+critical(t,$df,.025)*sqrt(f))
14 endloop
15 print sem
16 nulldata 21 --preserve
17 series ed = sem[,1]
18 series wage = exp(sem[,2])
19 series lb = sem[,4]
20 series ub = sem[,5]

Although there are probably more elegant ways to do this, this script works. It does require some explanation, however. In lines 1-4 the dataset is opened, the log of wage is created, the regression is estimated, and the estimated error variance of the model is computed as the sum of squared errors divided by the degrees of freedom.

In line 5 a matrix of zeros is created to store the results generated in the loop. The loop starts at i=1 and increments by one up to 21. These are the possible years of schooling that individuals have in our dataset. For each number of years, the forecast of log wage and its forecast variance are computed (lines 7 and 8). Notice that these take different values at each iteration of the loop because they depend on the index, i. In line 9 the value of i is placed in the ith row of the first column of the matrix sem. The next line puts the prediction in the second column. In the third column I've placed the forecast standard error, and in the next two the lower and upper bounds of the interval, exponentiated so that they are measured in wages rather than log wages. The loop ends at i=21, at which point the matrix sem is full; then it is printed.
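For reference, the quantities computed in lines 8, 12, and 13 can be written out as follows. The notation here is my own shorthand rather than anything taken from the script: x_0 stands for the loop index i (years of schooling), x-bar for mean(educ), N for $nobs, b_2 for the coefficient on educ, and t_c for critical(t,$df,0.025).

```latex
\widehat{\operatorname{var}}(f) = \hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N}
    + (x_0 - \bar{x})^2 \, \widehat{\operatorname{var}}(b_2),
\qquad
\operatorname{se}(f) = \sqrt{\widehat{\operatorname{var}}(f)}

\left[ \exp\!\big(\hat{y}_0 - t_c \operatorname{se}(f)\big), \;
       \exp\!\big(\hat{y}_0 + t_c \operatorname{se}(f)\big) \right],
\qquad
\hat{y}_0 = b_1 + b_2 x_0
```

The first line corresponds to the scalar f and its square root stored in column 3; the second gives the bounds stored in columns 4 and 5.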

Although you can plot the columns of matrices, I find it easier to put the columns into a dataset and use the regular gretl commands to make plots. First, create an empty dataset using nulldata 21. The 21 puts 21 observations into the dataset. The --preserve option is required because without it the contents of the matrix sem would be emptied, which is definitely not what we want. In the next lines the series command is used to put each column of the matrix into a data series. Once this is done, the variables will show up in the data window and you can graph them as usual. Below in Figure 4.19 is the graph that I created (with a little editing).
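If you would rather produce the graph from the script than from the menus, a command along these lines should work. This is only a sketch: it assumes the series ed, wage, lb, and ub created by the script above are in the current dataset, and it will not reproduce the hand editing applied to the published figure.

```hansl
# Plot the predicted wage and the interval bounds against years of schooling.
# Assumes the series ed, wage, lb and ub already exist (created by the script above).
# The last variable listed (ed) is used for the horizontal axis.
gnuplot wage lb ub ed --with-lines --output=display
```

The --with-lines option draws each series as a line rather than as points, and --output=display sends the plot to the screen instead of a file.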