A gretl Function to Produce Model Selection Rules

Gretl offers a mechanism for defining functions, which may be called via the command line, in the context of a script, or (if packaged appropriately) via the program's graphical interface. The syntax for defining a function looks like this:

function return-type function-name (parameters)
    function body
end function

The opening line of a function definition contains these elements, in strict order:

1. The keyword function.

2. return-type, which states the type of value returned by the function, if any. This must be one of void (if the function does not return anything), scalar, series, matrix, list or string.

3. function-name, the unique identifier for the function. Names must start with a letter. They have a maximum length of 31 characters; if you type a longer name it will be truncated. Function names cannot contain spaces. You will get an error if you try to define a function having the same name as an existing gretl command. Also, be careful not to give any of your variables (scalars, matrices, etc.) the same name as one of your functions.

4. The function's parameters, in the form of a comma-separated list enclosed in parentheses. This may be run into the function name, or separated by white space as shown.
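The four elements above can be seen together in a very small function. This is only an illustrative sketch: the name demo_sumsq, the series z and the five-observation dataset are hypothetical and are not part of the model selection example that follows.

```hansl
# return-type: scalar; function-name: demo_sumsq; one series parameter
function scalar demo_sumsq (series x)
    # the function body: sum of the squared observations of x
    return sum(x^2)
end function

# create an empty dataset with 5 observations so the function has an input
nulldata 5
series z = index          # built-in observation index: 1, 2, 3, 4, 5
scalar s = demo_sumsq(z)
printf "sum of squares = %g\n", s    # 1 + 4 + 9 + 16 + 25 = 55
```

Note that the prefix demo_ keeps the function name from clashing with any existing gretl command or function.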

The model selection function is designed to do two things. First, we want it to print the values of three model selection rules: adjusted R2, AIC and SC. While we are at it, we should also print how many regressors the model has (and their names) and the sample size. Second, we want to be able to send the computed statistics to a matrix. This will allow us to collect results from several candidate models into a single table.

The basic structure of the model selection function is

function matrix modelsel (series y, list xvars)
    [some computations]
    [print results]
    [return results]
end function

As required, it starts with the keyword function. The next word, matrix, tells gretl that the function returns a matrix as output. The next word is modelsel, which is the name that we are giving to our function. The modelsel function has two arguments that are used as inputs. The first is a data series that we will refer to inside the body of the function as y. The second is a list that will be referred to as xvars. The inputs are separated by a comma, and the parenthesized parameter list is set off from the function name by a space. Essentially, we feed the function a dependent variable and a list of independent variables. Inside the function a regression is estimated, the criteria are computed from it, and the statistics are printed to the screen and collected into a matrix that is returned. The resulting matrix is then available for further manipulation outside of the function.

1 function matrix modelsel (series y, list xvars)
2 ols y xvars --quiet
3 scalar sse = $ess
4 scalar N = $nobs
5 scalar K = nelem(xvars)
6 scalar aic = ln(sse/N)+2*K/N
7 scalar bic = ln(sse/N)+K*ln(N)/N
8 scalar rbar2 = 1-((1-$rsq)*(N-1)/$df)
9 matrix A = { K, N, aic, bic, rbar2 }
10 printf "\nRegressors: %s\n",varname(xvars)
11 printf "K = %d, N = %d, AIC = %.4f, SC = %.4f, and\
12  Adjusted R2 = %.4f\n", K, N, aic, bic, rbar2
13 return A
14 end function

In line 2 the function inputs y and the list xvars are used to estimate a linear model by least squares. The --quiet option suppresses the least squares output. In lines 3-5 the sum of squared errors, SSE, the number of observations, N, and the number of regressors, K, are put into scalars. In lines 6-8 the three criteria are computed. Line 9 puts these scalars into a matrix called A. Line 10 sends the names of the regressors to the screen. Lines 11 and 12 send formatted output to the screen. Line 13 returns the matrix A from the function. The last line closes the function.[26]
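For reference, the criteria computed in lines 6-8 can be written out as formulas. This is a sketch inferred from the code and from the printed results later in this section, not a quotation from the text: SSE is the sum of squared errors, N the sample size, K the number of regressors including the constant, R^2 the unadjusted coefficient of determination, and $df is assumed to equal N - K.

```latex
\begin{align*}
\mathrm{AIC} &= \ln\!\left(\frac{SSE}{N}\right) + \frac{2K}{N}, \\
\mathrm{SC}  &= \ln\!\left(\frac{SSE}{N}\right) + \frac{K\ln(N)}{N}, \\
\bar{R}^2    &= 1 - \frac{(1 - R^2)(N-1)}{N-K}.
\end{align*}
```

Since SC - AIC = K(ln N - 2)/N, SC penalizes additional regressors more heavily than AIC whenever ln N > 2, that is, for any sample with N of 8 or more.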

At this point, the function can be highlighted and run.

To use the function, first create a list of the desired independent variables (called x in this case). Then create a matrix, called a, to hold the output from modelsel.

list x = const he we kl6 xtra_x5 xtra_x6
matrix a = modelsel(faminc, x)

The output is:

Regressors: const, he, we, kl6,xtra_x5,xtra_x6

K = 6, N = 428, AIC = 21.2191, SC = 21.2760, and Adjusted R2 = 0.1681

You can see that each of the regressor names is printed on the first line of output. This is followed by the values of K, N, AIC, SC, and the adjusted R2.

To put the function to use, consider the following script where we create four sets of variables and use the model selection rules to pick the desired model.

1 list x1 = const he

2 list x2 = const he we

3 list x3 = const he we kl6

4 list x4 = const he we xtra_x5 xtra_x6

5 matrix a = modelsel(faminc, x1)

6 matrix b = modelsel(faminc, x2)

7 matrix c = modelsel(faminc, x3)

8 matrix d = modelsel(faminc, x4)

9 matrix MS = a|b|c|d

10 colnames(MS,"K N AIC SC Adj_R2" )

11 printf "%10.5g",MS

12 function modelsel clear

In this example the model selection rules are computed for four different models. Lines 1-4 construct the variable list for each of these. The next four lines run the model selection function for each set of variables, saving each set of results in a separate matrix (a, b, c, d). Line 9 stacks these four row vectors vertically into the matrix MS. The colnames function is used to give each column of the matrix a meaningful name, and the printf statement prints the matrix. The last line removes the modelsel function from memory. This is not strictly necessary; if you make changes to your function, just run its definition again. The biggest problem with function proliferation is that you may inadvertently try to give a variable the same name as one of your functions that is already in memory. If that occurs, clear the function or rename the variable.

The first part of the output prints the results from the individual calls to modelsel.

Regressors: const, he
K = 2, N = 428, AIC = 21.2618, SC = 21.2807, and Adjusted R2 = 0.1237

Regressors: const, he, we
K = 3, N = 428, AIC = 21.2250, SC = 21.2534, and Adjusted R2 = 0.1574

Regressors: const, he, we, kl6
K = 4, N = 428, AIC = 21.2106, SC = 21.2485, and Adjusted R2 = 0.1714

Regressors: const, he, we, xtra_x5,xtra_x6
K = 5, N = 428, AIC = 21.2331, SC = 21.2805, and Adjusted R2 = 0.1544

The last part prints the matrix MS.

         K          N        AIC         SC     Adj_R2
         2        428     21.262     21.281    0.12375
         3        428     21.225     21.253    0.15735
         4        428     21.211     21.248    0.17135
         5        428     21.233     21.281    0.15443

In this example all three criteria select the same model: K = 4, with regressors const, he, we and kl6. This model minimizes AIC and SC and maximizes the adjusted R2.
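The comparison can also be made in code rather than by eye. The following is a minimal sketch that re-enters the printed values of MS by hand (so that it runs without the data file) and uses gretl's iminc and imaxc functions to locate the row with the smallest AIC, the smallest SC and the largest adjusted R2. The names best_aic, best_sc and best_r2 are hypothetical.

```hansl
# values copied from the printed matrix MS: columns are K, N, AIC, SC, Adj_R2
matrix MS = {2, 428, 21.262, 21.281, 0.12375; \
             3, 428, 21.225, 21.253, 0.15735; \
             4, 428, 21.211, 21.248, 0.17135; \
             5, 428, 21.233, 21.281, 0.15443}

# row index of the minimum (AIC, SC) or maximum (adjusted R2) in each column
scalar best_aic = iminc(MS[,3])
scalar best_sc  = iminc(MS[,4])
scalar best_r2  = imaxc(MS[,5])

printf "AIC selects the model with K = %d\n", MS[best_aic,1]
printf "SC selects the model with K = %d\n", MS[best_sc,1]
printf "Adjusted R2 selects the model with K = %d\n", MS[best_r2,1]
```

In a real session the hand-typed matrix would be replaced by the MS matrix built from the calls to modelsel.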

Later in the book, this model selection function will be refined to make it more general.

6.5.1 RESET

The RESET test is used to assess the adequacy of your functional form. The null hypothesis is that your functional form is adequate. The alternative is that it is not. The test involves running a couple of regressions and computing an F-statistic.

Consider the model

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i \qquad (6.11)$$

and the hypothesis

$$H_0: E[y_i \mid x_{i2}, x_{i3}] = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3}$$

$$H_1: \text{not } H_0$$

Rejection of H0 implies that the functional form is not supported by the data. To test this, first estimate (6.11) using least squares and save the predicted values, $\hat{y}_i$. Then square and cube $\hat{y}_i$ and add them back to the model as shown below:

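$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \gamma_1 \hat{y}_i^2 + e_i$$

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + e_i$$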

The null hypotheses to test (against alternative, ‘not H0’) are:

$$H_0: \gamma_1 = 0$$

$$H_0: \gamma_1 = \gamma_2 = 0$$

Estimate the auxiliary models using least squares and test the significance of the parameters of $\hat{y}_i^2$ and/or $\hat{y}_i^3$. This is accomplished through the following script. Note that the reset command issued after the first regression computes the test associated with $H_0: \gamma_1 = \gamma_2 = 0$. It is included here so that you can compare the 'canned' result with the one you compute using the two-step procedure suggested above. The two results should match.

ols faminc x3 --quiet
reset --quiet
reset --quiet --squares-only

The results of the RESET tests for the family income equation are

RESET test for specification (squares and cubes)

Test statistic: F = 3.122581,

with p-value = P(F(2,422) > 3.12258) = 0.0451

RESET test for specification (squares only)

Test statistic: F = 5.690471,

with p-value = P(F(1,423) > 5.69047) = 0.0175

The adequacy of the functional form is rejected at the 5% level for both tests. It’s back to the drawing board!
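The two-step procedure described earlier can be sketched by hand as follows. To keep the example self-contained it uses simulated data rather than the family income data; the variable names, the coefficients and the seed are all illustrative assumptions. The F statistic is computed from the restricted and unrestricted sums of squared errors, and the sketch assumes that the $test accessor holds the statistic produced by the preceding reset command.

```hansl
# Simulated data: every name and number below is an illustrative assumption
nulldata 200
set seed 12345
series x2 = normal()
series x3 = normal()
# the x2^2 term makes a purely linear specification inadequate on purpose
series y = 1 + 0.5*x2 - 0.3*x3 + 0.4*x2^2 + normal()

# Step 1: estimate the original model, keep its SSE and powers of the fits
list X = const x2 x3
ols y X --quiet
scalar sse_r = $ess
series yhat2 = $yhat^2
series yhat3 = $yhat^3

# Built-in RESET (squares and cubes), run while the original model is current
reset --quiet
scalar F_canned = $test

# Step 2: auxiliary regression with the squared and cubed fitted values added
ols y X yhat2 yhat3 --quiet
scalar sse_u = $ess
scalar df_u = $df

# F statistic for H0: both added coefficients are zero (2 restrictions)
scalar Fstat = ((sse_r - sse_u)/2) / (sse_u/df_u)
scalar pv = pvalue(F, 2, df_u, Fstat)
printf "manual F(2,%d) = %.6f, p-value = %.4f\n", df_u, Fstat, pv
printf "reset statistic = %.6f\n", F_canned
```

For the squares-only version, drop yhat3 from the auxiliary regression and use one restriction instead of two.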
