# Cars Example

The data set cars.gdt is included in the package of datasets distributed with this manual. In most cases it is a good idea to print summary statistics of any new dataset that you work with. This serves several purposes. First, if there is some problem with the dataset, the summary statistics may give you some indication. Is the sample size as expected? Are the means, minimums and maximums reasonable? If not, you'll need to do some investigative work. The other reason is important as well. By looking at the summary statistics you'll gain an idea of how the variables have been scaled. This is vitally important when it comes to making economic sense out of the results. Do the magnitudes of the coefficients make sense? It also puts you on the lookout for discrete variables, which require some care in interpretation.

The summary command is used to get summary statistics. These include the mean, median, minimum, maximum, standard deviation, coefficient of variation, skewness and excess kurtosis. The corr command computes the simple correlations among your variables. These can be helpful in gaining an initial understanding of whether variables are highly collinear or not. Other measures are more useful, but it never hurts to look at the correlations. Either of these commands can be followed by a variable list to limit the variables that are summarized or correlated.
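As a minimal sketch of limiting either command to a variable list, the lines below summarize and correlate only mpg and wgt. The `@gretldir\data\poe\` location is an assumption about where the POE4 data files are installed; adjust the path to match your system.

```hansl
# Assumption: the POE4 datasets live under gretl's data\poe folder
open "@gretldir\data\poe\cars.gdt"

# Summary statistics for two variables only
summary mpg wgt

# Simple correlation between the same two variables
corr mpg wgt
```

Omitting the variable list, as in the script that follows, applies each command to every series in the dataset.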

Consider the cars example from POE4. The script is

    open "c:\Program Files\gretl\data\poe\cars.gdt"
    summary
    corr
    ols mpg const cyl eng wgt
    vif

The summary statistics appear below:

Summary Statistics, using the observations 1-392

| Variable | Mean | Median | Minimum | Maximum |
|---|---|---|---|---|
| mpg | 23.4459 | 22.7500 | 9.00000 | 46.6000 |
| cyl | 5.47194 | 4.00000 | 3.00000 | 8.00000 |
| eng | 194.412 | 151.000 | 68.0000 | 455.000 |
| wgt | 2977.58 | 2803.50 | 1613.00 | 5140.00 |

| Variable | Std. Dev. | C.V. | Skewness | Ex. kurtosis |
|---|---|---|---|---|
| mpg | 7.80501 | 0.332894 | 0.455341 | -0.524703 |
| cyl | 1.70578 | 0.311733 | 0.506163 | -1.39570 |
| eng | 104.644 | 0.538259 | 0.698981 | -0.783692 |
| wgt | 849.403 | 0.285266 | 0.517595 | -0.814241 |

and the correlation matrix

Correlation coefficients, using the observations 1-392
5% critical value (two-tailed) = 0.0991 for n = 392

| | mpg | cyl | eng | wgt |
|---|---|---|---|---|
| mpg | 1.0000 | -0.7776 | -0.8051 | -0.8322 |
| cyl | | 1.0000 | 0.9508 | 0.8975 |
| eng | | | 1.0000 | 0.9330 |
| wgt | | | | 1.0000 |

The variables are quite highly correlated in the sample. For instance, the correlation between weight and engine displacement is 0.933. Cars with big engines are heavy. What a surprise!

The regression results are:

OLS, using observations 1-392
Dependent variable: mpg

| | Coefficient | Std. Error | t-ratio | p-value |
|---|---|---|---|---|
| const | 44.3710 | 1.48069 | 29.9665 | 0.0000 |
| cyl | -0.267797 | 0.413067 | -0.6483 | 0.5172 |
| eng | -0.0126740 | 0.00825007 | -1.5362 | 0.1253 |
| wgt | -0.00570788 | 0.000713919 | -7.9951 | 0.0000 |

The tests of the individual significance of cyl and eng can be read from the table of regression results. Neither is significant at the 5% level. The joint test of their significance is performed using the omit statement. The F-statistic is 4.298 and has a p-value of 0.0142. The null hypothesis is rejected in favor of their joint significance.
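The joint test just described can be sketched as follows. The data path is an assumption about the local installation, and the `--test-only` flag is used here so that the full model remains the current model after the test.

```hansl
# Assumption: the POE4 datasets live under gretl's data\poe folder
open "@gretldir\data\poe\cars.gdt"

# Unrestricted model
ols mpg const cyl eng wgt --quiet

# H0: the coefficients on cyl and eng are both zero
omit cyl eng --test-only
```

If the data match those used in the text, the reported F-statistic and p-value should agree with the figures quoted above.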

The new statement that requires explanation is vif. vif stands for variance inflation factor, and it is used as a collinearity diagnostic by many programs, including gretl. The vif is closely related to the statistic suggested by Hill et al. (2011), who recommend using the $R^2$ from auxiliary regressions to determine the extent to which each explanatory variable can be explained as a linear function of the others. They suggest regressing $x_j$ on all of the other independent variables and comparing the $R_j^2$ from this auxiliary regression to 0.80. If the $R_j^2$ exceeds 0.80, then there is evidence of a collinearity problem.

The vif actually reports the same information, but in a less straightforward way. The vif associated with the jth regressor is computed as

$$\mathrm{vif}_j = \frac{1}{1 - R_j^2}$$

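which is, as you can see, simply a function of the $R_j^2$ from the jth auxiliary regression. Notice that when $R_j^2 > 0.80$ the vif exceeds 5, and that a vif greater than 10 requires $R_j^2 > 0.90$. Thus, the two rules of thumb are closely related but not identical: a vif greater than 10 is equivalent to an $R_j^2$ greater than 0.9 from the auxiliary regression, which is a somewhat stricter standard than the 0.8 rule.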
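To make the link between the auxiliary regression and the vif concrete, here is a minimal sketch that computes the vif for cyl by hand. The data path and the scalar names `r2_cyl` and `vif_cyl` are illustrative assumptions, not part of the original script.

```hansl
# Assumption: the POE4 datasets live under gretl's data\poe folder
open "@gretldir\data\poe\cars.gdt"

# Auxiliary regression: cyl on the other explanatory variables
ols cyl const eng wgt --quiet
scalar r2_cyl = $rsq

# Variance inflation factor implied by the auxiliary R-squared
scalar vif_cyl = 1/(1 - r2_cyl)
printf "Auxiliary R-squared = %.4f, VIF = %.3f\n", r2_cyl, vif_cyl
```

The value printed for the vif should match the one that gretl's own vif command reports for cyl after the mpg regression.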

The output from gretl is shown below:

Summary statistics for the regression of mpg on const, cyl, eng and wgt:

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Mean dependent var | 23.44592 | S.E. of regression | 4.296531 |
| Sum squared resid | 7162.549 | Adjusted R-squared | 0.696967 |
| R-squared | 0.699293 | P-value(F) | 7.6e-101 |
| F(3, 388) | 300.7635 | Akaike criterion | 2259.349 |
| Log-likelihood | -1125.674 | Hannan-Quinn | 2265.644 |
| Schwarz criterion | 2275.234 | | |

Variance Inflation Factors

Minimum possible value = 1.0

Values > 10.0 may indicate a collinearity problem

| Variable | VIF |
|---|---|
| cyl | 10.516 |
| eng | 15.786 |

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient between variable j and the other independent variables

Properties of matrix X’X:

1-norm = 4.0249836e+009

Determinant = 6.6348526e+018

Reciprocal condition number = 1.7766482e-009

Once again, the gretl output is very informative. It gives you the threshold for high collinearity (vif > 10) and the relationship between the vif and the auxiliary $R^2$. Clearly, these data are highly collinear. Two variance inflation factors are above the threshold, and the one associated with wgt is fairly large as well.
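As a quick check on the reported figures, the formula that gretl prints can be inverted to recover the auxiliary $R^2$ implied by each reported vif. Applying it to the two values shown in the output gives:

$$R_j^2 = 1 - \frac{1}{\mathrm{vif}_j}, \qquad R_{cyl}^2 = 1 - \frac{1}{10.516} \approx 0.905, \qquad R_{eng}^2 = 1 - \frac{1}{15.786} \approx 0.937$$

Both implied values exceed 0.90, which is consistent with the high pairwise correlations among cyl, eng and wgt seen earlier.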

The variance inflation factors can be produced from the dialogs as well. Estimate your model then, in the model window, select Tests>Collinearity and the results will appear in gretl’s output.


    ols sales const price advert sq_advert
    restrict
        b[2]=0
        b[3]=0
        b[4]=0
    end restrict
    scalar sseu = $ess
    scalar unrest_df = $df
    ols sales const
    scalar sser = $ess
    scalar rest_df = $df
    scalar J = rest_df - unrest_df
    scalar Fstat=((sser-sseu)/J)/(sseu/(unrest_df))
    pvalue F J unrest_df Fstat

    # t-test

    # test of optimal advertising
    # (assumed step: re-estimate the full model so that b[3] and b[4] exist)
    ols sales const price advert sq_advert
    restrict
        b[3]+3.8*b[4]=1
    end restrict

    # One-sided t-test
    # (assumed definition: r is the estimated value of b[3]+3.8*b[4]-1)
    scalar r = $coeff(advert)+3.8*$coeff(sq_advert)-1
    scalar v = $vcv[3,3]+((3.8)^2)*$vcv[4,4]+2*(3.8)*$vcv[3,4]
    scalar t = r/sqrt(v)
    pvalue t $df t

    # joint test
    restrict
        b[3]+3.8*b[4]=1
        b[1]+6*b[2]+1.9*b[3]+3.61*b[4]=80
    end restrict


    # restricted estimation
    open "@gretldir\data\poe\beer.gdt"
    logs q pb pl pr i
    ols l_q const l_pb l_pl l_pr l_i --quiet
    restrict
        b2+b3+b4+b5=0
    end restrict
    restrict
        b[2]+b[3]+b[4]+b[5]=0
    end restrict

    # model specification - relevant and irrelevant vars
    open "@gretldir\data\poe\edu_inc.gdt"
    ols faminc const he we
    omit we
    corr
    list all_x = const he we kl6 xtra_x5 xtra_x6
    ols faminc all_x

    # reset test
    ols faminc const he we kl6
    reset --quiet --squares-only
    reset --quiet

    # model selection rules and a function
    function matrix modelsel (series y, list xvars)
        ols y xvars --quiet
        scalar sse = $ess
        scalar N = $nobs
        scalar K = nelem(xvars)
        scalar aic = ln(sse/N)+2*K/N
        scalar bic = ln(sse/N)+K*ln(N)/N
        scalar rbar2 = 1-((1-$rsq)*(N-1)/$df)
        matrix A = { K, N, aic, bic, rbar2}
        printf "\nRegressors: %s\n",varname(xvars)
        printf "K = %d, N = %d, AIC = %.4f, SC = %.4f, and Adjusted R2 = %.4f\n", K, N, aic, bic, rbar2
        return A
    end function

    list x1 = const he
    list x2 = const he we
    list x3 = const he we kl6
    list x4 = const he we xtra_x5 xtra_x6
    matrix a = modelsel(faminc, x1)
    matrix b = modelsel(faminc, x2)
    matrix c = modelsel(faminc, x3)
    matrix d = modelsel(faminc, x4)
    matrix MS = a|b|c|d
    colnames(MS,"K N AIC SC Adj_R2" )
    printf "%10.5g",MS
    function modelsel clear

    ols faminc all_x
    omit xtra_x5 xtra_x6
    omit kl6
    omit we
    modeltab show

    ols faminc x3 --quiet
    reset

    # collinearity
    open "@gretldir\data\poe\cars.gdt"
    summary
    corr
    ols mpg const cyl
    ols mpg const cyl eng wgt --quiet
    omit cyl
    ols mpg const cyl eng wgt --quiet
    omit eng
    ols mpg const cyl eng wgt --quiet
    omit eng cyl

    # Auxiliary regressions for collinearity
    # Check: r2 > .8 means severe collinearity
    ols cyl const eng wgt
    scalar r1 = $rsq
    ols eng const wgt cyl
    scalar r2 = $rsq
    ols wgt const eng cyl
    scalar r3 = $rsq
    printf "R-squares for the auxiliary regressions\nDependent Variable:\n"
    printf " cylinders %3.3g\n engine displacement %3.3g\n weight %3.3g\n", r1, r2, r3


    ols mpg const cyl eng wgt
    vif