Cars Example

The data set cars.gdt is included in the package of datasets distributed with this manual. In most cases it is a good idea to print summary statistics for any new dataset that you work with. This serves several purposes. First, if there is a problem with the dataset, the summary statistics may give you some indication. Is the sample size as expected? Are the means, minimums and maximums reasonable? If not, you’ll need to do some investigative work. Second, the summary statistics show you how the variables have been scaled. This is vitally important when it comes to making economic sense of the results. Do the magnitudes of the coefficients make sense? It also puts you on the lookout for discrete variables, which require some care in interpretation.

The summary command is used to get summary statistics. These include the mean, median, minimum, maximum, standard deviation, coefficient of variation, skewness and excess kurtosis. The corr command computes the simple correlations among your variables. These can be helpful in gaining an initial understanding of whether variables are highly collinear or not. Other measures are more useful, but it never hurts to look at the correlations. Either command can be followed by a variable list to limit the variables that are summarized or correlated.
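Restricting either command to a variable list can be sketched as follows. This is a minimal example that assumes the POE4 datasets are installed under gretl's data directory (the @gretldir path used in the chapter script); the choice of mpg and wgt is only for illustration.

```hansl
# Assumes cars.gdt is installed under gretl's data directory
open "@gretldir\data\poe\cars.gdt"

# Summary statistics for two variables only
summary mpg wgt

# Simple correlation between the same two variables
corr mpg wgt
```

Leaving the list off, as in the script for this example, applies the command to every series in the dataset.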

Consider the cars example from POE4. The script is:

open "c:\Program Files\gretl\data\poe\cars.gdt"
summary
corr
ols mpg const cyl eng wgt
vif

The summary statistics appear below:

Summary Statistics, using the observations 1-392

Variable        Mean       Median      Minimum      Maximum
mpg          23.4459      22.7500      9.00000      46.6000
cyl          5.47194      4.00000      3.00000      8.00000
eng          194.412      151.000      68.0000      455.000
wgt          2977.58      2803.50      1613.00      5140.00

Variable     Std. Dev.       C.V.      Skewness   Ex. kurtosis
mpg          7.80501     0.332894     0.455341     -0.524703
cyl          1.70578     0.311733     0.506163     -1.39570
eng          104.644     0.538259     0.698981     -0.783692
wgt          849.403     0.285266     0.517595     -0.814241

and the correlation matrix is:

Correlation coefficients, using the observations 1-392
5% critical value (two-tailed) = 0.0991 for n = 392

     mpg        cyl        eng        wgt
  1.0000    -0.7776    -0.8051    -0.8322   mpg
             1.0000     0.9508     0.8975   cyl
                        1.0000     0.9330   eng
                                   1.0000   wgt

The variables are quite highly correlated in the sample. For instance, the correlation between weight and engine displacement is 0.933. Cars with big engines are heavy. What a surprise!

The regression results are:

OLS, using observations 1-392
Dependent variable: mpg

          Coefficient     Std. Error     t-ratio    p-value
const     44.3710         1.48069        29.9665    0.0000
cyl       -0.267797       0.413067       -0.6483    0.5172
eng       -0.0126740      0.00825007     -1.5362    0.1253
wgt       -0.00570788     0.000713919    -7.9951    0.0000

The tests of the individual significance of cyl and eng can be read from the table of regression results. Neither is significant at the 5% level. The joint test of their significance is performed using the omit statement. The F-statistic is 4.298 and has a p-value of 0.0142. The null hypothesis is rejected in favor of their joint significance.
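The joint test just described can be sketched in a few lines of script. This is a minimal example, assuming the POE4 datasets are installed under gretl's data directory. The restrict block is offered only as an equivalent way of stating the same null hypothesis, and should report the same F-statistic as omit.

```hansl
# Assumes cars.gdt is installed under gretl's data directory
open "@gretldir\data\poe\cars.gdt"

# Unrestricted model
ols mpg const cyl eng wgt --quiet

# Joint test of H0: the coefficients on cyl and eng are both zero
omit cyl eng

# The same hypothesis written as linear restrictions.
# Re-estimate first, because omit replaces the last model.
ols mpg const cyl eng wgt --quiet
restrict
    b[2] = 0
    b[3] = 0
end restrict
```

The coefficient numbering follows the order of the regressor list, so b[2] and b[3] refer to cyl and eng.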

The new statement that requires explanation is vif. vif stands for variance inflation factor and it is used as a collinearity diagnostic by many programs, including gretl. The vif is closely related to the statistic suggested by Hill et al. (2011), who suggest using the $R^2$ from auxiliary regressions to determine the extent to which each explanatory variable can be explained as a linear function of the others. They suggest regressing $x_j$ on all of the other independent variables and comparing the $R_j^2$ from this auxiliary regression to 0.80. If the $R_j^2$ exceeds 0.80, then there is evidence of a collinearity problem.

The $\mathrm{vif}_j$ actually reports the same information, but in a less straightforward way. The vif associated with the jth regressor is computed as

$$\mathrm{vif}_j = \frac{1}{1 - R_j^2}$$

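which is, as you can see, simply a function of the $R_j^2$ from the jth auxiliary regression. Notice that when $R_j^2 > 0.80$, $\mathrm{vif}_j > 5$, and that $\mathrm{vif}_j > 10$ requires $R_j^2 > 0.90$. Thus the two rules of thumb measure the same thing but do not use the same cutoff: a $\mathrm{vif}_j$ greater than 10 is equivalent to an $R_j^2$ greater than 0.9 from the auxiliary regression, which is a stricter standard than the 0.8 rule.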
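To see the formula at work, the variance inflation factors can be rebuilt by hand from the auxiliary regressions. The following is a minimal sketch, assuming cars.gdt is installed under gretl's data directory. The scalar names (r2_cyl, vif_cyl and so on) are illustrative and are not part of gretl.

```hansl
# Assumes cars.gdt is installed under gretl's data directory
open "@gretldir\data\poe\cars.gdt"

# Auxiliary regression of each regressor on the other two
ols cyl const eng wgt --quiet
scalar r2_cyl = $rsq
ols eng const cyl wgt --quiet
scalar r2_eng = $rsq
ols wgt const cyl eng --quiet
scalar r2_wgt = $rsq

# vif_j = 1/(1 - R_j^2)
scalar vif_cyl = 1/(1 - r2_cyl)
scalar vif_eng = 1/(1 - r2_eng)
scalar vif_wgt = 1/(1 - r2_wgt)

printf "cyl: R2 = %.4f, vif = %.3f\n", r2_cyl, vif_cyl
printf "eng: R2 = %.4f, vif = %.3f\n", r2_eng, vif_eng
printf "wgt: R2 = %.4f, vif = %.3f\n", r2_wgt, vif_wgt
```

The values printed here should agree with those reported by the vif command after the mpg regression, since both are computed from the same auxiliary R-squares.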

The summary statistics that accompany the mpg regression are:

Mean dependent var   23.44592   S.D. dependent var   7.805007
Sum squared resid    7162.549   S.E. of regression   4.296531
R-squared            0.699293   Adjusted R-squared   0.696967
F(3, 388)            300.7635   P-value(F)           7.6e-101
Log-likelihood      -1125.674   Akaike criterion     2259.349
Schwarz criterion    2275.234   Hannan-Quinn         2265.644

The vif output from gretl is shown below:

Variance Inflation Factors

Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem

         cyl   10.516
         eng   15.786

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Properties of matrix X'X:

 1-norm = 4.0249836e+009
 Determinant = 6.6348526e+018
 Reciprocal condition number = 1.7766482e-009

Once again, the gretl output is very informative. It gives you the threshold for high collinearity ($\mathrm{vif}_j > 10$) and the relationship between $\mathrm{vif}_j$ and $R_j^2$. Clearly, these data are highly collinear. Two variance inflation factors are above the threshold, and the one associated with wgt is fairly large as well.
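As a cross-check, the variance inflation factors can also be recovered from the correlations among the regressors alone: they are the diagonal elements of the inverse of the regressors' correlation matrix. The sketch below types in the rounded correlations for cyl, eng and wgt from the correlation table. The matrix names R and V are illustrative.

```hansl
# Correlations among cyl, eng and wgt (rows and columns in that order),
# copied from the correlation matrix reported earlier
matrix R = {1.0000, 0.9508, 0.8975; 0.9508, 1.0000, 0.9330; 0.8975, 0.9330, 1.0000}

# The diagonal of the inverse correlation matrix holds the vifs
matrix V = diag(inv(R))
print V
```

With these rounded inputs the result is roughly 10.5 for cyl, 15.8 for eng and 7.8 for wgt. The small differences from gretl's reported values come from the four-decimal rounding of the correlations.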

The variance inflation factors can be produced from the dialogs as well. Estimate your model then, in the model window, select Tests>Collinearity and the results will appear in gretl’s output.

6.4 Script

open "@gretldir\data\poe\andy.gdt"
square advert
ols sales const price advert sq_advert
restrict
    b[2] = 0
    b[3] = 0
    b[4] = 0
end restrict

ols sales const price advert sq_advert
scalar sseu = $ess
scalar unrest_df = $df
ols sales const
scalar sser = $ess
scalar rest_df = $df
scalar J = rest_df - unrest_df
scalar Fstat = ((sser-sseu)/J)/(sseu/(unrest_df))
pvalue F J unrest_df Fstat

# t-test
ols sales const price advert sq_advert
omit price

# optimal advertising
open "@gretldir\data\poe\andy.gdt"
square advert
ols sales const price advert sq_advert
scalar Ao = (1-$coeff(advert))/(2*$coeff(sq_advert))

# test of optimal advertising
restrict
    b[3]+3.8*b[4]=1
end restrict

open "@gretldir\data\poe\andy.gdt"
square advert
ols sales const price advert sq_advert
scalar Ao = (1-$coeff(advert))/(2*$coeff(sq_advert))

# One-sided t-test
ols sales const price advert sq_advert --vcv
scalar r = $coeff(advert)+3.8*$coeff(sq_advert)-1
scalar v = $vcv[3,3]+((3.8)^2)*$vcv[4,4]+2*(3.8)*$vcv[3,4]
scalar t = r/sqrt(v)
pvalue t $df t

# joint test
ols sales const price advert sq_advert
restrict
    b[3]+3.8*b[4]=1
    b[1]+6*b[2]+1.9*b[3]+3.61*b[4]=80
end restrict


# restricted estimation
open "@gretldir\data\poe\beer.gdt"
logs q pb pl pr i
ols l_q const l_pb l_pl l_pr l_i --quiet
restrict
    b2+b3+b4+b5=0
end restrict
restrict
    b[2]+b[3]+b[4]+b[5]=0
end restrict

# model specification: relevant and irrelevant vars
open "@gretldir\data\poe\edu_inc.gdt"
ols faminc const he we
omit we

corr

list all_x = const he we kl6 xtra_x5 xtra_x6
ols faminc all_x

# reset test
ols faminc const he we kl6
reset --quiet --squares-only
reset --quiet

# model selection rules and a function
function matrix modelsel (series y, list xvars)
    ols y xvars --quiet
    scalar sse = $ess
    scalar N = $nobs
    scalar K = nelem(xvars)
    scalar aic = ln(sse/N)+2*K/N
    scalar bic = ln(sse/N)+K*ln(N)/N
    scalar rbar2 = 1-((1-$rsq)*(N-1)/$df)
    matrix A = { K, N, aic, bic, rbar2}
    printf "\nRegressors: %s\n",varname(xvars)
    printf "K = %d, N = %d, AIC = %.4f, SC = %.4f, and Adjusted R2 = %.4f\n", K, N, aic, bic, rbar2
    return A
end function

list x1 = const he
list x2 = const he we
list x3 = const he we kl6
list x4 = const he we xtra_x5 xtra_x6
matrix a = modelsel(faminc, x1)
matrix b = modelsel(faminc, x2)
matrix c = modelsel(faminc, x3)
matrix d = modelsel(faminc, x4)

matrix MS = a|b|c|d
colnames(MS,"K N AIC SC Adj_R2" )
printf "%10.5g",MS
function modelsel clear

ols faminc all_x
modeltab add
omit xtra_x5 xtra_x6
modeltab add
omit kl6
modeltab add
omit we
modeltab add
modeltab show

ols faminc x3 --quiet
reset

# collinearity
open "@gretldir\data\poe\cars.gdt"
summary
corr

ols mpg const cyl
ols mpg const cyl eng wgt --quiet
omit cyl
ols mpg const cyl eng wgt --quiet
omit eng
ols mpg const cyl eng wgt --quiet
omit eng cyl

# Auxiliary regressions for collinearity
# Check: r2 >.8 means severe collinearity
ols cyl const eng wgt
scalar r1 = $rsq
ols eng const wgt cyl
scalar r2 = $rsq
ols wgt const eng cyl
scalar r3 = $rsq
printf "R-squares for the auxiliary regressions\nDependent Variable:\n cylinders %3.3g\n engine displacement %3.3g\n weight %3.3g\n", r1, r2, r3


ols mpg const cyl eng wgt
vif
