Heteroskedasticity in the Linear Probability Model

In chapter 7 we introduced the linear probability model and showed that the indicator variable, yi, is heteroskedastic. That is,

var(yi) = πi(1 − πi)    (8.10)

where πi is the probability that the dependent variable is equal to 1 (the choice is made). The estimated variance is

var(yi) = ŷi(1 − ŷi)    (8.11)

This can be used to perform feasible GLS. The cola marketing data coke.gdt is the basis for this example. The dependent variable, coke, takes the value of 1 if the individual purchases Coca-Cola and is 0 if not. The decision to purchase Coca-Cola depends on the ratio of the price of Coke relative to Pepsi, and on whether displays for Coca-Cola or Pepsi were present. The variable disp_coke=1 if a Coca-Cola display was present, otherwise 0; disp_pepsi=1 if a Pepsi display was present, otherwise it is 0.
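Equation (8.10) is just the variance of a Bernoulli random variable. The following Python sketch (purely illustrative, not part of the gretl script; the probability 0.447 is an assumption chosen near the sample mean of coke) simulates such an indicator and compares its sample variance to π(1 − π):

```python
import random

random.seed(42)

pi = 0.447   # assumed probability, roughly the sample mean of coke
n = 100000

# Simulate a Bernoulli indicator y_i with P(y_i = 1) = pi
y = [1 if random.random() < pi else 0 for _ in range(n)]

mean_y = sum(y) / n
var_y = sum((v - mean_y) ** 2 for v in y) / n

# Theoretical variance from equation (8.10)
theory = pi * (1 - pi)
print(round(var_y, 3), round(theory, 3))
```

With a large n the simulated variance should agree with π(1 − π) to two or three decimal places.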

First, the data are loaded and the summary statistics are provided.

open "@gretldir\data\poe\coke.gdt"
summary --simple
list x = const pratio disp_coke disp_pepsi

The --simple option is used for the summary command. Then a list is created that contains the names of the independent variables to be used in the estimated models. The basic summary statistics are:

Summary statistics, using the observations 1–1140

              Mean      Minimum   Maximum   Std. Dev.
  coke        0.44737   0.00000   1.0000    0.49744
  pr_pepsi    1.2027    0.68000   1.7900    0.30073
  pr_coke     1.1901    0.68000   1.7900    0.29992
  disp_pepsi  0.36404   0.00000   1.0000    0.48137
  disp_coke   0.37895   0.00000   1.0000    0.48534
  pratio      1.0272    0.49721   2.3247    0.28661

Everything looks good. There are no negative prices, and the indicator variables are all contained between 0 and 1. The magnitudes of the means are reasonable.

Next, least squares is used to estimate the model twice: once with the usual standard errors and again with the HCCME standard errors produced by the --robust option. Each is added to a model table using modeltab add.

# OLS
ols coke x
modeltab add
# OLS w/robust
ols coke x --robust
modeltab add

Feasible GLS will be estimated in two ways. In the first regression, we will omit any observation that has a negative estimated variance. Remember that one of the problems with the linear probability model is that predictions are not constrained to lie between 0 and 1. If ŷi < 0 or ŷi > 1, then the variance estimate will be negative. In the first line below a new series is created to check this condition. If the variance, varp, is greater than zero, pos will be equal to 1 and if not, it is zero. The second line creates a weight for wls that is formed by multiplying the indicator variable pos by the reciprocal of the variance. In this way, any observation with a nonpositive estimated variance receives a weight of zero.
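To see why out-of-bounds predictions are a problem, note that p(1 − p) is negative whenever p lies outside the unit interval. A quick Python check (purely illustrative; the estimation itself is done in gretl) confirms this:

```python
def est_variance(p):
    """Estimated LPM variance from equation (8.11): p*(1 - p)."""
    return p * (1 - p)

# Fitted probabilities inside [0, 1] give valid, positive variances...
print(est_variance(0.5))
print(est_variance(0.9))

# ...but out-of-bounds predictions yield negative "variances",
# which cannot be used as weights in feasible GLS
print(est_variance(1.2))
print(est_variance(-0.1))
```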

Remove observations with negative variance

series p = $yhat
series varp = p*(1-p)
series pos = (varp > 0)
series w = pos * 1/varp
# omit regression
wls w coke x
modeltab add

The first line uses the accessor for the predicted values from a linear regression, $yhat, and therefore it must follow least squares estimation of the linear probability model; in this model, the predictions are interpreted as probabilities. Once again, a trick is being used to eliminate observations from the model: any observation that has a zero weight in w is dropped from the computation. There are equivalent ways to do this in gretl, as shown below.

Two other ways to drop observations

smpl varp>0 --restrict

setmiss 0 w

Restricting the sample is probably the most straightforward method. The second uses the setmiss command, which changes the missing value code to 0 for elements of w; any observation where w=0 is then treated as missing and won’t be used to estimate the model.
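The claim that a zero weight is equivalent to dropping an observation follows directly from the weighted least squares objective, which minimizes the weighted sum of squared errors. A small pure-Python sketch of one-regressor WLS (an illustration under that simplifying assumption; gretl's wls handles the general case) makes the point:

```python
def wls_slope(y, x, w):
    # WLS slope for y = a + b*x: minimize sum_i w_i*(y_i - a - b*x_i)^2.
    # Zero-weight observations contribute nothing to any of these sums.
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return num / den

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 9.0]   # made-up data; last point is the "offender"

# Giving the last observation a zero weight...
b_zero_w = wls_slope(y, x, [1, 1, 1, 1, 0])
# ...produces exactly the same slope as estimating without it
b_dropped = wls_slope(y[:4], x[:4], [1, 1, 1, 1])
print(b_zero_w, b_dropped)
```

The two slopes are identical, which is why the w = pos * 1/varp trick, the smpl restriction, and the setmiss approach all yield the same WLS estimates.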

Finally, another feasible GLS estimation is done. This time, the predicted probability is truncated: it is set to 0.01 if ŷi < 0.01 and adjusted likewise if ŷi > 0.99. The code to do this is

WLS with truncated variances for observations out of bounds

series b = (p<.01) || (p>.99)
series pt = b*0.01 + p*(1-b)
series varp_t = pt*(1-pt)
series w_t = 1/varp_t
wls w_t coke x
modeltab add
modeltab show

The first line creates another indicator variable that takes the value of 1 if the predicted probability falls outside the boundaries. The || is a logical operator that takes the union of the two conditions (= “OR”). The second line creates the truncated value of the probability using the indicator variable.

b(0.01) + p(1 − b) = 0.01  when b = 1
b(0.01) + p(1 − b) = p     when b = 0        (8.12)

There is another, less transparent, way to generate the truncated probabilities: use the ternary conditional assignment operator. This operates like an if statement and can be used to save a line of script. This syntax would create the series as

The conditional assignment operator

series pt = ( (p<.01) || (p>.99) ) ? 0.01 : p

Basically, the bound condition in parentheses, (p < .01)||(p > .99), is checked: that is what the question mark represents. If it is true, pt is set to the value between the question mark and the colon. If false, it is set to the value to the right of the colon. It operates very much like a traditional if statement in a spreadsheet program. This method is more efficient computationally as well, which could save some time if used in a loop to perform simulations.
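The ternary expression and the indicator-variable construction of equation (8.12) are algebraically identical, as this small Python analogue of the two gretl series commands (illustrative only) confirms:

```python
def pt_indicator(p):
    # mirrors: series b = (p<.01) || (p>.99) ; series pt = b*0.01 + p*(1-b)
    b = 1 if (p < 0.01 or p > 0.99) else 0
    return b * 0.01 + p * (1 - b)

def pt_ternary(p):
    # mirrors: series pt = ((p<.01) || (p>.99)) ? 0.01 : p
    return 0.01 if (p < 0.01 or p > 0.99) else p

# The two constructions agree for in-bounds and out-of-bounds probabilities
for p in [-0.2, 0.005, 0.3, 0.95, 1.1]:
    assert pt_indicator(p) == pt_ternary(p)
print("both methods agree")
```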

Once the truncated probabilities are created, the usual weighted least squares estimation can proceed. The model table appears below:

Dependent variable: coke

                (1) OLS      (2) OLS      (3) WLS      (4) WLS
  const         0.8902**     0.8902**     0.8795**     0.6505**
                (0.06548)    (0.06563)    (0.05897)    (0.05685)
  pratio        -0.4009**    -0.4009**    -0.3859**    -0.1652**
                (0.06135)    (0.06073)    (0.05233)    (0.04437)
  disp_coke     0.07717**    0.07717**    0.07599**    0.09399**
                (0.03439)    (0.03402)    (0.03506)    (0.03987)
  disp_pepsi    -0.1657**    -0.1657**    -0.1587**    -0.1314**
                (0.03560)    (0.03447)    (0.03578)    (0.03540)
  n             1140         1140         1124         1140
  R²            0.1177       0.1177       0.2073       0.0865
  ℓ             -748.1       -748.1       -1617        -1858

Standard errors in parentheses
* indicates significance at the 10 percent level
** indicates significance at the 5 percent level

Columns (1) and (2) are the OLS estimates with usual and robust standard errors, respectively. Column (3) uses WLS with the negative-variance observations omitted from the sample. Column (4) is WLS with the out-of-bounds predictions truncated. These results are quite a bit different from the others. This no doubt occurs because of the large weight placed on the 16 observations whose weights were constructed by truncation. For these, var(ei) = 0.01(1 − 0.01) = 0.0099. The square root of the reciprocal is approximately 10, a large weight to place on these 16 observations via WLS. Since these extreme observations carry a large weight relative to the others, they exert considerable influence on the estimated regression.
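The arithmetic behind that weight can be verified in a few lines of Python (an illustration only; the analysis itself is done in gretl):

```python
import math

# Truncated probability assigned to an out-of-bounds prediction
pt = 0.01
var_t = pt * (1 - pt)        # 0.01 * 0.99 = 0.0099
weight = 1 / var_t           # reciprocal variance used by wls, about 101
sd_weight = math.sqrt(weight)  # implied weight on the standard-deviation
                               # scale, about 10

print(round(var_t, 4), round(weight, 1), round(sd_weight, 2))
```

Compare this to an interior prediction such as p = 0.5, whose variance 0.25 implies a weight of only 4: the truncated observations are weighted roughly 25 times more heavily.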

8.4 Script

open "@gretldir\data\poe\food.gdt"
set echo off
ols food_exp const income
gnuplot food_exp income --linear-fit
# see section 1.4 of this manual for commands to view these plots.

# ols with HCCME standard errors
ols food_exp const income --robust
# confidence intervals (Robust)
scalar lb = $coeff(income) - critical(t,$df,0.025) * $stderr(income)
scalar ub = $coeff(income) + critical(t,$df,0.025) * $stderr(income)
printf "\nThe 95%% confidence interval is (%.3f, %.3f).\n",lb,ub

# residual plot
ols food_exp const income --robust
series res = $uhat
setinfo res -d "Least Squares Residuals" -n "Residual"
gnuplot res income --output=C:\Temp\olsres

# launch gnuplot (Windows only)
launch wgnuplot
# To view graph, type: load 'C:\Temp\olsres' at prompt

# residual magnitude plot with loess fit
series abs_e = abs(res)
setinfo abs_e -d "Absolute value of the LS Residuals" -n "Absolute Value of Residual"
gnuplot abs_e income --loess-fit --output=c:\temp\loessfit.plt

# LM test for heteroskedasticity
ols food_exp const income
series sq_ehat = $uhat*$uhat
ols sq_ehat const income
scalar NR2 = $trsq
pvalue X 1 NR2

# built-in LM test
ols food_exp const income
modtest income --breusch-pagan

# White test
ols food_exp const income
series sq_ehat = $uhat*$uhat
series sq_income = income^2
ols sq_ehat const income sq_income
scalar NR2 = $trsq
pvalue X 2 NR2

# built-in White test
ols food_exp const income --quiet
modtest --white --quiet

# grouped data -- Goldfeld-Quandt
open "@gretldir\data\poe\cps2.gdt"
ols wage const educ exper metro
# Use only metro observations
smpl metro=1 --restrict
ols wage const educ exper
scalar stdm = $sigma
scalar df_m = $df
# Restore the full sample
smpl full
# Use only rural observations
smpl metro=0 --restrict
ols wage const educ exper
scalar stdr = $sigma
scalar df_r = $df
# GQ statistic
gq = stdm^2/stdr^2
scalar pv = pvalue(F, df_m, df_r, gq)
printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n",df_m, df_r, gq, pv

# Goldfeld-Quandt for food expenditure
open "@gretldir\data\poe\food.gdt"
dataset sortby income
list x = const income
ols food_exp x
# large variance observations
smpl 21 40
ols food_exp x
scalar stdL = $sigma
scalar df_L = $df
# Restore the full sample
smpl full
# small variance observations
smpl 1 20
ols food_exp x
scalar stdS = $sigma
scalar df_S = $df
# GQ statistic
gq = stdL^2/stdS^2
scalar pv = pvalue(F, df_L, df_S, gq)
printf "\nThe F(%d, %d) statistic = %.3f. The right side p-value is %.4g.\n",df_L, df_S, gq, pv

# compare ols with and without HCCME
list x = const income
ols food_exp x --quiet
modeltab add
ols food_exp x --robust --quiet
modeltab add
modeltab show

# hypothesis test
ols food_exp x --robust
omit income
ols food_exp x --quiet
restrict
    b[2]=0
end restrict
ols food_exp x --robust --quiet
restrict
    b[2]=0
end restrict

open "@gretldir\data\poe\food.gdt"
# GLS using built in function
series w = 1/income
wls w food_exp const income
scalar lb = $coeff(income) - critical(t,$df,0.025) * $stderr(income)
scalar ub = $coeff(income) + critical(t,$df,0.025) * $stderr(income)
printf "\nThe 95%% confidence interval is (%.3f, %.3f).\n",lb,ub

# GLS using OLS on transformed data
series wi = 1/sqrt(income)
series ys = wi*food_exp
series xs = wi*income
series cs = wi
ols ys cs xs

# Wage Example
open "@gretldir\data\poe\cps2.gdt"
ols wage const educ exper metro
# Use only metro observations
smpl metro --dummy
ols wage const educ exper
scalar stdm = $sigma
smpl full
# Create a dummy variable for rural
series rural = 1-metro
# Restrict sample to rural observations
smpl rural --dummy
ols wage const educ exper
scalar stdr = $sigma
# Restore the full sample
smpl full
# Generate standard deviations for each metro and rural obs
series wm = metro*stdm
series wr = rural*stdr
series w = 1/(wm + wr)^2
# Weighted least squares
wls w wage const educ exper metro

# heteroskedastic model
open "@gretldir\data\poe\food.gdt"
ols food_exp const income
series lnsighat = log($uhat*$uhat)
series z = log(income)
ols lnsighat const z
series predsighat = exp($yhat)
series w = 1/predsighat
wls w food_exp const income

# linear probability model
open "@gretldir\data\poe\coke.gdt"
summary --simple
list x = const pratio disp_coke disp_pepsi
# OLS
ols coke x
modeltab add
# OLS w/robust
ols coke x --robust
modeltab add
series p = $yhat
series varp = p*(1-p)
series pos = (varp > 0)
series w = pos * 1/varp
# omit regression
wls w coke x
modeltab add
# smpl varp>0 --restrict
# setmiss 0 w
series b = (p<.01) || (p>.99)
series pt = b*0.01 + p*(1-b)
series varp_t = pt*(1-pt)
series w_t = 1/varp_t
# trunc regression
wls w_t coke x
modeltab add
modeltab show

Figure 8.2: Check the box for heteroskedasticity robust standard errors.

Figure 8.3: Set the method for computing robust standard errors. These are located under the HCCME tab. From the pull-down list for cross-sectional data choose an appropriate option, HC3 in this case.

Figure 8.4: Plot of food expenditures against income with least squares fit.

Figure 8.5: Plot of the absolute value of the food expenditures model residuals against income with loess fit.

Figure 8.6: Select Data>Sort data from the main menu bar to reveal the dialog box shown on the right side of this figure. Choose the desired sort key and indicate whether you want to sort in ascending or descending order.

 
