Simulation

In appendix 10F of POE4, the authors conduct a Monte Carlo experiment comparing the performance of OLS and TSLS. The basic simulation is based on the model

y = x + e (10.7)

x = πz1 + πz2 + πz3 + v (10.8)

The zi are exogenous instruments, each distributed N(0,1). The errors, e and v, are standard normal random variables with correlation ρ.

The parameter π controls the strength of the instruments and is set to either 0.1 or 0.5 (it appears as p in the script below). The parameter ρ controls the endogeneity of x: when ρ = 0, x is exogenous; when ρ = 0.8 it is seriously endogenous. The sample size is set to 100 and 10,000 simulated samples are drawn.

The gretl script to perform the simulation appears below:

1 scalar N = 100
2 nulldata N
3 scalar rho = 0.8   # set rho = (0.0 or 0.8)
4 scalar p = 0.5     # set p = (0.1 or 0.5)
5 matrix S = {1, rho; rho, 1}
6 matrix C = cholesky(S)
7 series z1 = normal(0,1)
8 series z2 = normal(0,1)
9 series z3 = normal(0,1)
10 series xs = p*z1 + p*z2 + p*z3
11 list z = z1 z2 z3
12 loop 10000 --progressive --quiet
13     matrix errors = mnormal(N,2)*C'
14     series v = errors[,1]
15     series e = errors[,2]
16     series x = xs + v
17     series y = x + e
18     ols x const z --quiet
19     scalar f = $Fstat
20     ols y 0 x --quiet
21     scalar b_ols = $coeff(x)
22     tsls y 0 x ; 0 z --quiet
23     scalar b_tsls = $coeff(x)
24     store coef.gdt b_ols b_tsls f
25     print b_ols b_tsls f
26 endloop

The top part of the script initializes the parameters for the simulation. The sample size is set to 100, an empty dataset is created, and the values of rho and p are set; then the covariance matrix is created and its Cholesky decomposition is taken. The Cholesky decomposition is a trick used to create correlation among the residuals. There are more transparent ways to do this (e.g., e = rho*v + normal(0,1)), but the trick is especially useful when you want to correlate more than two series. The systematic part of x is created and called xs, and a list to hold the instruments is created as well.
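As an aside, the trick is easy to see outside gretl. Here is a minimal numpy sketch (illustrative only, not part of the POE4 script): draw uncorrelated standard normals and post-multiply by the transposed Cholesky factor of the target covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(42)
rho = 0.8
S = np.array([[1.0, rho], [rho, 1.0]])   # target covariance matrix
C = np.linalg.cholesky(S)                # lower triangular, S = C @ C.T

u = rng.standard_normal((100_000, 2))    # uncorrelated N(0,1) columns
errors = u @ C.T                         # rows now have covariance S

r = np.corrcoef(errors[:, 0], errors[:, 1])[0, 1]
print(r)  # close to 0.8
```

The same device scales to any number of series: a k-by-k covariance matrix and a draw matrix with k columns.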

The loop uses the --progressive option and is set to do 10,000 iterations. The matrix called errors multiplies standard normal draws by the transposed Cholesky factor of the variance-covariance matrix to create the correlated errors; the first column is assigned to v and the second to e. The endogenous regressor x is created by adding v to the systematic portion of the model, and then the dependent variable in the regression is created. The regression in line 18 is the reduced form. Its overall F statistic can serve as the test for weak instruments, since there are no other exogenous variables in the model. The omit form of the F-test won't work in a progressive loop, so I avoided it here. The slope estimates for least squares and two-stage least squares are collected, stored to coef.gdt, and printed.

For this particular parameterization, I obtained the following result:

Statistics for 10000 repetitions

Variable      mean        std. dev.
b_ols        1.42382     0.0532148
b_tsls       1.00887     0.106816
f           30.6130      7.88943

With strong instruments, TSLS is essentially unbiased, while least squares is seriously biased. Notice that the average value of the weak-instrument test statistic is 30.6, consistent with strong instruments. Try changing p and rho to replicate the findings in Table 10F.1 of POE4.
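For readers who want to check the design outside gretl, here is a rough numpy translation of the same experiment (a sketch, not the book's code; it uses fewer replications, so the averages are only approximate):

```python
import numpy as np

rng = np.random.default_rng(1234)
N, reps = 100, 2000
rho, p = 0.8, 0.5                         # endogeneity and instrument strength

C = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
Z = rng.standard_normal((N, 3))           # three N(0,1) instruments, fixed across replications
xs = p * Z.sum(axis=1)                    # systematic part of x
W = np.column_stack([np.ones(N), Z])      # reduced-form regressors

b_ols = np.empty(reps)
b_tsls = np.empty(reps)
for i in range(reps):
    v, e = (rng.standard_normal((N, 2)) @ C.T).T   # correlated errors
    x = xs + v
    y = x + e
    # OLS slope with a constant: cov(x, y) / var(x)
    b_ols[i] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # TSLS: first-stage fitted values, then OLS of y on a constant and xhat
    xhat = W @ np.linalg.lstsq(W, x, rcond=None)[0]
    b_tsls[i] = np.cov(xhat, y)[0, 1] / np.var(xhat, ddof=1)

print(b_ols.mean(), b_tsls.mean())  # OLS well above the true value of 1; TSLS near 1
```

With ρ = 0.8 and p = 0.5 the OLS average sits far above 1 while the TSLS average is close to 1, in line with the gretl results above.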

10.2 Script

set echo off
open "@gretldir\data\poe\mroz.gdt"
logs wage
square exper

list x = const educ exper sq_exper
list z = const exper sq_exper mothereduc

# least squares and IV estimation of wage eq
ols l_wage x
tsls l_wage x ; z

# tsls manually
smpl wage>0 --restrict
ols educ z
series educ_hat = $yhat
ols l_wage const educ_hat exper sq_exper

# partial correlations -- the FWL result
ols educ const exper sq_exper
series e1 = $uhat
ols mothereduc const exper sq_exper
series e2 = $uhat
ols e1 e2
corr e1 e2

list z = const exper sq_exper mothereduc
list z1 = const exper sq_exper fathereduc
list z2 = const exper sq_exper mothereduc fathereduc

# Hausman test with different sets of instruments
ols educ z --quiet
series ehat = $uhat
ols l_wage x ehat

ols educ z1 --quiet


86

series ehat1 = $uhat
ols l_wage x ehat1

ols educ z2 --quiet
series ehat2 = $uhat
ols l_wage x ehat2
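The regression-based Hausman test above adds the first-stage residuals to the structural equation and tests their significance. The idea can be illustrated with a small, self-contained Python example on simulated data (the setup and numbers are hypothetical, not the mroz data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
z = rng.standard_normal(n)                  # instrument
v = rng.standard_normal(n)
e = 0.8 * v + rng.standard_normal(n)        # e correlated with v: x is endogenous
x = 0.5 * z + v
y = 1.0 + 1.0 * x + e

# first stage: regress x on a constant and z, keep the residuals
W = np.column_stack([np.ones(n), z])
vhat = x - W @ np.linalg.lstsq(W, x, rcond=None)[0]

# augmented regression: y on constant, x, and vhat; t-ratio on vhat
X = np.column_stack([np.ones(n), x, vhat])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ b
s2 = u @ u / (n - X.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t_vhat = b[2] / se[2]
print(t_vhat)  # a large |t| rejects exogeneity of x
```

Because the simulated x is strongly endogenous, the t-ratio on the first-stage residuals is large and the test rejects, just as adding ehat does in the gretl regressions above.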

# weak instruments
open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
list x = const educ exper sq_exper
list z2 = const exper sq_exper mothereduc fathereduc
ols educ z2
omit mothereduc fathereduc

# Sargan test of overidentification
tsls l_wage x ; z2
series uhat2 = $uhat
ols uhat2 z2
scalar test = $trsq
pvalue X 2 test

tsls l_wage x ; z2

open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
list x = const educ exper sq_exper
list z2 = const exper sq_exper mothereduc fathereduc
tsls l_wage x ; z2
series ehat2 = $uhat
ols ehat2 z2
scalar test = $trsq
pvalue X 2 test
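Sargan's statistic is simply n·R² from regressing the TSLS residuals on the full instrument list, compared against a chi-square with degrees of freedom equal to the number of surplus instruments. A hedged Python sketch on simulated data with valid instruments (stdlib math is used for the chi-square(1) p-value):

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 400
Z = rng.standard_normal((n, 2))                # two instruments
v = rng.standard_normal(n)
e = 0.6 * v + rng.standard_normal(n)           # structural error, correlated with v
x = Z @ np.array([0.8, 0.8]) + v               # one endogenous regressor
y = 1.0 + 1.0 * x + e                          # instruments valid by construction

# TSLS by hand: first-stage fitted values, then OLS of y on [1, xhat]
W = np.column_stack([np.ones(n), Z])
xhat = W @ np.linalg.lstsq(W, x, rcond=None)[0]
b = np.linalg.lstsq(np.column_stack([np.ones(n), xhat]), y, rcond=None)[0]
u = y - np.column_stack([np.ones(n), x]) @ b   # TSLS residuals use the actual x

# Sargan statistic: n * R^2 from regressing the residuals on all instruments
fitted = W @ np.linalg.lstsq(W, u, rcond=None)[0]
sargan = n * fitted.var() / u.var()
pval = math.erfc(math.sqrt(sargan / 2))        # chi-square(1) upper tail: 2 - 1 = 1 df
print(sargan, pval)
```

Since the instruments are valid here, the statistic should look like an ordinary chi-square(1) draw; a tiny p-value would instead signal that at least one instrument is invalid.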

# Cragg-Donald F
open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
series nwifeinc = (faminc-wage*hours)/1000
list x = mtr educ kidsl6 nwifeinc const
list z = kidsl6 nwifeinc mothereduc fathereduc const
tsls hours x ; z
scalar df = $df

list w = const kidsl6 nwifeinc
ols mtr w --quiet
series e1 = $uhat


ols educ w --quiet
series e2 = $uhat
ols mothereduc w --quiet
series e3 = $uhat
ols fathereduc w --quiet
series e4 = $uhat

# canonical correlations in R
foreign language=R --send-data --quiet
    set1 <- gretldata[,29:30]
    set2 <- gretldata[,31:32]
    cc1 <- cancor(set1,set2)
    cc <- as.matrix(cc1$cor)
    gretl.export(cc)
end foreign

vars = mread("@dotdir/cc.mat")
print vars
scalar mincc = minc(vars)
scalar cd = df*(mincc^2)/(2*(1-mincc^2))
printf "\nThe Cragg-Donald Statistic is %6.4f.\n", cd

# canonical correlations in gretl
function matrix cc(list Y, list X)
    matrix mY = cdemean({Y})
    matrix mX = cdemean({X})
    matrix YX = mY'mX
    matrix XX = mX'mX
    matrix YY = mY'mY
    matrix ret = eigsolve(qform(YX, invpd(XX)), YY)
    return sqrt(ret)
end function

list E1 = e1 e2
list E2 = e3 e4
l = cc(E1, E2)
scalar mincc = minc(l)
scalar cd = df*(mincc^2)/(2*(1-mincc^2))
printf "\nThe Cragg-Donald Statistic is %10.4f.\n", cd
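The same canonical-correlation calculation can be checked in Python. A numpy-only sketch follows; the data are simulated stand-ins for the residual series e1 through e4, and the df value is hypothetical (in the gretl script it comes from $df after tsls):

```python
import numpy as np

def cancorr(Y, X):
    """Canonical correlations between the columns of Y and X."""
    Y = Y - Y.mean(axis=0)
    X = X - X.mean(axis=0)
    qy, _ = np.linalg.qr(Y)
    qx, _ = np.linalg.qr(X)
    # singular values of qy'qx are the canonical correlations
    return np.linalg.svd(qy.T @ qx, compute_uv=False)

rng = np.random.default_rng(0)
n, df = 300, 295                            # df would come from the tsls command
E1 = rng.standard_normal((n, 2))            # stand-ins for residualized endogenous regressors
E2 = E1 @ np.array([[0.5, 0.1], [0.1, 0.5]]) + rng.standard_normal((n, 2))

cc = cancorr(E1, E2)                        # sorted largest first
mincc = cc.min()
cd = df * mincc**2 / (2 * (1 - mincc**2))   # Cragg-Donald F; the 2 is L, the instrument count
print(cc, cd)
```

The QR-then-SVD route is numerically stable and matches what R's cancor and the gretl eigsolve construction compute: the smallest canonical correlation drives the Cragg-Donald statistic.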

# simulation for ols and tsls
scalar N = 100
nulldata N
scalar rho = 0.8   # set rho = (0.0 or 0.8)
scalar p = 0.5     # set p = (0.1 or 0.5)
matrix S = {1, rho; rho, 1}
matrix C = cholesky(S)
series z1 = normal(0,1)
series z2 = normal(0,1)
series z3 = normal(0,1)
series xs = p*z1 + p*z2 + p*z3
list z = z1 z2 z3

loop 10000 --progressive --quiet
    matrix errors = mnormal(N,2)*C'
    series v = errors[,1]
    series e = errors[,2]
    series x = xs + v
    series y = x + e


    ols x const z --quiet
    scalar f = $Fstat
    ols y 0 x --quiet
    scalar b_ols = $coeff(x)
    tsls y 0 x ; 0 z --quiet
    scalar b_tsls = $coeff(x)
    store coef.gdt b_ols b_tsls f
    print b_ols b_tsls f
endloop

Model 2: OLS, using observations 1-428
Dependent variable: educ

(coefficient table not recovered)

Comparison of Model 1 and Model 2:

Null hypothesis: the regression parameters are zero for the variables
  mothereduc, fathereduc
Test statistic: F(2, 423) = 55.4003, with p-value = 4.2689e-22

Of the 3 model selection statistics, 0 have improved.

Figure 10.3: Results from using the omit statement after least squares
