# Simulation

In appendix 10F of POE4, the authors conduct a Monte Carlo experiment comparing the performance of OLS and TSLS. The basic simulation is based on the model

y = x + e (10.7)

x = nz1 + nz2 + nz3 + v (10.8)

The Zi are exogenous instruments that are each N(0,1). The errors, e and v, are

The parameter n controls the strength of the instruments and is set to either 0.1 or 0.5. The parameter p controls the endogeneity of x. When p = 0, x is exogenous. When p = 0.8 it is seriously endogenous. Sample size is set to 100 and 10,000 simulated samples are drawn.

The gretl script to perform the simulation appears below:

1 scalar N = 100

2 nulldata N

3 scalar rho =0.8 # set r = (0.0 or 0.8)

4 scalar p = 0.5 # set p = (0.1 or 0.5)

5 matrix S = {1, rho; rho, 1}

6 matrix C = cholesky(S)

7

7 series z1 = normal(N,1)

8 series z2 = normal(N,1)

9 series z3 = normal(N,1)

10 series xs = p*z1 + p*z2 + p*z3

11 list z = z1 z2 z3

13

12 loop 10000 —progressive —quiet

13 matrix errors = mnormal(N,2)*C’

14 series v = errors[,1]

15 series e = errors[,2]

16 x = xs + v

17 y = x + e

18 ols x const z –quiet

19 scalar f = \$Fstat

20 ols y 0 x –quiet

21 scalar b_ols = \$coeff(x)

22 tsls y 0 x; 0 z –quiet

23 scalar b_tsls = \$coeff(x)

24 store coef. gdt b_ols b_tsls f

25 print b_ols b_tsls f

26 endloop

The top part of the script initializes all of the parameters for the simulation. The sample size is set to 100, an empty dataset is created, the values of p and n are set, then the covariance matrix is created and the Cholesky decomposition is taken. The Cholesky decomposition is a trick used to create correlation among the residuals. There are more transparent ways to do this (e. g., e = rho*v + normal(0,1)), but this is a useful trick to use, especially when you want to correlate more than two series. The systematic part of x is created and called xs and a list to contain the instruments is created as well.

The loop uses the —progressive option and is set to do 10,000 iterations. The matrix called errors uses the Cholesky decomposition of the variance covariance to create the correlated errors. The first column we assign to v and the second to e. The endogenous regressor x is created by adding v to the systematic portion of the model, and then the dependent variable in the regression is created. The first regression in line 20 is the reduced form. The overall F statistic from this regression can serve as the test for weak instruments since there are no other exogenous variables in the model. The omit form of the F-test won’t work in a progressive loop so I avoided it here. The slope estimates for least squares and two-stage least squares are collected, stored to coef. gdt, and printed.

For this particular parameterization, I obtained the following result:

Statistics for 10000 repetitions

Variable mean std.

1.42382 0.0532148 1.00887 0.106816 30.6130 7.88943

10.2 Script

1 set echo off

2 open "@gretldirdatapoemroz. gdt"

3 logs wage

4 square exper

5

5 list x = const educ exper sq_exper

6 list z = const exper sq_exper mothereduc

7 # least squares and IV estimation of wage eq

8 ols l_wage x

9 tsls l_wage x ; z

11

10 # tsls–manually

11 smpl wage>0 –restrict

12 ols educ z

13 series educ_hat = \$yhat

14 ols l_wage const educ_hat exper sq_exper

17

15 # partial correlations–the FWL result

16 ols educ const exper sq_exper

17 series e1 = \$uhat

18 ols mothereduc const exper sq_exper

19 series e2 = \$uhat

20 ols e1 e2

21 corr e1 e2

25

22 list z = const exper sq_exper mothereduc

23 list z1 = const exper sq_exper fathereduc

24 list z2 = const exper sq_exper mothereduc fathereduc

29

25 # Hausman test with different sets of instruments

26 ols educ z –quiet

27 series ehat = \$uhat

28 ols l_wage x ehat

34

29 ols educ z1 –quiet

 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86

series ehatl = \$uhat ols l_wage x ehatl

ols educ z2 —quiet series ehat2 = \$uhat ols l_wage x ehat2

# weak instruments

open "@gretldirdatapoemroz. gdt" smpl wage>0 —restrict logs wage square exper

list x = const educ exper sq_exper

list z2 = const exper sq_exper mothereduc fathereduc ols educ z2

omit mothereduc fathereduc

# Sargan test of overidentification tsls l_wage x; z2

series uhat2 = \$uhat ols uhat2 z2 scalar test = \$trsq pvalue X 2 test

tsls l_wage x ; z2

open "@gretldirdatapoemroz. gdt" smpl wage>0 —restrict logs wage square exper

list x = const educ exper sq_exper

list z2 = const exper sq_exper mothereduc fathereduc

tsls l_wage x; z2

series ehat2 = \$uhat

ols ehat2 z2

scalar test = \$trsq

pvalue X 2 test

# Cragg-Donald F

open "@gretldirdatapoemroz. gdt" smpl wage>0 —restrict logs wage square exper

series nwifeinc = (faminc-wage*hours)/l000

list x = mtr educ kidsl6 nwifeinc const

list z = kidsl6 nwifeinc mothereduc fathereduc const

tsls hours x ; z

scalar df = \$df

list w = const kidsl6 nwifeinc ols mtr w –quiet series el = \$uhat

 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137

ols educ w —quiet

series e2 = \$uhat

ols mothereduc w —quiet

series e3 = \$uhat

ols fathereduc w —quiet

series e4 = \$uhat

# canonical correlations in R foreign language=R —send-data —quiet

set1 <- gretldata[,29:30] set2 <- gretldata[,31:32] cc1 <- cancor(set1,set2) cc <- as. matrix(cc1\$cor) gretl. export(cc) end foreign

vars = mread("@dotdir/cc. mat") print vars

scalar mincc = minc(vars)

scalar cd = df*(minccrt2)/(2*(1-minccrt2))

printf "nThe Cragg-Donald Statistic is 0/06.4f.n",cd

# canonical correlations in gretl function matrix cc(list Y, list X)

matrix mY = cdemean({Y}) matrix mX = cdemean({X})

matrix YX = mY’mX matrix XX = mX’mX matrix YY = mY’mY

matrix ret = eigsolve(qform(YX, invpd(XX)), YY) return sqrt(ret) end function

list E1 = e1 e2 list E2 = e3 e4

l = cc(E1, E2) scalar mincc = minc(l)

scalar cd = df*(minccrt2)/(2*(1-minccrt2))

printf "nThe Cragg-Donald Statistic is %10.4f.n",cd

# simulation for ols and tsls scalar N = 100

nulldata N

scalar rho =0.8 # set r = (0.0 or 0.8)

scalar p = 0.5 # set p = (0.1 or 0.5)

matrix S = {1, rho; rho, 1}

matrix C = cholesky(S)

series z1 = normal(N,1) series z2 = normal(N,1) series z3 = normal(N,1) series xs = p*z1 + p*z2 + p*z3 list z = z1 z2 z3

loop 10000 —progressive —quiet

matrix errors = mnormal(N,2)*C’ series v = errors[,1] series e = errors[,2] x = xs + v y = x + e

 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158

ols x const z –quiet scalar f = \$Fstat ols y 0 x –quiet scalar b_ols = \$coeff(x) tsls y 0 x; 0 z –quiet scalar b_tsls = \$coeff(x) store coef. gdt b_ols b_tsls f print b_ols b_tsls f endloop

Model 2: OLS, using observations 1-428 Dependent variable: educ

coefficient std. error t-ratio p-value

Comparison of Model 1 and Model 2:

Null hypothesis: the regression parameters are zero for the variables mothereduc, fathereduc

Test statistic: F(2, 423) ^^5.4003, with p-value = 4.2689ІЄ-022^

Of the 3 model selection statistics, 0 have improved.

Figure 10.3: Results from using the omit statement after least squares