Multiple Endogenous Regressors and the Cragg-Donald F-test
Cragg and Donald (1993) have proposed a test statistic that can be used to test for weak identification (i.e., weak instruments).[3] To compute it manually, you have to obtain a set of canonical correlations. gretl does not report these directly, so we will use another free program, R, to do part of the computations. In practice, gretl prints the value of the Cragg-Donald statistic by default, so you won't have to go to all of this trouble. Still, to illustrate a very powerful feature of gretl, its ability to run R commands, we will use R to compute part of this statistic.
One approach to detecting weak instruments in models with more than one endogenous regressor is based on canonical correlations. Canonical correlations generalize the usual correlation between two variables: they describe the association between two sets of variables.
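Let N denote the sample size, B the number of right-hand side endogenous variables, G the number of exogenous variables included in the equation (including the intercept), and L the number of external instruments, i.e., instruments not included in the regression. If there are two variables in the first set and two in the second, then there are two canonical correlations, r_1 and r_2. Here the two sets are the B endogenous regressors and the L external instruments, each with the influence of the G included exogenous variables removed. When L >= B there are B canonical correlations, ordered r_1 >= r_2 >= ... >= r_B, and the test below uses the smallest one, r_B.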
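The idea can be sketched numerically. The Python/NumPy code below is only an illustration (it is not part of the gretl or R workflow in this section): it centers each set, orthonormalizes it with a QR decomposition, and reads the canonical correlations off as the singular values of the cross-product of the two orthonormal bases, which is one standard way to compute them. The function name `canonical_correlations` and the simulated data are assumptions made for the example.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q).

    Both sets are centered, each is orthonormalized with a QR decomposition,
    and the singular values of Qx'Qy are the canonical correlations. There
    are min(p, q) of them, returned from largest to smallest.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    qx, _ = np.linalg.qr(Xc)  # reduced QR: qx is n x p with orthonormal columns
    qy, _ = np.linalg.qr(Yc)
    r = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(r, 0.0, 1.0)  # guard against rounding just above 1


# Simulated example: Y[:, 0] shares a component with X[:, 0]; Y[:, 1] is unrelated.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = np.column_stack([X[:, 0] + rng.normal(size=500), rng.normal(size=500)])
print(canonical_correlations(X, Y))  # largest roughly 0.7, smallest close to 0
```

With a single variable in each set, the only canonical correlation is the absolute value of the ordinary correlation coefficient, which is the sense in which canonical correlations generalize the usual concept.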
A test for weak identification (which means that the instruments are correlated with the endogenous regressors, but not very highly) is based on the Cragg-Donald F-test statistic:
$$\text{Cragg-Donald } F = \frac{N - G - B}{L} \times \frac{r_B^2}{1 - r_B^2} \qquad (10.6)$$
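Equation (10.6) is simple arithmetic once r_B is known. As a hedged illustration, here is a small Python helper (the name `cragg_donald_f` is hypothetical) that evaluates it; the inputs in the example are made up and are not the values from the Mroz data.

```python
def cragg_donald_f(N, G, B, L, r_B):
    """Evaluate equation (10.6): [(N - G - B)/L] * [r_B^2 / (1 - r_B^2)].

    N: sample size, G: included exogenous variables (with the intercept),
    B: endogenous regressors, L: external instruments,
    r_B: smallest canonical correlation.
    """
    if not 0.0 <= r_B < 1.0:
        raise ValueError("r_B must lie in [0, 1)")
    return (N - G - B) / L * (r_B**2 / (1.0 - r_B**2))


# Hypothetical inputs chosen only to show the arithmetic:
# (100 - 3 - 2)/2 = 47.5 and 0.25/0.75 = 1/3, so F = 47.5/3, about 15.83.
print(cragg_donald_f(N=100, G=3, B=2, L=2, r_B=0.5))
```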
The Cragg-Donald statistic reduces to the usual weak instruments F-test when the number of endogenous variables is B = 1. Critical values for this test statistic have been tabulated by Stock and Yogo (2005), so that we can test the null hypothesis that the instruments are weak against the alternative that they are not, for two particular consequences of weak instruments.
[3] The computations in this section use R. You should refer to Appendix D for some hints about using R.
The problem with weak instruments is summarized by Hill et al. (2011, p. 435):
Relative Bias: In the presence of weak instruments the amount of bias in the IV estimator can become large. Stock and Yogo consider the bias when estimating the coefficients of the endogenous variables. They examine the maximum IV estimator bias relative to the bias of the least squares estimator. Stock and Yogo give the illustration of estimating the return to education. If a researcher believes that the least squares estimator suffers a maximum bias of 10%, and if the relative bias is 0.1, then the maximum bias of the IV estimator is 1%.
Rejection Rate (Test Size): When estimating a model with endogenous regressors, testing hypotheses about the coefficients of the endogenous variables is frequently of interest. If we choose the α = 0.05 level of significance we expect that a true null hypothesis is rejected 5% of the time in repeated samples. If instruments are weak, then the actual rejection rate of the null hypothesis, also known as the test size, may be larger. Stock and Yogo's second criterion is the maximum rejection rate of a true null hypothesis if we choose α = 0.05. For example, we may be willing to accept a maximum rejection rate of 10% for a test at the 5% level, but we may not be willing to accept a rejection rate of 20% for a 5% level test.
The script to compute the statistic manually is given below:
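1 open "@gretldir/data/poe/mroz.gdt"
2 smpl wage>0 --restrict                        # keep only women with positive wages
3 logs wage
4 square exper
5 series nwifeinc = (faminc-wage*hours)/1000   # income of other household members, scaled by 1000
6 list x = mtr educ kidsl6 nwifeinc const
7 list z = kidsl6 nwifeinc mothereduc fathereduc const
8 tsls hours x ; z                              # mtr and educ are instrumented
9 scalar df = $df                               # residual degrees of freedom, N - G - B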
This first section includes much that we've seen before. The data are loaded, the sample is restricted to wage earners, the log of wage is taken, and the square of experience is added to the data. Then a new variable, nwifeinc, is computed to measure family income from all other members of the household. The next part estimates a model of hours as a function of mtr, educ, kidsl6, nwifeinc, and a constant. Two of the regressors are endogenous: mtr and educ. The external instruments are mothereduc and fathereduc; these join the internal ones (const, kidsl6, nwifeinc) in the instrument list. The degrees of freedom from this regression are saved in df. Because the model has G = 3 exogenous and B = 2 endogenous regressors, df equals N - G - B, which is exactly the numerator needed to compute (N - G - B)/L.
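To see where the degrees of freedom come from, here is a rough Python/NumPy sketch of two-stage least squares on simulated data. It is not gretl's implementation of tsls, and the data-generating process, names, and coefficient values are all assumptions for illustration. The regressor matrix has G + B = 5 columns, so the residual degrees of freedom are N - 5, the N - G - B used in (10.6).

```python
import numpy as np


def tsls(y, X, Z):
    """Two-stage least squares of y on X using instruments Z.

    Stage 1 projects every column of X onto Z (the included exogenous
    columns are reproduced exactly, since they are also in Z); stage 2
    regresses y on the fitted values. Returns the coefficients and the
    residual degrees of freedom N - k, where k = G + B columns of X.
    """
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]
    return beta, X.shape[0] - X.shape[1]


# Hypothetical simulated data (not the Mroz data): x1 and x2 are endogenous
# because they share the error u with y; z1 and z2 are external instruments.
rng = np.random.default_rng(123)
n = 5000
const = np.ones(n)
w1, w2, z1, z2, u = rng.normal(size=(5, n))
x1 = z1 + 0.5 * z2 + w1 + 0.8 * u + rng.normal(size=n)
x2 = 0.5 * z1 + z2 + w2 + 0.8 * u + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + 0.5 * w1 + 0.5 * w2 + 1.0 + u

X = np.column_stack([x1, x2, w1, w2, const])  # B = 2 endogenous, G = 3 exogenous
Z = np.column_stack([w1, w2, z1, z2, const])  # L = 2 external instruments
beta, df = tsls(y, X, Z)
print(beta.round(2), df)  # coefficients near [2, -1, 0.5, 0.5, 1]; df = 4995
```

Because x1 and x2 share the error u with y, least squares on the same data is biased, while the 2SLS estimates land near the values used to generate the data.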
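The next set of lines partials out the influence of the internal instruments (the included exogenous variables collected in w) from each of the endogenous regressors and from each of the external instruments. The residuals e1 and e2 are what remains of mtr and educ; e3 and e4 are what remains of mothereduc and fathereduc.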
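10 list w = const kidsl6 nwifeinc     # included exogenous variables (G = 3)
11 ols mtr w --quiet
12 series e1 = $uhat                   # mtr purged of w
13 ols educ w --quiet
14 series e2 = $uhat                   # educ purged of w
15 ols mothereduc w --quiet
16 series e3 = $uhat                   # mothereduc purged of w
17 ols fathereduc w --quiet
18 series e4 = $uhat                   # fathereduc purged of w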
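Partialling out rests on the Frisch-Waugh-Lovell result: the coefficient on a variable in a regression that also contains W equals the coefficient from regressing the W-residuals of the dependent variable on the W-residuals of that variable. A minimal Python/NumPy sketch of the residualizing step, using a hypothetical helper `partial_out` and simulated data, is:

```python
import numpy as np


def partial_out(W, *cols):
    """Residuals of each column in cols after least-squares regression on W.

    Mirrors the 'ols v w' / 'series e = $uhat' pairs: what remains is the
    part of each variable not explained by the included exogenous variables.
    """
    V = np.column_stack(cols)
    coef = np.linalg.lstsq(W, V, rcond=None)[0]
    return V - W @ coef


rng = np.random.default_rng(42)
n = 300
W = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
x = W @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
y = 2.0 * x + W @ np.array([0.2, 1.0, 1.0]) + rng.normal(size=n)

ex = partial_out(W, x).ravel()
ey = partial_out(W, y).ravel()
full = np.linalg.lstsq(np.column_stack([x, W]), y, rcond=None)[0]
fwl = (ex @ ey) / (ex @ ex)
print(full[0], fwl)  # the two slope estimates on x agree
```

The residuals are orthogonal to every column of W, so any association left between them cannot be attributed to the included exogenous variables.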
Now this is where it gets interesting. From here we call a separate piece of software, R, to compute the canonical correlations. Lines 19-25 hold what gretl refers to as a foreign block.
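19 foreign language=R --send-data --quiet
20 set1 <- gretldata[,29:30]     # e1 and e2 (series IDs 29 and 30)
21 set2 <- gretldata[,31:32]     # e3 and e4 (series IDs 31 and 32)
22 cc1 <- cancor(set1,set2)
23 cc <- as.matrix(cc1$cor)      # keep only the canonical correlations
24 gretl.export(cc)              # writes cc.mat for gretl to read
25 end foreign
26 matrix vars = mread("@dotdir/cc.mat")
27 print vars
28 scalar mincc = minc(vars)     # smallest canonical correlation, r_B
29 scalar cd = df*(mincc^2)/(2*(1-mincc^2))   # (N-G-B)/L with L = 2
30 printf "\nThe Cragg-Donald Statistic is %10.4f.\n", cd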
A foreign block takes the form
Foreign block syntax:

foreign language=R [--send-data] [--quiet]
    … R commands …
end foreign
and achieves the same effect as submitting the enclosed R commands via the GUI in non-interactive mode (see section 30.3 of the Gretl User's Guide). In other words, it allows you to use R commands from within gretl. Of course, you have to have installed R separately, but this greatly expands what can be done using gretl. The --send-data option arranges for auto-loading of the data from the current gretl session. The --quiet option prevents the output from R from being echoed in the gretl output. The block is closed with an end foreign command.
Inside our foreign block we create two sets of variables. The first set contains the residuals e1 and e2 computed above. These are obtained from a matrix called gretldata, which is the name gretl gives to the data passed into R; you have to pull the desired variables out of it. In this case I used a rather inartful but effective means of doing so. These two variables are located in the 29th and 30th columns of gretldata, which also happen to be their ID numbers in gretl (if your dataset contains a different set of series, the column numbers will differ). Line 20 puts these two variables into set1.
The second set of residuals (e3 and e4, in columns 31 and 32) is put into set2. Then R's cancor function is used to find the canonical correlations between the two sets of residuals. The entire set of results is stored in R as cc1. This object contains many results, but we only need the canonical correlations themselves, which are stored in its cor component. They are retrieved as cc1$cor, converted to a matrix with R's as.matrix function, and saved as cc. This matrix is exported to gretl with gretl.export, which writes it to a file named cc.mat (the .mat suffix is added automatically) in gretl's user directory, @dotdir; that is why line 26 reads it from there.
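The whole pipeline (partial out, compute canonical correlations, take the smallest, form the statistic) can also be sketched outside gretl, which helps show why the statistic is built from the smallest canonical correlation. In this hypothetical Python/NumPy simulation both instruments drive x1 but neither is related to x2, so one canonical correlation is large and the other is close to zero. The QR/SVD computation stands in for R's cancor, which computes the same canonical correlations; the data, the helper `resid`, and all other names are assumptions.

```python
import numpy as np

# Hypothetical simulated data: both instruments move x1 strongly,
# but x2 is unrelated to them.
rng = np.random.default_rng(7)
n = 1000
w1, w2, z1, z2 = rng.normal(size=(4, n))
W = np.column_stack([np.ones(n), w1, w2])  # G = 3 included exogenous variables
x1 = z1 + z2 + w1 + rng.normal(size=n)
x2 = w2 + rng.normal(size=n)


def resid(W, v):
    """Residuals from regressing v on W (the partialling-out step)."""
    return v - W @ np.linalg.lstsq(W, v, rcond=None)[0]


E1 = np.column_stack([resid(W, x1), resid(W, x2)])  # plays the role of e1, e2
E2 = np.column_stack([resid(W, z1), resid(W, z2)])  # plays the role of e3, e4

# Canonical correlations via QR + SVD. The residuals already have mean zero
# because W contains a constant, so no extra centering is needed.
qx, _ = np.linalg.qr(E1)
qy, _ = np.linalg.qr(E2)
r = np.linalg.svd(qx.T @ qy, compute_uv=False)  # largest first

N, G, B, L = n, 3, 2, 2
r_min, r_max = r.min(), r.max()
cd = (N - G - B) / L * r_min**2 / (1 - r_min**2)
cd_largest = (N - G - B) / L * r_max**2 / (1 - r_max**2)
print(r.round(3), round(cd, 3), round(cd_largest, 1))
```

Judged by the largest correlation alone the instruments would look strong, but the statistic built from the smallest one flags that x2 is essentially unidentified, since every endogenous regressor needs its own source of instrument variation.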
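The next step is to read cc.mat into gretl (line 26). Then in line 28 we take the smallest canonical correlation and use it in line 29 to compute the Cragg-Donald statistic. There, (N - G - B)/L becomes df/2, since the $df saved from tsls equals N - G - B and there are L = 2 external instruments. The result printed to the screen is: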
? printf "\nThe Cragg-Donald Statistic is %6.4f.\n",cd

The Cragg-Donald Statistic is 0.1006.
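It matches the value produced automatically by tsls, shown below, perfectly! It also shows that these instruments are very weak: 0.1006 is far below even the smallest Stock-Yogo critical value, 3.63 (for a maximal size of 25%), so we cannot reject the null hypothesis that the instruments are weak.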
Weak instrument test -
  Cragg-Donald minimum eigenvalue = 0.100568
  Critical values for desired TSLS maximal size, when running
  tests at a nominal 5% significance level:

     size     10%     15%     20%     25%
    value    7.03    4.58    3.95    3.63

  Maximal size may exceed 25%
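The decision rule behind this output can be written down directly: reject the null of weak instruments at a given maximal size when the statistic exceeds that size's critical value. The Python sketch below hard-codes the four critical values printed above, which apply only to this model's numbers of endogenous regressors and instruments; the helper name `smallest_tolerable_size` is hypothetical.

```python
# Critical values from the gretl output: TSLS maximal size for tests at a
# nominal 5% level, valid for this model's B = 2 and L = 2 only.
STOCK_YOGO_TSLS_SIZE = {0.10: 7.03, 0.15: 4.58, 0.20: 3.95, 0.25: 3.63}


def smallest_tolerable_size(stat, critical=STOCK_YOGO_TSLS_SIZE):
    """Smallest maximal size whose critical value the statistic exceeds.

    Returns None when the statistic is below every critical value, i.e.
    the weak-instrument null cannot be rejected even at the 25% size.
    """
    for size in sorted(critical):  # 0.10 first: it has the largest critical value
        if stat > critical[size]:
            return size
    return None


print(smallest_tolerable_size(0.100568))  # None
```

Returning None corresponds to gretl's message that the maximal size may exceed 25%.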
Of course, you can do this exercise without R as well. Gretl's matrix language is very powerful, and you can easily get the canonical correlations between two sets of variables. The following function[74] does just that.
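function matrix cc(list Y, list X)
    # center each column around its mean
    matrix mY = cdemean({Y})
    matrix mX = cdemean({X})

    # cross-products of the demeaned data
    matrix YX = mY'mX
    matrix XX = mX'mX
    matrix YY = mY'mY

    # eigenvalues of |Q - lambda*YY| = 0 with Q = YX*inv(XX)*YX'
    # are the squared canonical correlations
    matrix ret = eigsolve(qform(YX, invpd(XX)), YY)
    return sqrt(ret)
end function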
The function is called cc and takes two arguments, just like R's cancor function. Feed the function two lists, each containing the names of the variables in one of the two sets for which the canonical correlations are needed. The variables in each set are then demeaned using the very handy cdemean function, which centers each column of its matrix argument around the column mean. Next, the cross-products YX, XX and YY are formed, and eigsolve returns the eigenvalues λ that solve |Q - λYY| = 0, where Q = (YX)(XX)^(-1)(YX)'. These eigenvalues are the squared canonical correlations, so their square roots are the canonical correlations themselves.
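The claim that the generalized eigenvalues are squared canonical correlations can be checked numerically. This Python/NumPy sketch mirrors the steps of the hansl function (demean, form YX, XX and YY, solve |Q - λYY| = 0) through the equivalent ordinary eigenproblem for YY^(-1)Q; it is an illustration, not gretl's eigsolve, and the function name `cc_squared` is hypothetical.

```python
import numpy as np


def cc_squared(Y, X):
    """Python mirror of the core step of the hansl cc() function.

    Solves |Q - lambda*YY| = 0 with Q = (YX)(XX)^{-1}(YX)' after demeaning.
    The eigenvalues lambda are SQUARED canonical correlations.
    """
    mY = Y - Y.mean(axis=0)
    mX = X - X.mean(axis=0)
    YX = mY.T @ mX
    XX = mX.T @ mX
    YY = mY.T @ mY
    Q = YX @ np.linalg.solve(XX, YX.T)
    # The generalized problem has the same eigenvalues as YY^{-1} Q.
    lam = np.linalg.eigvals(np.linalg.solve(YY, Q)).real
    return np.sort(lam)  # ascending order


rng = np.random.default_rng(3)
a = rng.normal(size=300)
b = 0.6 * a + rng.normal(size=300)
lam = cc_squared(a[:, None], b[:, None])
print(lam[0], np.corrcoef(a, b)[0, 1] ** 2)  # equal: the eigenvalue is r squared
```

With one variable in each set the single eigenvalue equals the squared correlation coefficient, which is why a square root is needed before the result is used as a canonical correlation.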
Then, to get the value of the Cragg-Donald F, assemble the two sets of residuals and use the cc function to get the canonical correlations.
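list E1 = e1 e2                     # residualized endogenous regressors
list E2 = e3 e4                     # residualized external instruments
matrix l = cc(E1, E2)
scalar mincc = minc(l)              # smallest canonical correlation, r_B
scalar cd = df*(mincc^2)/(2*(1-mincc^2))
printf "\nThe Cragg-Donald Statistic is %10.4f.\n", cd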