The final test is the Sargan test of the overidentifying restrictions implied by an overidentified model. Recall that a model is overidentified when it has more instruments than endogenous regressors. In our example we have a single endogenous regressor (educ) and two instruments (mothereduc and fathereduc). The first step is to estimate the model by TSLS using all the instruments. Save the residuals and then regress them on the instruments alone. $TR^2$ from this regression is approximately $\chi^2$ distributed, with degrees of freedom equal to the number of surplus instruments (here $2 - 1 = 1$). Gretl does this easily since it saves $TR^2$ as part of the usual regression output, where T is the sample size (which we are calling N in cross-sectional examples). The script for the Sargan test follows:
1 open "@gretldir\data\poe\mroz.gdt"
2 smpl wage>0 --restrict
3 logs wage
4 square exper
5 list x = const educ exper sq_exper
6 list z2 = const exper sq_exper mothereduc fathereduc
7 tsls l_wage x; z2
8 series ehat2 = $uhat
9 ols ehat2 z2
10 scalar test = $trsq
11 pvalue X 1 test
The first 6 lines open the data, restrict the sample, generate logs and squares, and create the lists of regressors and instruments. In line 7 the model is estimated by TSLS with the variables in list x as regressors and those in z2 as instruments. In line 8 the residuals are saved as ehat2. Then in line 9 the residuals are regressed on the instruments by ordinary least squares. $TR^2$ is collected in line 10, and in the last line the p-value is computed from the $\chi^2$ distribution with one degree of freedom, because the model has one surplus instrument.
The result is:
Generated scalar test = 0.378071
Chi-square(1): area to the right of 0.378071 = 0.538637 (to the left: 0.461363)
The p-value is large, so the null hypothesis that the overidentifying restrictions are valid cannot be rejected. The instruments are judged to be acceptable. Rejection of the null hypothesis can mean either that the instruments are correlated with the errors or that they are omitted variables that belong in the model. In either case, the model as estimated is misspecified.
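The degrees of freedom for the Sargan test do not have to be typed in by hand. The following is a minimal sketch, not the book's own script, that counts the surplus instruments from the two lists with nelem() and passes the result to the pvalue() function. It assumes the POE data files are installed under @gretldir (Windows-style path); the scalar names df and pv are introduced here only for illustration.

```hansl
# Sketch only: assumes the POE mroz data set is installed under @gretldir
open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
list x = const educ exper sq_exper
list z2 = const exper sq_exper mothereduc fathereduc

# TSLS with all instruments, then regress the residuals on the instruments
tsls l_wage x ; z2 --quiet
series ehat2 = $uhat
ols ehat2 z2 --quiet
scalar test = $trsq

# Surplus instruments = elements in z2 minus elements in x (here 5 - 4 = 1)
scalar df = nelem(z2) - nelem(x)
scalar pv = pvalue(X, df, test)
printf "Sargan TR^2 = %.6f, df = %d, p-value = %.6f\n", test, df, pv
```

Because both lists contain the constant and the exogenous regressors, these cancel in the subtraction and only the difference between the number of external instruments and the number of endogenous regressors remains.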
Finally, gretl produces these tests whenever you estimate a model using tsls. If the model is exactly identified, then the Sargan test results are omitted. Here is what the output looks like in the wage example:
Hausman test -
  Null hypothesis: OLS estimates are consistent
  Asymptotic test statistic: Chi-square(1) = 2.8256
  with p-value = 0.0927721

Sargan over-identification test -
  Null hypothesis: all instruments are valid
  Test statistic: LM = 0.378071
  with p-value = P(Chi-square(1) > 0.378071) = 0.538637

Weak instrument test -
  First-stage F(2, 423) = 55.4003
  Critical values for desired TSLS maximal size, when running
  tests at a nominal 5% significance level:

     size      10%      15%      20%      25%
    value    19.93    11.59     8.75     7.25

  Maximal size is probably less than 10%
You can see that the Hausman test statistic differs from the one we computed manually using the script. However, the p-value associated with gretl's version is virtually the same as ours. The results from the instrument strength test and from the Sargan test for overidentification are the same. In conclusion, there is no need to compute any of these tests manually, unless you want to.
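If you want to use the built-in results in later calculations rather than just read them off the screen, they can be retrieved after estimation. The sketch below assumes your gretl version provides the $hausman and $sargan accessors, each returning a row vector holding the test statistic, its degrees of freedom, and the p-value; the matrix names h and s are illustrative. It also assumes the POE data files are installed under @gretldir.

```hansl
# Sketch only: assumes the POE mroz data set is installed under @gretldir
open "@gretldir\data\poe\mroz.gdt"
smpl wage>0 --restrict
logs wage
square exper
list x = const educ exper sq_exper
list z2 = const exper sq_exper mothereduc fathereduc

# Estimate by TSLS; the tests are printed below the coefficient table
tsls l_wage x ; z2

# Retrieve the built-in test results: [statistic, df, p-value]
matrix h = $hausman
matrix s = $sargan
printf "Hausman: stat = %g, df = %g, p-value = %g\n", h[1], h[2], h[3]
printf "Sargan:  stat = %g, df = %g, p-value = %g\n", s[1], s[2], s[3]
```

Note that $sargan is only meaningful for an overidentified model such as this one; with an exactly identified model there is no Sargan statistic to retrieve.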
Finally, you will also see that some additional information is printed at the bottom of the test for weak instruments. The rule of thumb we have suggested is that if F > 10 then the instruments are relatively strong. This raises the question: why not use the usual 5% critical value from the F-distribution to conduct the test? The answer is that instrumental variables estimators, though consistent, are biased in small samples. The weaker the instruments, the greater the bias. In fact, the bias is inversely related to the value of the F-statistic. An F = 10 is roughly equivalent to 1/F = 10% bias in many cases. The other problem caused by weak instruments is that they affect the asymptotic distribution of the usual t- and F-statistics, so tests based on the TSLS estimates can reject a true null hypothesis more often than their nominal level suggests. The table is generated to give you a more specific idea of how large this size distortion can be when tests are run at a nominal 5% level. For instance, if you are willing to tolerate a maximal size of 10%, then use a critical value of 19.93. The rule-of-thumb value of 10 lies between 8.75 and 11.59, which corresponds to a maximal size somewhere between 15% and 20%. Since our F = 55.4 > 19.93, we conclude that the maximal size is probably less than 10%. With instruments this strong, you would also expect the resulting TSLS estimator to exhibit relatively small bias.
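As a rough worked illustration of the 1/F approximation mentioned above (a heuristic that holds only approximately and only in many, not all, cases), compare the rule-of-thumb threshold with the first-stage statistic from this example:

```latex
\[
\left.\frac{1}{F}\right|_{F=10} = 0.10 = 10\%,
\qquad
\left.\frac{1}{F}\right|_{F=55.4003} \approx 0.018 = 1.8\%.
\]
```

By this yardstick, the bias implied by the instruments in the wage example is well under one fifth of what the rule-of-thumb threshold would allow.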